From the November-December 2004 issue, Volume 92, Number 6

Information Theory, Inference, and Learning Algorithms. David J. C. MacKay. xii + 628 pp. Cambridge University Press, 2003. $50.

Students taking classes in probability and statistics often come away thinking that those subjects are relevant only for determining whether dice are loaded or calculating the odds that a particular sequence of colored balls will be drawn from an urn. In reality, of course, probability and statistics have a wide range of practical applications, some of which are reviewed in David MacKay's Information Theory, Inference, and Learning Algorithms. The thread MacKay follows throughout the book is the use of principled mathematical approaches—mainly probability and Bayesian statistics—for handling, communicating and extracting information.

Much has been said about the importance of interdisciplinary research in modern science as a vehicle for cross-fertilization between fields and for facilitating breakthroughs when existing methods fail. In reality the term is often misused. MacKay's book is genuinely interdisciplinary: It traces the applicability of concepts and methodologies across disciplines, with an emphasis on his own main research interests—information theory, Bayesian statistics and machine learning.

The book has six main parts covering various aspects of information theory and inference; each part consists of several "bite-size" chapters, which are easy to read and digest. MacKay offers a good overview of the main concepts, emphasizing insight and intuitive understanding; rigorous (or semirigorous) proofs follow later to satisfy the demanding reader. The presentation itself is very informal, with an occasional "pause for thought" in the form of a statement or an interesting question.

A broad range of exercises appear at the end of each chapter, in some cases along with—quite exceptionally—the worked-out solutions. Both the exercises and solutions should be invaluable to students for understanding the material. The book is highly readable and accessible enough to be studied independently.

Sections on information theory, probability and inference methods are interleaved in a manner that may not be to everyone's liking but is consistent and logical. In the preface, MacKay sketches the interdependencies between the various chapters and offers four recipes for using the book as the main textbook in courses on information theory or on Bayesian inference and machine learning.

In the sections on information theory, which make up about half the book, MacKay manages to get the main points across with fewer formal definitions and complicated derivations than one typically finds in more traditional textbooks. The main tools and definitions are provided alongside review material from basic probability and statistics.

The material covered is essential to the understanding of electronic communication, from mobile and satellite communication to digital TV and the Internet. Most modern digital communication systems require efficient methods of source and channel coding. Source coding refers to the compression of redundant information (in pictures and music, for example), even at the expense of fidelity, whereas channel coding refers to the introduction of some controlled redundancy prior to transmission in order to protect the information against corruption in a noisy transmission medium (for example, deep space, the atmosphere or optical fibers).
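
To make the distinction concrete, here is a minimal Python sketch (mine, not MacKay's) in the spirit of the repetition code with which the book opens: each message bit is transmitted several times over a noisy channel, and the receiver decodes by majority vote, trading bandwidth for protection against corruption. The flip probability and the message are illustrative choices.

    import random

    def repetition_encode(bits, n=3):
        # Channel coding: repeat each bit n times to add controlled redundancy.
        return [b for b in bits for _ in range(n)]

    def binary_symmetric_channel(bits, flip_prob=0.1):
        # Noisy medium: each transmitted bit is flipped independently with probability flip_prob.
        return [b ^ 1 if random.random() < flip_prob else b for b in bits]

    def repetition_decode(received, n=3):
        # Majority vote over each block of n bits usually recovers the original message bit.
        return [1 if sum(received[i:i + n]) > n // 2 else 0
                for i in range(0, len(received), n)]

    message = [1, 0, 1, 1, 0, 0, 1, 0]
    decoded = repetition_decode(binary_symmetric_channel(repetition_encode(message)))
    print(message, decoded)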

MacKay covers basic material concerning information theory and coding as well as advanced techniques, such as low-density parity-check codes (in whose development he played a significant part), and turbo and fountain codes. These new error-correcting codes facilitate the transmission of information at close-to-optimal rates. This more advanced material, which is presented toward the end of the book, provides insight into what are arguably the main developments in information theory over the past decade.

The other main area covered by the book is inference and its relation to learning algorithms. Humans have been trying to delegate more and more tasks to machines since prehistoric times. In recent decades, research has focused on modeling knowledge in an attempt to make it possible to use computers for handling sophisticated tasks (such as the classification of handwritten digits, voice recognition and forecasts of financial indicators). The problem of defining the model and estimating its parameters has been expressed in a clear mathematical form, using probability theory and Bayesian statistics, and new and more efficient methods have been devised to solve it. For instance, given a model, we aim to infer its parameters from data in the form of observations or measurements.
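
As a deliberately simple illustration of this inference problem (my example, not one from the book), consider estimating the bias of a coin from observed tosses. With a Beta prior and a binomial likelihood, the posterior is again a Beta distribution, and its mean summarizes the updated belief about the unknown parameter.

    def posterior_coin_bias(heads, tails, prior_a=1.0, prior_b=1.0):
        # Beta(prior_a, prior_b) prior on the coin's bias, binomial likelihood.
        # By conjugacy the posterior is Beta(prior_a + heads, prior_b + tails).
        a, b = prior_a + heads, prior_b + tails
        return a, b, a / (a + b)   # posterior parameters and posterior mean

    # Observing 7 heads and 3 tails under a uniform Beta(1, 1) prior:
    a, b, mean = posterior_coin_bias(7, 3)
    print(a, b, round(mean, 3))   # 8.0 4.0 0.667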

MacKay's presentation of probability, Bayesian statistics and inference methods is excellent. Following tradition, he discusses the distribution of balls in urns, alongside more interesting and relevant examples of inference from medical and crime-scene data. The use of Bayesian statistics—for inference in general as well as in specific models such as Gaussian processes and neural networks—follows naturally. Also covered within the same framework are several related problems, such as model selection and evaluation and the automatic determination of relevance of variables.

One of the main difficulties in representing problems probabilistically is that a significant computational effort is required to obtain exact results. The solution comes from an unexpected source: the casinos of Monte Carlo. In Monte Carlo sampling, some of the more demanding calculations are replaced by sampling techniques based on randomly drawing numbers from a given probability distribution so that the statistical properties of the sampled values provide a good approximation to the required result. The book covers a range of approximations and sampling techniques, including such recent approaches as "exact sampling" and "slice sampling." MacKay provides numerical examples to demonstrate the efficacy of the methods.
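
The flavor of these methods can be conveyed by a short Python sketch of the random-walk Metropolis algorithm, one of the Markov chain Monte Carlo techniques MacKay discusses; the target density and step size below are illustrative choices, not examples from the book.

    import math
    import random

    def metropolis(log_target, n_samples=50000, step=1.0, x0=0.0):
        # Random-walk Metropolis: propose a Gaussian step and accept it with
        # probability min(1, target(proposal) / target(current)); the chain of
        # visited states approximates samples from the target distribution.
        x, samples = x0, []
        for _ in range(n_samples):
            proposal = x + random.gauss(0.0, step)
            accept_prob = math.exp(min(0.0, log_target(proposal) - log_target(x)))
            if random.random() < accept_prob:
                x = proposal
            samples.append(x)
        return samples

    def log_target(x):
        # Log of a density known only up to a constant: a Gaussian centered at 3.
        return -0.5 * (x - 3.0) ** 2

    samples = metropolis(log_target)
    print(sum(samples) / len(samples))   # close to 3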

Several other areas are briefly explained in isolated chapters, such as crosswords and codebreaking, information acquisition and evolution, and various aspects of statistical physics. The insight and understanding provided by these chapters are limited by the fact that the chapters are only loosely linked to the main themes of the book.

This is primarily an excellent textbook in the areas of information theory, Bayesian inference and learning algorithms. Undergraduate and postgraduate students will find it extremely useful for gaining insight into these topics; however, the book also serves as a valuable reference for researchers in these areas. Both sets of readers should find the book enjoyable and highly useful.—David Saad, Neural Computing Research Group, School of Engineering and Applied Science, Aston University, United Kingdom
