Monday, February 18, 2013

Probability Made Less Uneasy

I’ve been leafing through a few books on probability, a subject which I’ve mostly avoided since undergrad. I originally thought I’d just refresh what I had already learned, but to my surprise I was led to reconsider fundamental beliefs. What follows is my journey told via book reviews.

Hexaflexagons and Other Mathematical Diversions by Martin Gardner

As a kid, I devoured this book and the others in the series, which I later learned were collections of Mathematical Games columns from Scientific American magazine. I didn’t always understand the material, and the puzzles were often too difficult, but Gardner’s writing skill kept me reading on.

Among the many fascinating chapters was “Probability Paradoxes”. Gardner’s ability to communicate was so strong that after many years I still remember much of the content. In particular, he asked:

Mr. Smith says, "I have two children and at least one of them is a boy." What is the probability that the other child is a boy?

and his explanation of 1/3 being the correct answer not only stuck in my mind, but shaped my early views on probability. For the details, see this New Scientist article on a Martin Gardner convention.

Only a few years ago, after a debate with a friend, did I reconsider the reasoning. It turns out Gardner’s statement of the problem is ambiguous. This revelation sparked a desire to hit the books and brush up on probability one day.
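The post does not spell out which readings were debated, so the following is only a sketch of one common way the ambiguity is described. It assumes boys and girls are equally likely and compares two hypothetical readings of Mr. Smith's statement by counting cases exactly. The names `p_filter` and `p_random_mention` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (older, younger) two-child families.
families = list(product("BG", repeat=2))

# Reading A (assumed): we only learn that the family has at least one boy.
at_least_one_boy = [f for f in families if "B" in f]
p_filter = Fraction(sum(f == ("B", "B") for f in at_least_one_boy),
                    len(at_least_one_boy))

# Reading B (assumed): the parent mentions one child chosen at random,
# and that child happens to be a boy. Each (family, child) pair is equally likely.
mentions = [(f, i) for f in families for i in (0, 1)]
boy_mentioned = [(f, i) for f, i in mentions if f[i] == "B"]
p_random_mention = Fraction(sum(f == ("B", "B") for f, _ in boy_mentioned),
                            len(boy_mentioned))

print(p_filter, p_random_mention)  # 1/3 1/2
```

Under reading A the answer is Gardner's 1/3; under reading B it is 1/2. Which number is "correct" depends on how we assume the statement came to be made.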

A Primer of Statistics by M.C. Phipps and M.P. Quine

The second edition of this slim volume was the textbook for my first course on probability. I used it to cram for exams. For this purpose, it was good: I got decent grades.

Sadly, it wasn’t as good in other respects. I acquired a distaste for the subject. Why did Probability and Statistics seem like a bag of ad hoc tricks, with few explanations given? Do I have poor intuition for it? Or is it glorified guesswork that seems to work well enough with real-life data? Whatever the reason, I decided that for the rest of my degree I’d steer towards the Pure Mathematics offerings.

The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver

My renewed interest in probability was also sparked by the United States presidential election of 2012, or rather, its aftermath. Many had predicted its outcome but few were accurate.

It was only then I read about Nate Silver, who turned out to have been famous for his prowess with predictions for quite some time. Eager to learn more, I thumbed through his bestseller.

Though the book is necessarily light on theory, the equations that do appear are correct and lucidly explained. Also, the pages are packed with interesting data sets and anecdotes. General pronouncements are often backed up with concrete tables and graphs, though, as Silver readily admits, some qualities are difficult to quantify, resulting in potentially dubious but novel yardsticks (such as measuring scientific progress by average research and development expenditure per patent).

But most of all, I was intrigued by the tale of an ongoing conflict that I never knew existed, with frequentists on one side and Bayesians on the other. They never told me this in school!

I soon found out why: Silver states that Fisher may be almost single-handedly to blame for the dominance of frequentism, the ideology foisted upon me when I was just out of high school. Sure enough, I went back and confirmed Phipps and Quine listed Fisher in the bibliography.

Against the Gods: The Remarkable Story of Risk by Peter L. Bernstein

My dad told me about this book. Technical details are scant as it is also aimed at the general public. But in contrast to Silver’s work, what little appears is laughably erroneous. In some sections, I felt the author was trying to trick himself into believing fallacies.

The misinformation might be mostly harmless. Those with weak mathematical ability are going to skip the equations out of fear, and those with strong mathematical ability are probably also going to skip them because they already know them.

But conceivably this book could be a gifted reader’s first introduction to probability, and it’d be a shame to start off on the wrong foot. As a sort of public service, I’ll explain some of the gaffes.


Chapter 6 contains an example expected value calculation involving a coin flip.

We multiply 50% by one for heads and do the same for the tails, take the sum---100%---and divide by two. The expected value of betting on a coin toss is 50%. You can expect either heads or tails, with equal likelihood.

Why is this wrong? How can we fix it?

The next example involves rolling two dice.

If we add the 11 numbers that might come up…the total works out to 77. The expected value of rolling two dice is 77/11, or exactly 7.

Why is this wrong? How can we fix it?

What’s the difference?

Bernstein and Silver offer competing reasons why modern civilization differs from the past. Bernstein singles out our relatively newfound ability to quantify risk, and also suggests that key intermediate steps could only have occurred at certain points in history due to the overall mood of the era.

In contrast, Silver seems to place most importance on the printing press. In an early chapter, Silver suggests that after some teething trouble (lasting 330 years), the printing press paved the way for modern society. Apart from distribution of knowledge, perhaps more importantly the printing press helped with the preservation of knowledge; previously, writing would often be lost before it could be copied.

I’m inclined to side with Silver, partly because of Bernstein’s basic technical mistakes. After observing how fast and loose Bernstein was playing with mathematics, I’m tempted to believe some of his statements are gut feelings.

There is another glaring difference. Bernstein’s book lacks any mention of the frequentist-Bayesian war. Fisher’s name is conspicuously absent.

For or Against?

Against the Gods is riveting. My favourite feature is the backstories of famous scholars. For some of them, before reading the book, the only thing I knew about them was their names, and I would have known even less if their names weren’t attached to their most famous discoveries (or at least, discoveries vaguely connected with them). Learning about their lives, motivations, temperaments, beliefs, and so on was illuminating. An intellectually superior form of gossip, I suppose.

However, the elementary mathematical mistakes ultimately cast a cloud of suspicion over the book. How reliable are the author’s assertions in general? Although I heartily recommend Against the Gods, I also recommend thorough fact-checking before using it as a reference.

So a tip for bestseller authors: if a section is technical, then ask an expert, be an expert, or cut it out. Too many howlers make readers like me wary of the whole, no matter how well-written and accurate the non-technical parts are.

Answers to exercises

As Bernstein himself implies, an expected value is a weighted average. We need weights, and we need numbers to sum. It takes two to tango; the expected value dance can only proceed if probabilities are accompanied by values.

One example neglects the values, and the other neglects the probabilities. The author only computes the sum of the weights for the coin flip, and the sum of the values for the dice roll. In both cases the author divides by the number of outcomes, which might be considered another error: we already divided by the number of outcomes to compute the weights (probabilities) in the first place.
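To make the two mistakes concrete, here is a minimal sketch assuming fair dice and a fair coin. The helper `expected_value` and the variable names are my own, not anything from the book. It computes the weighted average properly for two dice, then replays the two quoted procedures: one that uses only the values and one that uses only the weights.

```python
from fractions import Fraction
from itertools import product

def expected_value(dist):
    """Weighted average: dist maps each value to its probability."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(value * prob for value, prob in dist.items())

# Distribution of the total of two fair dice, built by counting the 36 rolls.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

correct = expected_value(two_dice)

# The quoted dice procedure: add the 11 possible totals and divide by 11.
# Probabilities never appear.
values_only = Fraction(sum(two_dice), len(two_dice))

# The quoted coin procedure: add the two probabilities and divide by 2.
# Values never appear.
coin_probs = [Fraction(1, 2), Fraction(1, 2)]
weights_only = sum(coin_probs) / len(coin_probs)

print(correct, values_only, weights_only)  # 7 7 1/2
```

Only the first quantity is an expected value. The second happens to agree with it for fair dice, and the third is just 1 divided by the number of outcomes.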

Why are these blunders amusing? For the coin example, let’s ignore that the expected value is confused with a probability. Instead of a coin, consider winning the lottery. The probability of winning the lottery plus the probability of not winning the lottery sums to 100%. Dividing this by the number of outcomes, i.e. 2, yields 50%, so apparently we win or lose the lottery with equal likelihood! It’s almost like saying “either it happens or it doesn’t happen, so the chance it happens is 50%”.

For the dice example, imagine rolling 2 loaded dice, both of which almost always show 6. The expected value should be close to 12, but because the probabilities are completely ignored, the author’s procedure leads to the same expected value of 7. Surely your calculation should change if the dice are loaded?
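The loaded-dice thought experiment can be checked with a short sketch. The loading below (6 with probability 95%, each other face 1%) is an arbitrary assumption chosen for illustration, and `two_dice_total` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical loaded die: shows 6 with probability 95/100, each other face 1/100.
loaded = {face: Fraction(1, 100) for face in range(1, 6)}
loaded[6] = Fraction(95, 100)

def two_dice_total(die):
    """Distribution of the total of two independent rolls of the same die."""
    totals = {}
    for (a, pa), (b, pb) in product(die.items(), repeat=2):
        totals[a + b] = totals.get(a + b, 0) + pa * pb
    return totals

totals = two_dice_total(loaded)

# Proper expected value: each total weighted by its probability.
weighted = sum(t * p for t, p in totals.items())

# The book's procedure: average the 11 possible totals, ignoring probabilities.
unweighted = Fraction(sum(totals), len(totals))

print(float(weighted), unweighted)  # 11.7 7
```

The weighted average moves towards 12 as the dice become more heavily loaded, while the unweighted procedure is stuck at 7 no matter how the dice behave.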

How do we fix these problems? For the dice example, the author supplies the correct method in the very next paragraph. At last, both the probabilities and values are taken into account. Unfortunately, the author then concludes:

The expected value…is exactly 7, confirming our calculation of 77/11. Now we can see why a roll of 7 plays such a critical role in the game of craps.

This should never have been written. The first sentence suggests both methods for computing the expected value are valid, when of course it just so happens the wrong method leads to the right answer.
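One way to see that the agreement is a coincidence is to try a different score on the same two fair dice. The sketch below uses the higher of the two faces, which is my own example and not one from the book. The total of two fair dice is distributed symmetrically about 7, so the plain average of its possible values matches the weighted one. The maximum is not symmetric, and the two averages part ways.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

def distribution(score):
    """Map each possible score to its probability over the 36 rolls."""
    dist = {}
    for roll in rolls:
        dist[score(roll)] = dist.get(score(roll), 0) + Fraction(1, len(rolls))
    return dist

def weighted_mean(dist):
    """Proper expected value: values weighted by probabilities."""
    return sum(v * p for v, p in dist.items())

def plain_mean(dist):
    """The flawed procedure: average the distinct values, ignoring probabilities."""
    return Fraction(sum(dist), len(dist))

total = distribution(sum)    # score = sum of the two faces
highest = distribution(max)  # score = larger of the two faces

print(weighted_mean(total), plain_mean(total))      # 7 7
print(weighted_mean(highest), plain_mean(highest))  # 161/36 7/2
```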

The second sentence is difficult to interpret. Perhaps uncharitably, I’m guessing the sentence is an upgraded version of: “Look! Here’s a 7! Didn’t we see a 7 earlier?” What would have been written if we rolled a single die? The expected value is 3.5, but a roll of 3.5 obviously has no role in any game we play with one die.

As for fixing the coin example: computing an expected value requires us to attach a numerical value to each outcome. One does not simply plow ahead with “heads” versus “tails”. We need numbers; any numbers. We could assign 42 to heads, and 1001 to tails; here, the expected value of a fair coin toss would be 50% of 42 plus 50% of 1001, which is 521.5. Typically we pick values relevant to the problem at hand: for instance, in a game where we earn a dollar for flipping heads, and lose a dollar for tails, we’d assign the values 1 and -1 (here, our expected winnings would be 0).

[It may be possible to reinterpret the coin example as assigning the value 1 to both heads and tails. But if this were done, the expected value should also be 1, not “50%”. Furthermore, we learn nothing if the outcomes are indistinguishable.]
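The coin fixes above are easy to verify. This is a small sketch assuming a fair coin, with `expected_value` as a hypothetical helper that takes (value, probability) pairs.

```python
from fractions import Fraction

def expected_value(outcomes):
    """outcomes: list of (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

half = Fraction(1, 2)

# Arbitrary values: 42 for heads, 1001 for tails.
arbitrary = expected_value([(42, half), (1001, half)])

# Win a dollar on heads, lose a dollar on tails.
dollar_game = expected_value([(1, half), (-1, half)])

# Both outcomes valued at 1: the result is 1, not "50%".
both_one = expected_value([(1, half), (1, half)])

print(float(arbitrary), dollar_game, both_one)  # 521.5 0 1
```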

Probability Theory: The Logic of Science by E. T. Jaynes

If only Jaynes' book had been my introduction to probability. Like a twist ending in a movie, reading it was a thought-provoking, eye-opening, earth-shattering experience that compelled me to re-evaluate what I thought I knew.

Whereas Silver presents whimsical examples that demonstrate the Bayesian approach, Jaynes forcefully argues for its theoretical soundness. From a few simple intuitive “desiderata” (too ill-defined to be axioms), Jaynes shows step-by-step how they imply more familiar probability axioms, and why the Bayesian approach is the natural choice. And all this happens within the first 3 chapters, which are free online.

I had been uneasy about probability because I thought it was a collection of mysterious hacks, perhaps because it had to deal with the real world. I was flabbergasted to learn probability could be put on the same footing as formal logic. All those hacks can be justified after all. Probability is not just intuition and duct tape: it can be as solid as any branch of mathematics.

Since there still exist competing philosophies of probability, presumably others find fault with Jaynes' arguments. I’m still working through it, but I’m convinced for now. If there’s another twist in this story, I’ll need another great book to show it to me.

Washington University in St. Louis maintains a page dedicated to Jaynes. It’s a shame he died before he finished writing. The remaining holes have been papered over with exercises, which explains their depth and difficulty.

It’s also a shame Jaynes left Stanford University many years ago. Had he stayed, with luck I would have discovered his work earlier, or even have met him. His essay “A Backward Look to the Future” describes his reasons for departure.

In short, Jaynes felt the “publish or perish” culture of academia was harmful and was taking over Stanford. I can’t tell if Jaynes was right because by the time I got into the game, this culture seemed universally well-established. I had no idea an alternative ever existed.


Suyog Chandramouli said...

Thanks for the links. Since you brought up the bayesian approach, I thought I should point you to "Doing Bayesian Data Analysis", a book by John Kruschke of which I've only heard good things especially with respect to accessibility and usefulness.

Panda said...

Haha... I remember Nathan had the same take on probability for everything.. either it happens or it doesn't, so its 50/ 50!