Sunday, May 25, 2014

Straw Men in Black

There’s a phrase used to praise a book: “you can’t put it down”. Unfortunately, I felt the opposite while reading The Black Swan by Nassim N. Taleb.

I’ll admit some prejudice. We’re told not to judge a book by its cover, but review quotes in the blurb ought to be exempt. One such quote originated from Peter L. Bernstein, the author of Against the Gods. While I enjoyed reading it, his book contained a litany of elementary mathematical mistakes. Did this mean The Black Swan was similarly full of errors?

All the same, the book began well. Ideas were clear and well-expressed. The writing was confident: perhaps overly so, but who wants to read text that lacks conviction? It promised wonders: we would learn how statisticians have been fooling us, and then learn the right way to deal with uncertainty, with potentially enormous life-changing payoffs.

I failed to reach this part because several chapters in, I was exhausted by a multitude of issues. I had to put the book down. I intend to read further once I’ve recovered, and hopefully the book will redeem itself. Until then, here are a few observations.

One Weird Trick

What’s on the other end of those "one weird trick" online ads? You won’t find out easily. Click one, and you are forced to sit through a video that:

  • makes impressive claims about a product

  • takes pains to keep the product a secret

  • urges the viewer to wait until the end, when they will finally learn the secret

This recipe must be effective, because I couldn’t help feeling the book was similar. It took me on a long path, meandering from anecdote to anecdote, spiced with poorly constructed arguments and sprinkled with assurances that the best was yet to come.

Perhaps this sales tactic has become a necessary evil. With so much competition, how can a book distinguish itself? Additionally, I’m guessing fattening the book for any reason has a positive effect on sales.

Even so, the main idea of the book could be worth reading. I’ll post an update if I find out.

Lay Off Laplace

Chapter 4 features a story about a turkey. As days pass, a turkey’s belief in a proposition such as "I will be cared for tomorrow" grows ever stronger, right until the day of its execution, when its belief turns out to be false. This retelling of a parable about a chicken due to Bertrand Russell is supposed to warn us about inferring knowledge from observations, a repeated theme in the book.

But what about Laplace’s sunrise problem? By the Rule of Succession, if the sun rose every day for 5000 years, that is, for 5000 × 365.2426 days, the odds it will rise tomorrow are only 1826214 to 1. Ever since Laplace wrote about this, he has been mercilessly mocked because of this ludicrously small probability.
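For reference, here is a sketch of the arithmetic behind that figure. The symbols n (the number of consecutive sunrises observed) and p (the probability of a sunrise tomorrow) are labels chosen here for illustration, not Laplace’s own notation:

```latex
\[
p = \frac{n+1}{n+2},
\qquad
\frac{p}{1-p} = \frac{(n+1)/(n+2)}{1/(n+2)} = n+1,
\]
\[
n = 5000 \times 365.2426 = 1826213
\quad\Longrightarrow\quad
\text{odds of } 1826214 \text{ to } 1 .
\]
```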

So which is it? Do repeated observations make our degrees of belief too strong (chicken) or too weak (sunrise)?

Live long and prosper

Much of this material is discussed in Chapter 18 of Probability Theory: The Logic of Science by Edwin T. Jaynes, which also contains the following story.

A boy turns 10 years old. The Rule of Succession implies the probability he lives one more year is (10 + 1) / (10 + 2), which is 11/12. A similar computation shows his 70-year-old grandfather will live one more year with probability 71/72.
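The two fractions can be checked with a few lines of Go. This is only a sketch: the function name survivalProbability is made up for illustration, and it does nothing more than evaluate (n + 1) / (n + 2) exactly, with the years lived so far as the only data.

```go
package main

import (
	"fmt"
	"math/big"
)

// survivalProbability applies the Rule of Succession, (n+1)/(n+2), to
// someone who has survived n years so far. The name is hypothetical.
// Exact fractions avoid any floating-point rounding.
func survivalProbability(years int64) *big.Rat {
	return big.NewRat(years+1, years+2)
}

func main() {
	boy := survivalProbability(10)
	grandfather := survivalProbability(70)
	fmt.Println("boy:", boy)
	fmt.Println("grandfather:", grandfather)
	// Cmp returns +1 when the receiver is the larger fraction.
	fmt.Println("grandfather more likely to survive:", grandfather.Cmp(boy) > 0)
}
```

Fed nothing but ages, the rule can only ever rank the older person as the safer bet.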

I like this example, because it contains both the chicken and the sunrise problem. Two for the price of one. Shouldn’t the old man’s number be lower than the young boy’s? One number seems too big and the other too small. How can the same rule be wrong in two different ways?

Ignorance is strength?

What should we do to avoid these ridiculous results?

Well, if the sun rose every day for 5000 years and that is all you know, then 1826214 to 1 is correct. The only reason we think this is too low is because we know a lot more than the number of consecutive sunrises: we know about stars, planets, orbits, gravity, and so on. If we take all this into account, our degree of belief that the sun rises tomorrow grows much stronger.

The same goes for the other examples. In each one:

  1. We ignored what we know about the real world.

  2. We calculated based on what little data was left.

  3. We un-ignored the real world so we could laugh at the results.

In other words, we have merely shown that ignoring data leads to bad results. It’s as obvious as noting that if you shut your eyes while driving a car, you’ll end up crashing.

Sadly, despite pointing this out, Laplace became a victim of this folly. Immediately after describing the sunrise problem, Laplace explains that the unacceptable answer arises because of wilfully neglected data. For some reason, his critics take his sunrise problem, ignore his explanation for the hilarious result, then savage his ideas.

The Black Swan joins the peanut gallery in condemning Laplace. However, its conclusion differs from those of most detractors. The true problem is that most of the data is ignored when computing probabilities. Taleb considers addressing this by ignoring even more data! This raises the question: why not toss out more? Why not throw away most of mathematics and assign arbitrary probabilities to arbitrary assertions?

Orthodox statistics is indeed broken, but not because more data should be ignored. It’s broken for the opposite reason: too much data is being ignored.

Poor Laplace. Give the guy a break.

Hempel’s Joke

Stop me if you’ve heard this one: 2 + 2 = 5 for sufficiently large values of 2. This is obviously a joke (though sometimes told so convincingly that the audience is unsure).

Hempel’s Paradox is a similar but less obvious joke that proceeds as follows. Consider the hypothesis: all ravens are black. This is logically equivalent to saying all non-black things are non-ravens. Therefore seeing a white shoe is evidence supporting the hypothesis.

The following Go program makes the attempted humour abundantly clear:

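package main

import "fmt"

func main() {
	// The hypothesis "all ravens are black" starts out true.
	state := true
	for {
		var colour, thing string
		// Read one observation, such as "black raven"; stop at end of input.
		if _, e := fmt.Scan(&colour, &thing); e != nil {
			break
		}
		// Only a non-black raven can falsify the hypothesis.
		if thing == "raven" && colour != "black" {
			state = false
		}
		fmt.Println(" hypothesis:", state)
	}
}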

A sample run:

black raven
hypothesis: true
white shoe
hypothesis: true
red raven
hypothesis: false
black raven
hypothesis: false
white shoe
hypothesis: false

The state of the hypothesis is represented by a boolean variable. Initially the boolean is true, and it remains true until we encounter a non-black raven. This is the only way to change the state of the program: neither "black raven" nor "white shoe" has any effect.

Saying we have "evidence supporting the hypothesis" is saying there are truer values of true. It’s like saying there are larger values of 2.

The original joke exploits the mathematical concept “sufficiently large” which has applications, but is absurd when applied to constants.

Similarly, Hempel’s joke exploits the concept "supporting evidence", which has applications, but is absurd when applied to a lone hypothesis.

Off by one

If we want to talk about evidence supporting or undermining a hypothesis mathematically, we’ll need to advance beyond boolean logic. Conventionally we represent degrees of belief with numbers between 0 and 1. The higher the number, the stronger the belief. We call these probabilities.

Next, we propose some mutually exclusive hypotheses and assign probabilities between 0 and 1 to each one. The sum of the probabilities must be 1.

If we take a single proposition by itself, such as "all ravens are black", then we’re forced to give it a probability of 1. We’re reduced to the situation above, where the only interesting thing that can happen is that we see a non-black raven and we realize we must restart with a different hypothesis. (In general, probability theory taken to extremes devolves into plain logic.)

We need at least two propositions with nonzero probabilities for the phrase "supporting evidence" to make sense. For example, we might have two propositions A and B, with probabilities of 0.2 and 0.8 respectively. If we find evidence supporting A, then its probability increases and the probability of B decreases accordingly, for their sum must always be 1. Naturally, as before, we may encounter evidence that implies all our propositions are wrong, in which case we must restart with a fresh set of hypotheses.
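To make the two-proposition case concrete, here is a worked update using Bayes’ theorem. The evidence E and its likelihoods, 0.6 under A and 0.3 under B, are numbers invented purely for illustration; only the priors 0.2 and 0.8 come from the example above.

```latex
\[
P(A \mid E)
= \frac{P(A)\,P(E \mid A)}{P(A)\,P(E \mid A) + P(B)\,P(E \mid B)}
= \frac{0.2 \times 0.6}{0.2 \times 0.6 + 0.8 \times 0.3}
= \frac{0.12}{0.36}
= \frac{1}{3},
\]
\[
P(B \mid E) = 1 - P(A \mid E) = \frac{2}{3}.
\]
```

Because E is twice as likely under A as under B, A rises from 0.2 to 1/3 and B falls from 0.8 to 2/3, and the two still sum to 1.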

To avoid nonsense, we require at least two mutually exclusive propositions, such as A: "all ravens are black", and B: "there exists a non-black raven", and each must have a nonzero probability. Now it makes sense to ask if a white shoe is supporting evidence. Does it support A at B’s expense? Or B at A’s expense? Or neither?

The propositions as stated are too vague to answer one way or another. We can make the propositions more specific, but there are infinitely many ways to do so, and the choices we make change the answer. See Chapter 5 of Jaynes.

One Card Trick

Instead of trying to flesh out hypotheses involving ravens, let us content ourselves with a simpler scenario. Suppose a manufacturer of playing cards has a faulty process that sometimes uses black ink instead of red ink to print the entire suit of hearts. We estimate that one in ten packs of cards has black hearts instead of red hearts and is otherwise normal, while the other nine packs are perfectly fine.

We’re given a pack of cards from this manufacturer. Thus we believe the hypothesis A: "all hearts are red" with probability 0.9, and B: "there exists a non-red heart" with probability 0.1. We draw a card. It’s the four of clubs. What does this do to our beliefs?

Nothing. Neither hypothesis is affected by this irrelevant evidence. I believe this is at least intuitively clear to most people, and furthermore, had Hempel spoken of hearts and clubs instead of ravens and shoes, his joke would have been more obvious.
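The claim can be checked numerically. The sketch below assumes, as in the scenario, that B means the whole suit of hearts is printed in black and that the card is drawn uniformly from 52. The function name posteriorA is made up for illustration.

```go
package main

import "fmt"

// posteriorA returns P(A | evidence) by Bayes' theorem for two mutually
// exclusive, exhaustive hypotheses A and B, given the prior on A and the
// likelihood of the evidence under each. The name is hypothetical.
func posteriorA(priorA, likeA, likeB float64) float64 {
	priorB := 1 - priorA
	return priorA * likeA / (priorA*likeA + priorB*likeB)
}

func main() {
	const priorA = 0.9 // A: all hearts are red (a normal pack)
	// B (assumed): a faulty pack in which all 13 hearts are black.

	// Four of clubs: exactly one such card in either kind of pack.
	fmt.Printf("four of clubs: %.3f\n", posteriorA(priorA, 1.0/52, 1.0/52))
	// A red heart: 13 of them in a normal pack, none in a faulty one.
	fmt.Printf("red heart:     %.3f\n", posteriorA(priorA, 13.0/52, 0))
	// A black heart: none in a normal pack, 13 in a faulty one.
	fmt.Printf("black heart:   %.3f\n", posteriorA(priorA, 0, 13.0/52))
}
```

The four of clubs is equally likely under both hypotheses, so the likelihoods cancel and the belief in A stays at 0.9. Only a heart, of either colour, moves the numbers.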

Great Idea, Poor Execution

The Black Swan attacks orthodox statistics using Hempel’s paradox, alleging that it shows we should beware of evidence supporting a hypothesis.

It turns out orthodox statistics can be attacked with Hempel’s paradox, but not by claiming "supporting evidence" is meaningless. That would be like claiming "sufficiently large" is meaningless.

Instead, Hempel’s joke reminds us we must consider more than one hypothesis if we want to talk about supporting evidence. This may seem obvious; assigning a degree of belief in a lone proposition is like awarding points in a competition with only one contestant.

However, apparently it is not obvious enough. The Black Swan misses the point, and so did my university professors. My probability and statistics textbook instructs us to consider only one hypothesis. (Actually, it’s worse: one of the steps is to devise an alternate hypothesis, but this second hypothesis is never used in the procedure!)

Mathematics Versus Society

In an off-hand comment, Taleb begins a sentence with “Mathematicians will try to convince you that their science is useful to society by…”

By this point, I had already found faults. First and foremost: how often do mathematicians talk about their usefulness to society? There are many jokes about mathematicians and real life, such as:

Engineers believe their equations approximate reality. Physicists believe reality approximates their equations. Mathematicians don’t care.

The truth is being exaggerated for humour, but asserting their work is useful in the real world is evidently a low priority for mathematicians. It is almost a point of pride. In fact, Taleb himself later quotes Hardy:

The “real” mathematics of the “real” mathematicians…is almost wholly “useless”.

This outlook is not new. Gauss called number theory “the queen of mathematics”, because it was pure and beautiful and had no applications in real life. (He had no way of foreseeing that number theory would one day be widely used in real life for secure communication!)

But sure, whatever, let’s suppose mathematicians go around trying to convince others that their field is useful to society. [Presumably Hardy would call such a mathematician “imaginary” or “complex”.] They are trivially right. If you try to talk about how useful things are to society, then you’ll want to measure and compare usefulness of things, all the while justifying your statements with sound logical arguments. Measuring and comparing and logic all lie squarely in the domain of mathematics.

Jumping to Conclusions

So far, I feel the author’s heart is in the right place but his reasoning is flawed. Confirmation bias is indeed pernicious, and orthodox statistics is indeed erroneous. However, The Black Swan knocks down straw men instead of hitting these juicy targets.

The above are but a few examples of the difficulties I ran into while reading the book. I had meant to pick apart more specious arguments but I’ve already written more than I had intended.

Again, I stress I have not read the whole work, and it may improve in the second half.