Saturday, November 10, 2012

Bayesians predicted the election incorrectly


Here's a comic about statistics.  Go ahead and read it if you haven't already.  This joke illustrates the Bayesian cartoon view of the difference between Bayesian statistics and regular inferential statistics, which they call "frequentist" statistics.  (They use "frequentist" in much the same way that Republicans refer to the "Democrat Party". It's a kind of secret code or dog whistle that lets you know that the speaker holds this group in contempt.)  One other language note before I get to the substance:  I use "cartoon" here to mean a simplified picture that highlights key aspects of a particular understanding.  Bayesians have a certain (contemptuous) view of regular inferential statistics.  The comic illustrates this simple view.

I will summarize the comic here in case the link breaks.  Two statisticians, one Bayesian and one "frequentist", are observing a machine that can determine whether the sun has gone nova.  The machine first observes whether the sun has actually gone nova, then decides whether to lie or tell the truth by rolling two dice, lying only if it rolls boxcars (double sixes).  The machine announces that the sun has gone nova.  The "frequentist" statistician calculates that the probability of such an announcement under the null hypothesis (the sun has not gone nova) is less than the standard "alpha" level of rareness, 5%, because boxcars comes up only 1 time in 36.  He therefore rejects the null hypothesis that the sun has not gone nova, and concludes that the sun has gone nova.  The Bayesian statistician, recognizing that the sun actually going nova is a very rare event, much rarer than rolling boxcars, concludes that the sun has likely not gone nova, and bets the "frequentist" $50 that it hasn't.  Hilarity ensues.
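
To make the comic's contrast concrete, here is a minimal sketch in Python.  The prior probability of a nova is a made-up, tiny number; the only thing that matters is that it is far smaller than 1 in 36.

```python
# Sketch of the two calculations in the nova-detector scenario.
# The prior probability of the sun going nova is a made-up, tiny number.

p_lie = 1.0 / 36           # the machine lies only if it rolls boxcars (double sixes)
p_nova_prior = 1e-12       # assumed prior probability that the sun has gone nova

# "Frequentist" reasoning in the comic: the probability of a "yes" answer
# if the sun has NOT gone nova (the p-value under the null hypothesis).
p_value = p_lie
print(f"p-value under the null: {p_value:.3f}")        # ~0.028, below alpha = 0.05

# Bayesian reasoning: posterior probability of a nova given a "yes" answer.
p_yes_given_nova = 1 - p_lie          # machine tells the truth unless it rolls boxcars
p_yes_given_no_nova = p_lie           # machine lies and says "yes" anyway
p_yes = (p_yes_given_nova * p_nova_prior
         + p_yes_given_no_nova * (1 - p_nova_prior))
p_nova_given_yes = p_yes_given_nova * p_nova_prior / p_yes
print(f"posterior P(nova | yes): {p_nova_given_yes:.2e}")   # still essentially zero
```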

So, what's the problem?  On the face of it, nothing.  The comic correctly expresses the Bayesian cartoon view of the difference between Bayesian and "frequentist" statistical inference.  The "frequentist" evaluates only the evidence in front of him, and accepts without question whatever results he finds.  The clever Bayesian, relying on his existing knowledge of the "prior probabilities" of events, incorporates new information into his system of beliefs, but his mind is not necessarily changed by new information.  Of course this is closer to how people process new information than the "frequentist" method of ignoring all previously known information (and there is research to back that up).  If you find $5 in your pants pocket, you do not assume that all pants pockets will contain $5.

In fact there are two problems here.  Problem #1 is that normal human people, when using regular old inferential statistics, do not really accept the hard rule of the alpha level.  When an unexpected event occurs, and the probability of it happening that way by chance is more than about 1 in 100 or 1 in 1000, we temper our enthusiasm about our findings.  (Some people fail to do this, or even get excited about results that could come about by chance 1 time in 10, which is foolish.)  If the subject of study is important, like, say, the world ending, we replicate the experiment.  In real life, the "frequentist" would probably ask the machine a few more times before accepting the bet.  (He might even want to take that machine apart and make sure it's working properly.  There's not much else to do before the Earth is swallowed up by the sun.)

Problem #2 is that the Bayesian, no matter how many times he asks the machine, will be unconvinced that the sun has gone nova.  Let's ignore, for the sake of this argument, that if the sun had gone nova both of them would be dead.  The nova event is so rare that the Bayesian would have to plug an astronomical amount of observed data into Bayes' equation to become convinced that the result was due to a nova and not statistical flukes.  That's fair enough, but it highlights that the conclusion drawn by the Bayesian depends on his assumptions and observations, not just his observations.  The "frequentist", at least in the cartoon view, relies on observations only.  Thus two "frequentists" observing the same data, and using the same statistical test, will draw the same conclusion.  They might come up with different explanations of the data, but both will agree about the need to explain the result.  One might conclude that the sun had gone nova, and another that the dice were loaded.  Experimentation could determine whether one or more of these conclusions were correct.  (The Bayesian, meanwhile, would try to make bets rather than conduct science.  While this is more lucrative, it does not advance the cause of understanding the world around us.)

But let us suppose that a second Bayesian were observing the nova-detection machine.  This Bayesian, having seen too many B movies and read too many Internet rumors about the Mayans, is convinced that the world is about to end.  He hears the machine announce that the sun has exploded, and believes it, because it fits into his worldview.  Plugging this new piece of information into Rev. Bayes' equation only strengthens his existing (faulty) belief in the imminent end of the world.  No amount of argumentation from the other statisticians in the room would be likely to convince him otherwise.  Even if the machine were to be run a few more times, announcing that the sun was actually doing just fine, he would not view that as sufficient evidence.  Thus we can see that the idea of Bayesian statisticians as more rational or scientific than regular statisticians is faulty.  The conclusions they draw are based on their assumptions more than their observations.  If they make good assumptions, based on reliable data, their conclusions might be quite accurate, and they might well find the truth faster than someone looking (as a "frequentist") at only one piece of evidence at a time.  But if they make bad assumptions they may get lost in the woods.
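
As a rough illustration of how much the prior drives the conclusion, here is a sketch with two hypothetical priors fed the same single "yes" answer from the machine:

```python
# Same single "yes" answer, filtered through two hypothetical priors.
# Likelihood ratio of a "yes": P(yes | nova) / P(yes | no nova) = (35/36) / (1/36) = 35.

likelihood_ratio = 35.0

def posterior(prior):
    """Posterior probability of a nova after one 'yes', via the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

print(posterior(1e-12))   # skeptical Bayesian: posterior is still ~3.5e-11
print(posterior(0.9))     # doomsday Bayesian: posterior is ~0.997
```

Same observation, same equation, opposite conclusions; everything hinges on the prior.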

This brings us to the election.  Much has been made about how this election proves that math works. And here is another xkcd comic about this, not coincidentally from earlier the same week. The evidence is that some statisticians, including Nate Silver and Sam Wang, looked at the history of presidential elections and constructed models of how the polls can be used to predict the winner of each state, and thus the winner of the election. I followed both of their sites compulsively for months leading up to the election.  Silver called all 50 states (and DC) correctly; Wang called 49 (and DC), and both predicted that Florida would be very close, and it was.  Nate Silver has become something of a celebrity for his efforts, and good for him.  Good for the other aggregators too.  Yay math!

So, what's the problem?  There was another group of Bayesian statisticians that made a very different prediction about the election.  They saw the same polls that Silver and Wang saw, but came in with different assumptions.  Silver and Wang took the polls at face value:  each poll might have some methodological bias and some noise, but in aggregate, the median state poll would be a good indicator of who would win a state.  The models are a bit more complex than this, but both Silver and Wang have been clear that whatever complexity or "secret sauce" they added was subtle.  Silver included economic indicators; Wang did not, and criticized their inclusion, but he won't say exactly how his own model works because he wants to publish it.
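
Just to illustrate the basic idea (this is a toy with invented poll numbers, not Silver's or Wang's actual model), the core of the "median state poll" approach is nothing more exotic than this:

```python
from statistics import median

# Hypothetical poll margins for one state, in percentage points
# (positive = Obama leads, negative = Romney leads).
poll_margins = [+2.0, +3.5, -1.0, +1.5, +0.5]

state_margin = median(poll_margins)        # robust to a single biased outlier
winner = "Obama" if state_margin > 0 else "Romney"
print(f"median margin {state_margin:+.1f} -> call the state for {winner}")
```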

The second group, though, consisted of Republican-leaning aggregators, both amateur and professional (working for Romney).  These aggregators assumed that election turnout would have roughly equal numbers of Republicans, Democrats, and Independents.  Looking at the breakdown of the polls, they saw weird features:  many more people answering the surveys called themselves Democrats than Republicans, and those calling themselves Independents preferred Romney.  Thus even though most polls favored Obama, if the turnout model was correct, the polls could be "unskewed" to include equal numbers of Democrats, Republicans, and Independents, and this rebalancing showed Romney leading.  The problem for these aggregators was that their model was wrong.  Many conservative people have stopped identifying themselves as Republicans and call themselves Independents instead.  (The fictional Stephen Colbert character, for instance, says he is an Independent despite always favoring Republicans.)  This phenomenon explained both the dearth of Republicans in the samples and Romney's strength among Independents.  Given this reality, no poll could convince the GOP aggregators that their assumptions were wrong or that Romney was trailing.  Romney, apparently believing them, was said to be stunned when the vote tallies came in, closely matching the median poll numbers.
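
The "unskewing" arithmetic is easy to reproduce.  The numbers below are hypothetical, chosen only to show how an assumed turnout model can flip a poll's topline:

```python
# Hypothetical poll: share of respondents by party ID, and Obama's share of
# the two-party preference within each group.
sample_share = {"Dem": 0.38, "Rep": 0.30, "Ind": 0.32}
obama_share  = {"Dem": 0.92, "Rep": 0.07, "Ind": 0.45}

# The poll's own topline, weighted by who actually answered.
topline = sum(sample_share[g] * obama_share[g] for g in sample_share)

# The "unskewed" topline: reweight to an assumed even Dem/Rep/Ind turnout.
assumed_turnout = {"Dem": 1/3, "Rep": 1/3, "Ind": 1/3}
unskewed = sum(assumed_turnout[g] * obama_share[g] for g in assumed_turnout)

print(f"poll as reported: Obama {topline:.1%}")    # ~51.5% -> Obama ahead
print(f"poll 'unskewed':  Obama {unskewed:.1%}")   # ~48.0% -> Romney ahead
```

If the assumed turnout model had matched reality, this would have been a reasonable correction; since it didn't, the "unskewing" turned polls showing an Obama lead into predictions of a Romney win.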

So are Bayesians always right?  Should we abandon all other kinds of statistical inference?  Not at all.  Bayesians can be just as wrong as anyone who relies more on their prejudices than on the evidence in front of them.  The nice thing about Bayesian statistics is that it contains a built-in reality check. The problem is that reality has a certain well-known bias.