Spurious Correlations

The Washington Redskins versus the Carolina Panthers is not usually a match-up liable to generate much excitement beyond the supporters of the two teams involved; this Sunday’s game, however, is different. As the final Redskins home game before the US presidential election, it will be followed by many who are not even interested in American football. Forget the economy and forget the charisma of the candidates: a more accurate ‘predictor’ of the outcome of the US presidential election is the ‘Redskins Rule’. In 17 of the 18 presidential elections held since the Redskins NFL franchise moved from Boston to Washington DC in 1937, when the Redskins have won their final home game before the election the candidate of the incumbent party has gone on to win the presidency; when the Redskins have lost, so has the incumbent party.

Since featuring on ESPN’s coverage of the Redskins-Titans game immediately prior to the 2000 election the Redskins Rule has become a traditional part of the US election coverage. It has even seeped into popular culture: the season one finale of Mad Men opened, somewhat anachronistically one suspects, with a character bemoaning the 1960 election result –  “Nixon didn’t stand a chance, the Browns trounced the Redskins 31-10; the result of that last home game has correctly predicted the last six elections”.

At first glance it seems extraordinary that there should be such a strong correlation between the result of a football game played in the nation’s capital and the presidential election. However, when we consider the sheer number of possible correlations out there, our amazement should diminish. For a start there are many things the game could predict other than the fate of the incumbent – most obviously the party of the winner but also, say, how the elder of the two candidates fares. Moreover, if we’d been told the Redskins’ last away game before the election was the predictor, or their first home game of the season, or their win-loss record, or the Dallas Cowboys’ or the Baltimore Ravens’ final home game, or indeed whether the New York Yankees had played in the World Series, or the LA Lakers had made the NBA finals, our reaction would still have been one of wonder.
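To put a rough number on this, here is a minimal sketch of the arithmetic. It rests on two loudly stated assumptions that are not in the argument above: each candidate 'rule' is no better than a fair coin flip, and the rules are independent of one another. The figure of 10,000 candidate rules is purely illustrative, as is the function name.

```python
from math import comb

def chance_of_lucky_rule(n_elections=18, min_hits=17, n_candidates=10_000):
    """Return (chance one coin-flip rule looks this good, chance any of many does)."""
    # Probability that a single fair coin-flip rule is right at least min_hits times.
    favourable = sum(comb(n_elections, k) for k in range(min_hits, n_elections + 1))
    p_single = favourable / 2 ** n_elections
    # Probability that at least one of n_candidates independent rules manages it.
    p_any = 1 - (1 - p_single) ** n_candidates
    return p_single, p_any

p_single, p_any = chance_of_lucky_rule()
print(f"one rule: {p_single:.6f}, at least one of 10,000 rules: {p_any:.2f}")
```

Under these assumptions a single rule has roughly a 1 in 14,000 chance of matching 17 elections out of 18, yet with 10,000 candidate rules to choose from the odds of finding at least one such rule are slightly better than even.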

As it turns out whether the Lakers appeared in the NBA finals has been a remarkably good predictor of the election result too. Since the Lakers were formed in 1947 the ‘Laker Law’ – if they make the NBA finals the Republicans win the presidency, if they don’t then the Democrats win – has held for 14 out of 16 elections. The good news for Obama is the Lakers failed to make the Finals in 2012 (the bad news is the Laker Law predicted a McCain victory in 2008).

Now this is all very diverting, something to liven up the conversation down the pub, but it doesn’t actually mean anything, does it? Clearly on a rational level we know such correlations are meaningless but it’s hard to shake a residual sense that the pattern will continue. Indeed such is our innate propensity to see patterns as predictive you can be sure that even many of those Obama supporters who recognise the absurdity of it all will be secretly rooting for the Redskins on Sunday (and taking the Lakers’ failure to be a good sign).

The human facility for pattern recognition is quite something. Consider Zooniverse, a set of ‘citizen science projects’ where raw scientific data is uploaded onto the internet for volunteers to pore over. The project’s efficacy rests on the fact that people are actually much better at noticing visual patterns than computers. In one project volunteers classify galaxies, in another they search for extrasolar planets, in another the sounds of killer whales are categorised. The work ordinary members of the public (700,000 of them at the last count) have put into these, and other, projects has spawned numerous important scientific papers in the five years of Zooniverse’s existence.

Our ability to spot patterns lies at the heart of human endeavour; it is through pattern recognition that we make sense of the world, it is through patterns that we predict and plan for the future. From the adoption of arable farming to modern weather forecasting, from Halley’s Comet to the latest opinion polls, patterns have been harnessed by mankind for thousands of years. However, there is a problem: our reluctance to accept that some patterns are meaningless. As a species we’re really not very good with randomness; we have trouble accepting the idea that not everything happens for a reason. Evolution favours this trait: over-ascribing meaning is costly – we run away from harmless bushes that are masquerading as tigers, we abandon perfectly safe waterholes because we happen to get sick soon after a drink – but under-ascribing meaning is liable to prove fatal if the bush really is a tiger, if the waterhole really is the cause of our sickness.

However what served us well tens of thousands of years ago doesn’t necessarily serve us so well these days. Today, with so much more information, there are so many more correlations, both meaningful and meaningless, and with the vast amount of computer power available to pore over that information they are so much easier to find (humans may be better at spotting patterns, seeing many that computers miss, but they’re an awful lot slower).

Using computers to mine data for patterns is now a mainstream, and undoubtedly useful, business activity – Amazon recommendations (or Google searches or Nectar card offers) are arrived at through data mining, while your email provider uses data mining to determine what constitutes spam. However – as with the Redskins Rule – with so many potential correlations many of those the computers do pick up will be the result of chance alone (so now you know why Amazon thought you might be interested in 90s Europop).

The financial sector is particularly keen on looking for patterns, for extra clues as to which way the market might be heading. As it happens our old friend the NFL has proven to be a good stock market predictor. The ‘Super Bowl Indicator’ – when a team from the original NFL (the current NFL was created through a merger of the NFL and the AFL) wins the Super Bowl the stock market goes up and when a team from the original AFL wins it goes down – has proved right roughly 80% of the time since the first Super Bowl in 1967. When journalist Leonard Koppett first wrote about the correlation in 1978 (at which point it had worked 11 times out of 11) he was trying to poke fun at our statistical naiveté, but alas these things have a habit of being taken seriously. Combine the innate belief that patterns have meanings and the desire for easy money and you have a recipe for self-delusion.

Perhaps the best example of what you can do with data if you really put your mind to it comes from financial technologist David Leinweber. Nearly twenty years ago, in an effort to satirise the worst excesses of data mining, Leinweber took a CD-ROM full of economic data from 140 UN members for the years 1983-1993 and ran it through his computer to determine which set of data was most closely correlated to the S&P 500 (a leading US stock index). The answer turned out to be Bangladeshi butter production, which had a 75% correlation with the S&P 500 (a greater correlation than that between height and weight in humans). By adding in US butter and cheese production Leinweber got the correlation up to 95%; when he further added the Bangladeshi and US sheep population the correlation grew to 99%. To this day Leinweber still gets asked for the latest Bangladeshi butter production figures, despite the fact that, predictably, after 1993 (and before 1983) the correlation is nonexistent.
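The same trap is easy to reproduce with made-up numbers. The sketch below is not Leinweber's method or data: it assumes a purely random 'index' and 2,000 purely random candidate series (both hypothetical), picks the candidate that best matches the index over an 11-year window, and then checks how that winner fares over the following 11 years.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mine_random_data(n_candidates=2000, n_in=11, n_out=11, seed=0):
    """Pick the random series that best 'explains' a random target in-sample."""
    rng = random.Random(seed)
    total = n_in + n_out
    # Hypothetical 'index': pure noise, so no candidate can truly predict it.
    target = [rng.gauss(0, 1) for _ in range(total)]
    best_r, best_series = 0.0, None
    for _ in range(n_candidates):
        series = [rng.gauss(0, 1) for _ in range(total)]
        r = pearson(series[:n_in], target[:n_in])  # fit on the first window only
        if abs(r) > abs(best_r):
            best_r, best_series = r, series
    # Score the winner on the years it was not selected on.
    out_r = pearson(best_series[n_in:], target[n_in:])
    return best_r, out_r

in_sample, out_of_sample = mine_random_data()
print(f"in-sample r = {in_sample:.2f}, out-of-sample r = {out_of_sample:.2f}")
```

Because the winner is chosen from thousands of candidates on only 11 data points, its in-sample correlation is bound to look impressive; the out-of-sample figure is just another random draw, which is why the pattern typically evaporates once you step outside the years that were mined.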

Very few of us are liable to take seriously correlations as obviously spurious as the examples given above; the trouble comes when it is possible to posit a plausible causal mechanism. If there is one thing that matches our ability to discern patterns it’s our ability to come up with explanations for them; combine a false correlation with a reasonable explanation and occasionally things can get very messy.

A prime example of this is the MMR scare. Children who develop autism tend to show the first symptoms soon after receiving the MMR vaccine; this is a correlation with an all too plausible causal mechanism. When the possible link hit the national media in 2001-02 there was a sharp drop in the take-up of the vaccine. Subsequent research has shown the link to be fallacious; the correlation exists because of a third factor, namely time: autistic symptoms tend to emerge soon after the 2nd birthday, which happens to be the same time the UK’s toddlers get their MMR jabs. The damage has already been done, however: vaccination rates remain considerably lower than before the controversy and measles is once again endemic in the UK. Unfortunately reassuring medical announcements are no match for a believable correlation, particularly when it’s combined with a distrust of science and a belief that harmful actions are more blameworthy than harmful inactions.
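A toy simulation shows how timing alone can manufacture this kind of pattern. Everything numeric here is an assumption made for illustration, not real epidemiological data: jab ages are drawn around 24 months, symptom onset ages around 26 months, and the two are drawn completely independently, so by construction the jab has no effect on onset.

```python
import random

def share_of_onsets_soon_after_jab(jab_mean_months=24.0, n_children=10_000,
                                   window_months=6.0, seed=1):
    """Fraction of children whose symptoms start within the window after the jab."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_children):
        # Hypothetical ages in months; onset is drawn independently of the jab.
        jab_age = rng.gauss(jab_mean_months, 1.0)
        onset_age = rng.gauss(26.0, 2.0)
        if 0 < onset_age - jab_age <= window_months:
            hits += 1
    return hits / n_children

print(share_of_onsets_soon_after_jab(jab_mean_months=24.0))  # jab at about two years
print(share_of_onsets_soon_after_jab(jab_mean_months=48.0))  # jab at about four years
```

With the jab scheduled at around two years, well over half of the simulated children show symptoms within six months of it; move the jab to around four years, leaving onset ages untouched, and the apparent link all but vanishes. The 'effect' tracks the schedule, not the vaccine.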

It is one of our great strengths that when we see correlation we automatically suspect causation, that we see patterns as meaningful signposts as to what the future might hold. However it’s so ingrained that we tend to neglect the need to be awake to other interpretations – that the correlation might be due to a third factor, for example, or a mere coincidence. Arguably our susceptibility to religion (‘the earthquake’s a sign that God is angry’), our proclivity for conspiracy theories (‘Princess Diana can’t just have died in a senseless car accident, it must have been a plot’) and our habit of seeing luck as predictive rather than merely descriptive (‘I can’t stop, I’m on a roll’) are all a result of our inability to accept that some things are meaningless. We’ve come a long way from divining the future from sheep’s entrails but the continuing existence of astrology, the popularity of lottery prediction sites, and, yes, the fact that I’ll be cheering on the Redskins this Sunday, suggest we still have a long way to go.

(By the way if you’re wondering which of the 18 elections ruined the Redskins Rule’s perfect record, it was 2004 when Washington’s loss to the Green Bay Packers was supposed to prefigure a Kerry victory.

Or was it? The fact that in 2000 Al Gore became the first candidate in 112 years to win the popular vote but not the election means that with a little tweak we can make the rule infallible again. “If the Redskins win their last home game the party that won the greatest share of the popular vote at the previous election will go on to win the presidency” doesn’t have the same ring as the old ‘incumbent’ rule but it has held for all 18 elections since 1937 (both versions of the rule actually hold good for 1936 too but it sounds more impressive if we connect the rule to the franchise’s arrival in Washington). This isn’t quite Bangladeshi butter production but it does give a sense of just how slippery, and artificial, these correlations can be, how easy it can be to manipulate data to get the answer you want.)
