Odd data pairings compiled by Tyler Vigen (such as the idea that eating cheese before bed can cause nightmares) show that correlations are a good starting point for more sophisticated analysis.

Thinkstock/Thinkstock

You've heard the advice before: Go easy on the cheese before bedtime to avoid bad dreams.

What you may not know is that if you compare U.S. Department of Agriculture data on per capita cheese consumption since 2000 with the number of people who die each year from getting tangled in their bedsheets (more than 800 in 2008, according to the Centers for Disease Control and Prevention), you get an almost perfect match.

Apparently the more mozzarella we scarf, the more people meet this ignoble end. The correlation between the two data sets is 95 per cent, which indicates that they rise and fall in near-perfect sync.

The cheese and bedsheet-death link is one of tens of thousands of such pairings churned out by an algorithm programmed earlier this month by Tyler Vigen, a Harvard law student – and the point is that correlation, a statistical measure that assesses how closely two data sets match, isn't always what it seems.

Vigen says his idea was sparked by an image showing the surprisingly good match between the crime rate in New York and a photo of a mountain range.

He decided to look for other random correlations by uploading vast swaths of freely available data from places such as U.S. government websites and searching for matching sets.
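To see why a search like this is bound to turn up matches, here is a minimal sketch in Python. It is not Vigen's actual program: the data are made-up random walks standing in for real statistics, and the names `pearson`, `random_walk` and `best_pair` are hypothetical helpers written for this illustration.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(rng, length):
    """A made-up yearly series: each value is the last one plus random noise."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk


def best_pair(series):
    """Compare every pair of series; return (|r|, i, j) for the strongest match."""
    return max(
        (abs(pearson(a, b)), i, j)
        for (i, a), (j, b) in combinations(enumerate(series), 2)
    )


# Assumption: 200 unrelated series of 10 "years" each, all pure noise.
rng = random.Random(0)
series = [random_walk(rng, 10) for _ in range(200)]
strength, i, j = best_pair(series)
print(f"Strongest match: series {i} and {j}, |r| = {strength:.2f}")
```

With 200 series there are 19,900 possible pairs, so a very strong match between two of them is expected by chance alone, even though none of the series has anything to do with any other.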

These deliberately odd results, displayed on his website, are mostly ludicrous enough that no one would be tempted to believe that they're anything but coincidental.

As egg consumption rises and falls, for example, so too does the number of non-collision road deaths. And the number of non-commercial space launches around the world seems to depend on the number of sociology doctorates awarded in the United States.

In the real world, misleading correlations are much harder to spot, because they show up in situations that actually make sense.

For example, a 2009 Archives of Internal Medicine study of 500,000 people found a correlation between meat consumption and death during the 10-year study period, even when possible confounding factors like age, education, and exercise habits were taken into account.

This is a plausible finding – but if you look more closely at the study results, you find that red meat consumption also seemingly raises your risk of sudden accidental death from causes like car crashes and gunshots. This is clearly ridiculous, and indicates that there are other underlying lifestyle factors that affect both meat consumption and mortality.

Another place where suggestive correlations often show up is in attempts to explain the rise in obesity rates over the past half-century.

Fat and carbohydrate consumption have both risen in lockstep with obesity rates – but then again, so have protein and total calorie consumption, along with countless other factors like the processing speed of computers. The mere fact that two variables have both increased over time isn't enough to draw any conclusions.
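The trap of shared trends can be illustrated with a short Python sketch. The two series below are invented, not real obesity or computing data: each is simply a straight upward line plus independent random noise, and the helper names are hypothetical. One common check, used here as an illustration rather than as the method of any study mentioned above, is to correlate the year-to-year changes instead of the raw levels.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trending_series(rng, slope, length):
    """An invented series: a steady upward trend plus independent noise."""
    return [slope * t + rng.gauss(0, 1) for t in range(length)]


def yearly_changes(xs):
    """Differences between consecutive values, which removes a linear trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


# Assumption: 50 "years" of two unrelated quantities that both happen to rise.
rng = random.Random(1)
series_a = trending_series(rng, slope=0.5, length=50)
series_b = trending_series(rng, slope=2.0, length=50)

levels_r = pearson(series_a, series_b)
changes_r = pearson(yearly_changes(series_a), yearly_changes(series_b))
print(f"Correlation of levels:  {levels_r:.2f}")
print(f"Correlation of changes: {changes_r:.2f}")
```

The raw levels correlate strongly only because both rise over time. Once the trend is removed, the correlation should typically fall much closer to zero, because the ups and downs of the two series are unrelated.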

So is looking for correlations a blatant misuse of statistics that should be disregarded entirely? Not so fast, Vigen says: "Correlations are an important starting place because they can influence the way we research."

In other words, they offer a useful starting point to generate hypotheses or test ideas. Vigen cites Randall Munroe, the author of the popular science Web-comic xkcd, who says: "Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there.'"

Once you know where to look, there are several ways to bolster a case built initially on correlation.

Statisticians have more sophisticated techniques for assessing the likelihood of coincidence when two variables show an apparent match, for example, by looking at how often the two lines cross and recross each other on a graph. Coming up with a reasonable explanation for why and how the variables influence each other is also important.

Still, the best way to confirm a causal relationship between two variables is to change one of them and see how the other responds.
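A small simulation in Python shows the difference between observing and intervening. Everything in it is an assumption made for illustration: a hidden "lifestyle" factor drives both the exposure and the outcome, the exposure itself has no effect at all, and the function names are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(rng, people, randomize):
    """Return the exposure/outcome correlation across a simulated population."""
    exposure, outcome = [], []
    for _ in range(people):
        hidden = rng.gauss(0, 1)  # hypothetical lifestyle factor
        if randomize:
            # Experiment: the researcher sets the exposure, ignoring lifestyle.
            x = rng.gauss(0, 1)
        else:
            # Observation: lifestyle influences how much exposure people choose.
            x = hidden + rng.gauss(0, 1)
        # By construction the outcome depends on lifestyle, never on exposure.
        y = hidden + rng.gauss(0, 1)
        exposure.append(x)
        outcome.append(y)
    return pearson(exposure, outcome)


rng = random.Random(2)
observed_r = simulate(rng, people=5000, randomize=False)
experiment_r = simulate(rng, people=5000, randomize=True)
print(f"Observational correlation: {observed_r:.2f}")
print(f"Randomized correlation:    {experiment_r:.2f}")
```

In the observational run the two variables move together even though one has no effect on the other. When the exposure is assigned at random, the hidden factor can no longer influence it, and the apparent link should all but disappear.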

Knowing that the link between cheese and bedsheet deaths was semi-randomly generated by a computer should make you much less likely to believe it. But if anyone wants to run a study on dream patterns following pre-bedtime cheese consumption, I volunteer to be in the lots-of-cheese group.

Alex Hutchinson blogs about exercise research at sweatscience.runnersworld.com.
