The Globe and Mail

Big data’s noise is drowning out the signal

Federal Employment Minister Jason Kenney's long-rising star is dimming fast. He's been in constant damage control mode over revelations that his cherished Temporary Foreign Worker program is crowding Canadians out of jobs. And he's been the fall guy over accusations his government used inflated job vacancy figures to justify an incursion onto provincial job training turf.

Instead of a skills gap to fill, Mr. Kenney's now got a credibility gap to kill.

This isn't all his fault. It turns out the job vacancy numbers Mr. Kenney had been relying on were compiled by the Finance Department, in part by tracking job postings on Kijiji, the free classified ad site that has become a go-to choice for selling 1997 NordicTracks or finding someone to cut your grass. Its usefulness as a tool for sizing up the job market is at best a matter for debate.

Yet, if Mr. Kenney and his advisers are guilty of anything, it is of falling victim to the same social media hype that has led many data enthusiasts to spurn official statistics as oh-so yesterday. Want to know if the flu is headed your way or the housing market is set to take off? Why, go to Google Trends. Forget the official unemployment rate. Just track "lost my job" on Twitter.

The idea that the trillions of bytes of data we generate on social media are equipping policy-makers with vast new predictive powers is all the rage these days. Official statistics, the kind compiled by bureaucrats through scientifically tested surveys and representative samples, seem to bore the geeks. But they get all hot and bothered at the mere mention of the word algorithm.

That's how we got Google Flu. Back in 2009, a couple of bright minds at Google came up with the idea that the ubiquitous search engine could provide better advance warnings of flu outbreaks than public health agencies that survey hospitals and doctors. They created an algorithm that incorporated flu-related search terms, claiming to beat health officials to the punch in predicting epidemics.

It was all a hoax, albeit an unintentional one that illustrates the hubris and false positives of the so-called "big data" revolution. A March study in the journal Science revealed that Google Flu has consistently overestimated the actual number of flu cases that end up occurring in the United States. Emergency rooms that base staffing and medical supplies on Google Flu predictions, as some experts recommend, may be wasting resources preparing for onslaughts that never materialize.

This is but one example of how big data can lead to misguided policy. Mr. Kenney's Kijiji snafu is another. You'd think this would make people cautious. But in our insatiable desire to make sense of an increasingly complex world, we are turning ever more to big data to sort it out.

The latest trend is "data journalism," with The New York Times and several upstart media outlets hiring an army of twentysomething computer geeks to massage the numbers in order to spot trends, predict elections and provide funky, counterintuitive insights in the vein of Freakonomics.

The problem is that much of what they report is probably wrong, or at least tendentious. The Upshot, The Times feature launched April 22, has come under fire for stories that either read too much into the data or leave too much out. "First-rate analysis requires more than pretty graphs based on opaque manipulations of data unsuited to address the central substantive points," prominent U.S. political scientist Larry Bartels wrote in response to one piece on Southern politics.

The most common sin in data journalism is making spurious correlations. That Google searches for the term "mortgage" have closely tracked Canadian housing sales in the past two years means nothing on its own. Like Google Flu, the correlation holds up, until it doesn't.

This is not to knock the brainiacs at Google, Finance or in journalism who dare to think outside the box. Their creativity will eventually pay off. But for now, the noise in the data is overpowering our ability to separate the signal we think we hear.

According to the Gartner Hype Cycle for new technologies, big data is moving from its "inflated expectations" phase to a "trough of disillusionment" as we learn that it can't yet do what its advocates claim. Once we appreciate its limits, however, we'll more clearly see its benefits.

Until then, Mr. Kenney should be wary of big, bad data.
