We increasingly live in a world of predictions. Each day, we are confronted with an endless number of economic forecasts, statistical studies and consultants’ reports telling us what the future will look like. Businesses and governments rush to adjust their policies accordingly.
Yet, the fact that most of these predictions fail to come true does not seem to shatter our trust in forecasting. We devour the next report with the credulity of five-year-olds. We’re told that the growing ability to collect and analyze huge amounts of data makes each forecast better than the last.
What if the opposite is true? What if the 2.5 quintillion bytes of data that IBM estimates are generated in the world each day – a number that’s doubling every few years – are actually undermining the accuracy of forecasting and the effectiveness of policies that rely on it?
This is the paradox of Big Data, the trendy term used to describe the massive datasets created in the Internet age. The sheer volume of new information at our fingertips surpasses our ability to understand it. Ever more powerful computers can store and measure all this data, but they cannot tell us what it all means. So far, only human beings can appreciate context, and we’re not always very good at it.
With this in mind, Big Data should inspire humility. Instead, it seems to fuel hype and chutzpah.
McKinsey & Company predicts “a tremendous wave of innovation, productivity and growth – all driven by Big Data.” A recent Harvard Business Review article warns that “data-driven decisions tend to be better decisions. Leaders will either embrace this fact or be replaced by others who do.”
Algorithms that rely on structured data (such as previous online purchases) and unstructured data (Facebook posts) are now used to predict what people will buy, whom they will marry and which candidate they will vote for. But while very good at picking off the low-hanging fruit – such as the undergrad who “likes” Barack Obama – mathematical models have their limits.
The Great Recession should have made that clear. The forecasters and risk managers who relied on supposedly foolproof algorithms all failed to see the crash coming. The historical economic data they fed into their computers did not go back far enough. Their models were not built to account for rare events. Yet, policy makers bought their rosy forecasts hook, line and sinker.
You might think that Nate Silver, the whiz-kid statistician who correctly predicted the winner of the 2012 U.S. presidential election in all 50 states, would be Big Data’s biggest apologist. Instead, he warns against putting our faith in the predictive power of machines.
“Our predictions may be more prone to failure in the era of Big Data,” The New York Times blogger writes in his recent book, The Signal and the Noise. “As there is an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate … [But] most of the data is just noise, as most of the universe is filled with empty space.”
Perhaps the biggest risk we run in the era of Big Data is confusing correlation with causation – or rather, being duped by so-called “data scientists” who tell us one thing leads to another. The old admonition about “lies, damned lies and statistics” is more appropriate than ever.
Thankfully, there is a recognition among some experts that Big Data has its downsides. A 2012 Pew Research Center survey of “stakeholders in the development of the Internet” found that four in 10 agreed that “Big Data will cause more problems than it solves by 2020 … and will engender false confidence in our predictive powers.”
There is no doubt that Big Data has the awesome potential to make the world a better place. Health experts are salivating at the prospect of using huge datasets to track diseases and prevent epidemics. Physicians should be able to prescribe medications more effectively. And thousands of lives could be saved by using more and better data to forecast the trajectory of hurricanes.
But for every step forward we take with Big Data, we may take two steps back. In the wrong hands, the use and abuse of data can lead to disaster.
“Big Data will produce progress – eventually,” Mr. Silver says. “How quickly it does, and whether we regress in the meantime, will depend on us.”