When Eric Horvitz was a medical student, he learned an anecdotal rule of thumb: Expect more cases of congestive heart failure around the holidays. Years later, as a computer scientist and co-director of Microsoft Research, he was able to put that rule to the test.
Working with anonymous patient records from a large urban hospital in Washington, he and colleagues were able to show how a concentration of Internet searches for recipes with ingredients high in salt and cholesterol in the neighbourhoods around the hospital seemed to closely correspond to the heart cases that showed up soon after.
For Dr. Horvitz, it was another example of the startling ways in which the Internet is revealing information about human behaviour that could be used to prevent disease and improve public health.
By combing through reams of anonymous search logs to spot hidden relationships in the data, "we get a sense for what's really going on," Dr. Horvitz said Monday at the annual meeting of the American Association for the Advancement of Science.
In another example, Dr. Horvitz and his collaborators were recently able to identify a previously unpublished side effect of combining Pravachol, a commonly prescribed statin, with paroxetine, an antidepressant, by noticing the way people searched online for information about the two drugs. He also presented work showing how women diagnosed with breast cancer searched for information online around crucial treatment decisions, and how subtle language changes in the Twitter feeds of new mothers could be used to track the onset of postpartum depression. Those data could eventually be used to design a mobile app that tells women whether they are at risk, based on their communication patterns.
"We are looking at the Web as a sensor for various challenges in health and diagnosis," he added.
The work offers a striking contrast to public concerns about online surveillance, which erupted in the aftermath of the Edward Snowden scandal. The difference, researchers say, is that instead of sifting through online data, e-mail logs and other sources to identify and track individuals, researchers are looking at our collective digital records in hopes of pulling out new and useful information.
"The question is, can we – through this – understand our behaviours better?" said Jure Leskovec, a computer scientist at Stanford University.
Dr. Leskovec studies the evolution of online networks, watching how information flows and how individuals participate – or don't – depending on their use of language. He has used the data to improve how students interact as a community when they enroll in large-scale open online courses offered by the university.
Despite voicing caveats about the limitations of online data – including the fact that it leaves out vast groups of people who have little or no contact with the Internet – researchers at the Chicago meeting shared their growing sense of excitement about using big data to explore ideas in the social sciences that previously existed only in the realm of theory.
Michael Macy, director of the Social Dynamics Laboratory at Cornell University, said he and colleagues had been using Twitter and anonymized e-mail logs to test whether geopolitical conflicts follow the "clash of civilizations," a controversial theory advanced two decades ago by Harvard political scientist Samuel Huntington that depicts a world aligned in mutually antagonistic cultural groups.
"We can actually revisit Huntington's thesis by looking at interpersonal communications all around the globe," Dr. Macy said. He said his analysis found a pattern in the data similar to Prof. Huntington's prediction. Although he cautioned against over-interpreting the result, he added: "I think we can still learn something from it."
Dr. Macy said that, for social scientists, the arrival of big data was having an impact not unlike that of the Hubble Space Telescope on astronomy, or the advent of functional magnetic resonance imaging (fMRI) on brain science. "We have an opportunity," he said, "to find patterns that were never visible before."