Susan Pinker is a psychologist and columnist whose most recent book, The Village Effect, explores the science of social interaction.
By May of last year, the number of research papers on COVID-19 was doubling every two weeks. This steady churn has unleashed more than 200,000 journal articles on the coronavirus so far, more than 30,000 of them as preprints, meaning studies that are not yet peer-reviewed. Most are also too fresh to have been replicated by other scientists.
Still, many of these findings are cited by scientists and journalists alike and have been shared with millions of people. Indeed, the human brain seems to be the perfect growing medium for untested, often too-good-to-be-true ideas.
Just one example: A year ago, a preprint, followed by a published scientific paper, reported that an anti-parasitic drug, ivermectin, used to treat river blindness in sub-Saharan Africa and head lice everywhere else, could suppress SARS-CoV-2. If this were true, it would be a godsend, as ivermectin is cheap, FDA-approved as an anti-parasitic, and widely available.
The ivermectin finding did not hold up when the study was replicated, however, and there’s no clear evidence that the drug has any effect on humans with the disease. Like hydroxychloroquine – a malaria drug that was touted as a cure by Donald Trump when he was the U.S. president – ivermectin has no demonstrated clinical effect on COVID-19 in the real world.
But the cat video is out of the bag, so to speak. Ivermectin has become an internet meme; dozens of so-called scientific talks have received millions of views on YouTube, each one promoting ivermectin’s anti-COVID benefits, with the backstory that pharmaceutical companies are suppressing this information lest it staunch the flow of profits from vaccines. In a short time, a single, non-replicated study has become the new bleach, or in medical history terms, the new snake oil.
Over the past 18 months, two pandemics have collided. On one hand, the SARS-CoV-2 virus was novel, terrifying and constantly evolving; there was a push for quick answers. On the other hand, science and medicine were in the throes of a replication crisis.
Many iconic studies, especially in my own field, psychology, were being repeated by other scientists and found to be wanting; their findings could not be duplicated. In other words, science was in the process of examining itself when COVID-19 struck. There was a desperate thirst for information and a glut of new studies. But there was also little time or patience for a basic due diligence step – replication, simply repeating an experiment to see whether it produces the same results.
Now that COVID-19 infection rates are slowing down, it’s a good time to step back and look at what types of studies seem impossible to reproduce yet have remarkable staying power. Once launched, they continue to breathe new life into what are essentially rumours; they promote misinformation while giving it the patina of science.
Many of us think that science progresses in a straight line. But it zigzags. One hypothesis emerges, changing our outlook for a moment; if it fails to be confirmed by further evidence, it drops out of sight and another one takes its place. Onward and upward, in infinitesimal steps.
Recently, though, the fact-finding trajectory seems to have changed course. Attention-grabbing studies are briskly published. The press and other researchers latch on; a TED talk and a book contract often follow. Social media help turn the finding into a meme, one that is often surprising, easy to grasp and sticky, like an earworm or gossip. If the information hits a sensitive spot, such as a latent fear of hypodermic needles, contamination or public speaking, and also presents a tidy way to dispatch those anxieties, so much the better.
Somewhere along the way, though, another research team repeats the experiment and can’t get the same results. But a non-finding is usually non-newsworthy, and in any case, the surprising “fact” has already made a dent in our collective psyches. That’s just one way splashy findings like ivermectin as a cure for COVID-19 continue to get attention, even after being debunked.
Now, a fascinating new study out of the University of California, San Diego, led by behavioural economists Uri Gneezy and Marta Serra-Garcia, shows that experiments that could not be repeated have a bigger influence over time than the ones that could. In other words, the more interesting and novel-sounding the idea, the more it is cited by other scientists and the media, and the less likely it is to replicate.
There seems to be a trade-off between the wow factor of a study and its ability to be reproduced – which should affect its credibility – but does not. In Nature and Science, two high-profile and high-status journals, the non-replicable papers were cited 300 times more than the replicable ones.
Published in May in the journal Science Advances, the Gneezy-Serra-Garcia study analyzed the findings from three massive replication projects, two of them led by psychology professor Brian Nosek, from the Center for Open Science at the University of Virginia, and the third directed by behavioural economist Colin Camerer at the California Institute of Technology. All three focused on social science experiments that had been published in highly reputable journals. Volunteer scientists from the same field then repeated each experiment to see whether it produced the same results, much as a cookbook author might test her recipes by baking the same torte in a different kitchen, with different utensils, to see if it looked and tasted the same.
To test a study’s replicability alongside its popularity, Prof. Gneezy and Prof. Serra-Garcia matched each original study’s outcomes to its citations on Google Scholar, starting roughly in 2008 and ending in 2019. Although they didn’t cherry-pick the wow studies, they still found that just 39 per cent of psychology studies replicated, as did 61 per cent in economics, and 62 per cent of those published in Nature and Science.
Yet over a decade, studies that failed to replicate were cited 16 times more often a year than reproducible ones, with “no significant change” after it was shown that they couldn’t be replicated. “Papers that failed to replicate were cited much more than papers that were reproducible,” Prof. Gneezy told me. “If citations are just a proxy for how sexy an idea is, then the findings that are more interesting and get the most attention are the least likely to be true.”
Some of the social science findings that haven’t been replicated include one showing that gripping a pen between your teeth, thus forcing a smile, makes you feel happy. Only one out of 18 attempts by other labs could reproduce this effect (it only works if no one is watching, apparently). Not just facial expression but body posture, too, is supposed to elicit emotions, according to a well-known study of “power-posing.” The idea that striking a victory stance – legs braced, arms in a V – can boost one’s confidence and alter one’s hormone levels (not to mention attenuate one’s jitters before public speaking or a job interview) has become the poster child for the replication crisis, mainly because it was such an easy fix for the universal fear of failure. Even though the power-posing study didn’t replicate, it continues to top the charts in scientific citations; a TED talk on the topic has garnered 61 million views.
The notion of “stereotype threat” – which is a fancy way of saying that believing stereotypes about your gender, race or ethnic group becomes a self-fulfilling prophecy – has also largely failed to replicate. For example, the idea that believing they are bad at math hampers girls’ math performance has been studied extensively and is now widely accepted. Yet systematic replications of the phenomenon can’t reproduce it. Still, stereotype threat has influenced educational policy, training programs and even admission practices at some universities.
Studies on growth mindset, implicit bias and ego depletion have also faltered in replications, suggesting their findings are exquisitely sensitive to context and statistical nuance. To extend the cooking metaphor, these findings might be like soufflés that only rise when the weather is right. Or they might simply be false. Either way, they persist in the scientific literature, in corporate and educational trends, and in the public imagination.
The Gneezy-Serra-Garcia study focused on social science. But the phenomenon of non-replicability has hit the worlds of natural science, economics and medicine as well. Here’s just a small sampling: A paper published in the journal Nature last year reported that a small inhibitory molecule could tamp down the formation of tau, a sign of Alzheimer’s disease. Though the study could not be reproduced, it has been cited 605 times in papers published by other scientists. Similarly, animal studies showing successful treatment of Type 1 diabetes based on combining two existing drugs, a neurotransmitter and a malaria medication, raised hopes when published in the journal Cell in 2017. Two other prestigious teams tried and failed to reproduce the results. Still, the research has been cited 238 times.
Other findings that didn’t pass muster include a 21-percentage-point advantage for the soccer team that kicks first in a penalty shootout, published in 2010 in the American Economic Review. The finding couldn’t be coaxed out of a larger sample two years later. Yet the first-mover advantage has been cited 483 times in research, not to mention on innumerable soccer broadcasts.
Why is this happening? The researchers who authored the Science Advances study speculate that journal editors – perhaps unconsciously – might overlook methodological problems in studies with intuitive appeal. These findings may be surprising, or they might provide an easy fix to a complex problem. Ivermectin for COVID-19, power-posing, stereotype threat and first-mover advantage in soccer all fit both bills. Once published, such findings take on a life of their own, gaining currency and momentum. “Extraordinary claims require extraordinary evidence,” Carl Sagan, the American astronomer, said. When our expectations are high, the burden of proof should be high, as well.
Still, it’s likely an overstatement to call this a crisis. Enthusiasm for replication and attempts to confirm catchy findings have been growing, and that’s a good thing. Having studied the problem for years, Prof. Nosek, the executive director of the Center for Open Science, seems unconcerned.
“This is about the scientific community self-scrutinizing,” he said. “It’s not surprising that the things that push the boundaries fail to replicate. That’s how it should be. We should be trying things that are not likely to be true. Then we investigate to see if they are.”