April 17, 2018

YOU PROBABLY THOUGHT THE SCIENCE WAS SETTLED: How Bad Is the Government’s Science? Policy makers often cite research to justify their rules, but many of those studies wouldn’t replicate.

Half the results published in peer-reviewed scientific journals are probably wrong. John Ioannidis, now a professor of medicine at Stanford, made headlines with that claim in 2005. Since then, researchers have confirmed his skepticism by trying—and often failing—to reproduce many influential journal articles. Slowly, scientists are internalizing the lessons of this irreproducibility crisis. But what about government, which has been making policy for generations without confirming that the science behind it is valid?

The biggest newsmakers in the crisis have involved psychology. Consider three findings: Striking a “power pose” can improve a person’s hormone balance and increase tolerance for risk. Invoking a negative stereotype, such as by telling black test-takers that an exam measures intelligence, can measurably degrade performance. Playing a sorting game that involves quickly pairing faces (black or white) with bad and good words (“happy” or “death”) can reveal “implicit bias” and predict discrimination.

All three of these results received massive media attention, but independent researchers haven’t been able to reproduce any of them properly. It seems as if there’s no end of “scientific truths” that just aren’t so. For a 2015 article in Science, independent researchers tried to replicate 100 prominent psychology studies and succeeded with only 39% of them.

Further from the spotlight is a lot of equally flawed research that is often more consequential. In 2012 the biotechnology firm Amgen tried to reproduce 53 “landmark” studies in hematology and oncology. The company could replicate only six, or about 11%. Are doctors basing serious decisions about medical treatment on the rest? Consider the financial costs, too. A 2015 study estimated that American researchers spend $28 billion a year on irreproducible preclinical research.

The chief cause of irreproducibility may be that scientists, whether wittingly or not, are fishing fake statistical significance out of noisy data. If a researcher looks long enough, he can turn any fluke correlation into a seemingly positive result. But other factors compound the problem: Scientists can make arbitrary decisions about research techniques, even changing procedures partway through an experiment. They are susceptible to groupthink and aren’t as skeptical of results that fit their biases. Negative results typically go into the file drawer. Exciting new findings are a route to tenure and fame, and there’s little reward for replication studies. . . .
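To make the "fishing" mechanism concrete, here is a minimal Python sketch; it is an illustration, not something taken from the article. It assumes a hypothetical study that compares two groups drawn from the same distribution (so there is no real effect), measures several unrelated outcomes, and declares success if any one of them clears the conventional p < 0.05 threshold. The function names, the group size of 30, and the simple z-test with known unit variance are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # conventional significance threshold (assumed for illustration)


def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(group_a)
    z = (fmean(group_a) - fmean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def study_finds_something(rng, n_outcomes, n_per_group=30):
    """Simulate one no-effect study; True if any outcome looks 'significant'."""
    for _ in range(n_outcomes):
        # Both groups are pure noise from the same distribution.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(a, b) < ALPHA:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=2000, seed=0):
    """Share of simulated no-effect studies that report at least one hit."""
    rng = random.Random(seed)
    hits = sum(study_finds_something(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


def expected_rate(n_outcomes):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - ALPHA) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(false_positive_rate(k), 3), round(expected_rate(k), 3))
```

Under these assumptions, a study that checks a single outcome is fooled about 5% of the time, while one that checks 20 independent outcomes turns up at least one spurious "finding" in roughly 64% of runs, since 1 - 0.95^20 is about 0.64. Real analyses are messier, because outcomes are rarely independent, but the direction of the effect is the same.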

A deeper issue is that the irreproducibility crisis has remained largely invisible to the general public and policy makers. That’s a problem given how often the government relies on supposed scientific findings to inform its decisions. Every year the U.S. adds more laws and regulations that could be based on nothing more than statistical manipulations.

All government agencies should review the scientific justifications for their policies and regulations to ensure they meet strict reproducibility standards. The economics research that steers decisions at the Federal Reserve and the Treasury Department needs to be rechecked. The social psychology that informs education policy could be entirely irreproducible. The whole discipline of climate science is a farrago of unreliable statistics, arbitrary research techniques and politicized groupthink.

The process of policy-making also needs to be overhauled. Federal agencies that give out research grants should immediately adopt the NIH’s new standards for funding reproducible research. Congress should pass a law—call it the Reproducible Science Reform Act—to ensure that all future regulations are based on similar high standards.

Each scientific discipline needs to accept responsibility for its share of the irreproducibility crisis and incorporate strict standards into its procedures.

There won’t be changes in behavior unless there are changes in incentives.