Photo courtesy of the National Eye Institute
We all like easy answers, yet the honest pursuit of the answers to tough questions is anything but easy.
The gap between these two truths recently came into sharp relief when 270 scientists tried to reproduce their colleagues’ published results.
The project, headed by Brian Nosek, a psychologist at the University of Virginia, focused on psychological studies published in 2008. In an article published in the journal “Science,” Nosek and his colleagues in the Reproducibility Project set out to address concerns about whether such studies demonstrated replicable results. “Scientific claims should not gain credence because of the status or authority of their originator but by the replicability of their supporting evidence,” the article summarizing their results begins.
The findings of the Reproducibility Project demonstrate why. Only 36 percent of the nearly 100 studies re-examined produced statistically significant results, compared with the 97 percent of the original studies that did. As Ed Yong, writing for The Atlantic, observed, this number is hard to interpret; it is not immediately evident whether it reflects natural variability, bias on the part of the scientists, sloppy work by journals, all of these factors or none of them. What it does make clear, however, is that journalists and their audiences should be careful about putting too much stock in the results of a study without considering its methodology.
But since this principle is apt to be breached at least as often as it is observed, we also should try to tighten up the way peer-reviewed studies are conducted. One promising approach is “preregistration”: requiring researchers to spell out their hypothesis and research parameters before conducting a study, not after. Nosek explained that this requirement helps prevent “passing off an exploratory study as a confirmatory one.” Because many journals set a threshold for statistical significance (typically requiring a p-value, the probability of seeing results at least as extreme by chance alone, below 0.05), scientists may be tempted to choose their parameters in order to produce publishable results. So-called “p-hacking” can range from unconscious human bias to outright fraud. This is one reason why reproducibility is so important. Performing a good study is hard even when researchers intend to do honest work; the pressure to publish can push some researchers further still, across the line into manipulation.
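To see why unpreregistered flexibility is so dangerous, consider a minimal simulation. It is a sketch, not a model of any real study, and the numbers (20 outcome measures, a 0.05 significance threshold) are hypothetical; the only statistical fact it relies on is that under a true null hypothesis, each test crosses the threshold with probability equal to the threshold itself. If a researcher quietly tries many analyses and reports whichever one comes out significant, the effective false-positive rate balloons:

```python
import random

random.seed(1)

ALPHA = 0.05  # conventional significance threshold (hypothetical choice)

def one_null_test(alpha=ALPHA):
    # Under a true null hypothesis, a test is "significant" with
    # probability alpha, so we can model it as a simple coin flip.
    return random.random() < alpha

def hacked_study(n_outcomes=20, alpha=ALPHA):
    # A p-hacked study "succeeds" if ANY of its many unplanned
    # analyses happens to cross the significance threshold.
    return any(one_null_test(alpha) for _ in range(n_outcomes))

trials = 100_000
false_positive_rate = sum(hacked_study() for _ in range(trials)) / trials

print(f"False-positive rate with 20 tries: {false_positive_rate:.2f}")
print(f"Analytic value 1 - 0.95**20:       {1 - 0.95**20:.2f}")
```

With 20 hidden attempts, roughly two-thirds of pure-noise studies yield a “significant” result, even though each individual test behaves exactly as advertised. Preregistration works precisely by forcing the researcher to commit to one analysis before seeing the data.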
Some of the coverage has been harsh on psychology itself, suggesting the field is simply not as rigorous as branches of the “harder” sciences. Yet similar concerns about reproducibility have been raised in other disciplines in recent years, especially in genetics and clinical medicine. Psychology is certainly not alone in studies that fall prey to p-hacking, publication bias toward positive results or misconduct. A new example arose just yesterday, when news broke that researchers at Amgen Inc. had to retract an article published in the journal “Cell Metabolism” after the biotech firm discovered some experimental data had been manipulated to appear stronger. While sciences that are more directly subject to the laws of math and physics may have some extra built-in safeguards, problems in a variety of disciplines have led to a recent rise in scientific journal retractions and, in some instances, skepticism of the scientific process itself.
Yet as Christie Aschwanden wrote at FiveThirtyEight, “Science isn’t broken, nor is it untrustworthy. It’s just more difficult than most of us realize.” What that means is that it is important to approach the deluge of findings, whether in psychology, clinical medicine or elsewhere, with a dose of critical thinking and healthy skepticism. We are all consumers of scientific and statistical information, after all, and it is as important to read critically in science reporting as it is in any other area.
For instance, consider a doctor who diagnoses a 90-year-old woman with osteoporosis. The doctor recommends a bone-building medication with potentially significant side effects. The patient and her family then have to ask themselves a few questions: How many 90-year-old women would not qualify for a diagnosis of osteoporosis? Probably very few. To what extent have drug manufacturers tested the side effects and benefits of the suggested medication on a population of such advanced age? Probably not much. And chances are that, unless the doctor happens to be a gerontologist, the doctor hasn’t deeply considered the issue from that perspective.
If that seems unlikely to you, consider Addyi, about which I have written previously. The Food and Drug Administration reported that the manufacturer’s study on the interaction of the medication with alcohol included only 25 subjects, 23 of whom were men, even though the drug was designed exclusively for treating female patients. While the FDA is requiring three more studies as part of Addyi’s approval, those studies need not be completed before the drug goes on sale.
Whether or not the 90-year-old patient and her family choose to follow the doctor’s recommendation, their questions are legitimate. But instead of treating science as an ever-evolving search for the truth, we too often treat it as a monolithic entity in which we should simply trust what is presented to us without asking for more context or substantiation. In reality, studies vary widely in quality, and all of us would do well to keep that in mind.
The Reproducibility Project is a reminder that we shouldn’t blindly accept everything we hear or read, especially about studies where the results and conclusions haven’t been confirmed both by experts in the field who replicate the findings and by measurable real-world results.