The past several years have been bruising ones for the credibility of the social sciences. A star social psychologist was caught fabricating data, leading to more than 50 retracted papers. A top journal published a study supporting the existence of ESP that was widely criticized. The journal Science pulled a political science paper on the effect of gay canvassers on voters’ behavior because of concerns about faked data.
Now, a painstaking yearslong effort to reproduce 100 studies published in three leading psychology journals has found that more than half of the findings did not hold up when retested. The analysis was done by research psychologists, many of whom volunteered their time to double-check what they considered important work. Their conclusions, reported Thursday in the journal Science, have confirmed the worst fears of scientists who have long worried that the field needed a strong correction.
The vetted studies were considered part of the core knowledge by which scientists understand the dynamics of personality, relationships, learning and memory. Therapists and educators rely on such findings to help guide decisions, and the fact that so many of the studies were called into question could sow doubt in the scientific underpinnings of their work.
“I think we knew or suspected that the literature had problems, but to see it so clearly, on such a large scale — it’s unprecedented,” said Jelte Wicherts, an associate professor in the department of methodology and statistics at Tilburg University in the Netherlands...
The new analysis, called the Reproducibility Project, found no evidence of fraud or that any original study was definitively false. Rather, it concluded that the evidence for most published findings was not nearly as strong as originally claimed.
Dr. John Ioannidis, a director of Stanford University’s Meta-Research Innovation Center, who once estimated that about half of published results across medicine were inflated or wrong, noted the proportion in psychology was even larger than he had thought. He said the problem could be even worse in other fields, including cell biology, economics, neuroscience, clinical medicine, and animal research.
The report appears at a time when the number of retractions of published papers is rising sharply in wide variety of disciplines. Scientists have pointed to a hypercompetitive culture across science that favors novel, sexy results and provides little incentive for researchers to replicate the findings of others, or for journals to publish studies that fail to find a splashy result.
“We see this is a call to action, both to the research community to do more replication, and to funders and journals to address the dysfunctional incentives,” said Brian Nosek, a psychology professor at the University of Virginia and executive director of the Center for Open Science, the nonprofit data-sharing service that coordinated the project published Thursday, in part with $250,000 from the Laura and John Arnold Foundation
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 28 August 2015:Vol. 349 no. 6251
Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Courtesy of a colleague