How reproducible are social science studies?

Scientific studies should be reproducible, replicable and robust. © ispyfriend/ iStock

In science, reproducibility and replicability are key criteria for the credibility of results: if checking a study or experiment yields the same result, the findings are probably correct. An international project now reveals that this is exactly what is often lacking in the social sciences: of the roughly 3,900 studies analyzed from sociology, psychology, political science, economics and education, the results could be successfully reproduced in only about half. Many publications also lacked the raw data and the precise information needed for computer-aided re-evaluation. And when the same data were checked using alternative analysis methods, some studies even yielded the opposite of the originally published core claim. What does this mean for scientific practice?

The credibility of research rests on three criteria, often summarized as “the three Rs”. The first criterion is reproducibility: if the same raw data are re-evaluated using the same analysis methods, the same result should emerge. The second criterion is replicability: if other research teams repeat the experiment in the same way, they should also reach a consistent result. The third criterion is robustness: if the same raw data are evaluated using alternative analysis methods, the core conclusion should still hold. So much for the theory. In practice, however, such verification attempts frequently fail, and the results of many publications cannot be reproduced or verified at all.

The three R criteria of science

In recent years, the large-scale international project SCORE (Systematizing Confidence in Open Research and Evidence) has taken a closer look at how well the social sciences meet the three R criteria. To do this, more than 850 researchers from all over the world tested the reproducibility, replicability and robustness of around 3,900 studies published in specialist journals between 2009 and 2018. These came from various fields of sociology, psychology and educational research, but also from economics, business and political science. The SCORE participants have now published their results in three journal articles. “The fact that these articles were published in a large, renowned journal like Nature shows that there is now a lot of focus on doing science properly,” says researcher Cristina Greculescu from the Bremen International Graduate School of Social Sciences (BIGSSS), who was involved in the project. “This is a positive signal that this work is important, that scientific integrity is important.”

To test the first criterion, reproducibility, a team led by Olivia Miske from the Center for Open Science in Charlottesville, USA, repeated the analyses of 600 studies from various areas of the social and economic sciences. It turned out that only just under a quarter of the publications contained enough raw data and enough information about the analysis methods and computer code used to make them verifiable. Of these, 53 percent proved to be reproducible, and another quarter were at least approximately reproducible. “The raw data was more often available for publications in political science and economics and could be successfully reproduced more often than in other disciplines,” report Miske and her colleagues. “This is likely due to the policies of journals in these disciplines that require sharing the raw data and codes for publication.”

Only about half are robust and replicable

A team led by Balazs Aczel from Eötvös Loránd University in Budapest examined the robustness of study results using a sample of 100 studies, applying alternative analysis methods to the same raw data. Their result: “34 percent of the independent reanalyses delivered the same result as the original study,” they write. “If you expand the tolerance range, around 57 percent of the studies meet the criterion.” In 24 percent of the reanalyses, however, the researchers found no significant results, or even conclusions opposite to the original. A team led by Andrew Tyner from the Center for Open Science in Washington DC focused on the most complex criterion, replicability. They repeated the experiments and tests of 164 studies; the results of 55 percent of them proved to be replicable.

“These findings should serve as a wake-up call,” comments Stanford University sociologist Robb Miller in Nature. “If taken seriously, they could help build knowledge in the social sciences that is durable enough to withstand the test of time.” Those involved in the SCORE project see it similarly: “This should not be seen as a fundamental criticism of the social sciences as an unreliable field of research; that would simply be wrong,” emphasizes Ulrich Kühnen from Constructor University in Bremen. Rather, the project's results provide valuable pointers for improvement. “This is a self-critical moment for the social and behavioral sciences in general, but in a positive sense,” adds Greculescu. “It paves the way for greater integrity and more openness in the scientific endeavor.”

Source: SCORE Project, Nature
