An analysis of more than 14,000 psychology studies spanning 20 years explores whether a text-based machine learning model can predict a paper's likelihood of replicability, that is, whether its findings can be reproduced in independent replication experiments. The study suggests a possible pathway to prioritize manual replications and advance replication research in a scalable way.
“A discipline-wide investigation of the replicability of psychology papers over the past two decades” is published in Proceedings of the National Academy of Sciences, from Yang Yang, assistant professor of information technology, analytics and operations at the University of Notre Dame’s Mendoza College of Business, along with lead author Wu Youyou from University College London and Brian Uzzi from Northwestern University.
Researchers have discovered that numerous findings in fields such as psychology do not hold up when other researchers try to replicate them. Manual replication of studies is one of the most important ways scientists build confidence in the scientific merit of results, helping to check the validity of previous research. However, conducting manual replication research is time-consuming and carries high opportunity costs for scientists. For example, an individual replication study in psychology takes an average of about 300 days from the initial claim to the completed analysis. This motivates research on estimating replicability in a scalable way.
In a previous study, the team created a machine-learning model that estimates the replicability of a scientific study from the text of the paper itself. In this paper, the model is further validated against an extended dataset of manual replication papers (papers whose replication outcomes have been verified by hand).
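The general idea of scoring a paper's text for replicability can be sketched as a simple text classifier. The corpus, labels, and features below are entirely hypothetical and much simpler than the model described in the paper; this is only an illustration of training on the text of manually replicated papers and scoring new ones, using a bag-of-words logistic regression built from the standard library.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: abstract-like snippets labeled with a manual
# replication outcome (1 = replicated, 0 = did not replicate). The real
# dataset and features in the paper differ; this is only a sketch.
TRAIN = [
    ("large preregistered sample with direct replication of prior effect", 1),
    ("replication with large sample confirms the original effect", 1),
    ("surprising novel effect found in a small exploratory sample", 0),
    ("small sample shows surprising counterintuitive novel effect", 0),
]

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

# Vocabulary over the training corpus; each document becomes a count vector.
vocab = sorted({t for doc, _ in TRAIN for t in tokens(doc)})

def featurize(text):
    counts = Counter(tokens(text))
    return [counts[w] for w in vocab]

# Plain logistic regression trained by stochastic gradient descent.
weights = [0.0] * len(vocab)
bias = 0.0
lr = 0.5
for _ in range(200):
    for doc, label in TRAIN:
        x = featurize(doc)
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - label
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err

def replication_score(text):
    """Estimated probability (0..1) that the described study replicates."""
    x = featurize(text)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

A new abstract can then be passed to `replication_score`; text resembling the replicated training examples receives a score closer to 1. The published model is far richer than this, but the pipeline shape (text features, supervised labels from manual replications, a probability-like score) is the same.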
In this investigation, they applied the model to more than 14,000 psychology papers published in six top-tier psychology journals between 2000 and 2019. Spanning six subfields of psychology, the study examined how replicability varies with research method, authors' productivity and cumulative citation impact, the prestige of the authors' institutions, and media coverage.
The team found no significant difference in citations between replicated and non-replicated studies, and that media attention to papers is negatively correlated with the machine-estimated likelihood of replication success.
“This makes sense, because previous findings have suggested that ‘surprising’ results from a study are negatively linked to replication success,” Yang said. “And normally, unexpected findings are more likely to receive media attention.”
When focusing on manual replication studies, the team further identified that there is no significant correlation between whether a study is manually replicated and how often it is cited, which is consistent with results from a previous study.
This suggests that non-replicated papers circulate through the literature as quickly as replicated papers.
“Similar observations were made when we looked at the correlation between machine-estimated replication likelihood and citation impact for those 14,000 papers,” Yang said.
The team says that citation number is weakly associated with replicability and might not be a good diagnostic estimator of a paper’s replicability.
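The association tests described above amount to correlating a binary outcome (replicated or not) with a continuous measure such as citation count, which is a point-biserial correlation. The data below are invented purely for illustration; only the computation is shown, not the paper's results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; with a binary x this is the point-biserial r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical illustration: manual replication outcome (1 = replicated)
# against each paper's citation count.
replicated = [1, 0, 1, 0, 1, 0, 1, 0]
citations = [120, 95, 40, 210, 60, 75, 150, 88]
print(pearson_r(replicated, citations))
```

An r near zero, as the team reports for citations, means knowing how often a paper is cited tells you little about whether it will replicate; a significance test on r would then decide whether even that weak association is distinguishable from noise.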
“In a positive light, we found that replication success is positively related to the first author’s cumulative citation impact,” Yang said. “Consistently, there is also a positive correlation between machine-estimated replication likelihood and the authors’ cumulative citation impact. However, we found no statistically significant evidence that replication success is related to authors’ institutional prestige.”
The team also found variations in replicability among different psychology subfields.
Research focused on personality had a 55 percent chance of replicating, while research in developmental psychology had only a 36 percent chance.
According to the study, “One explanation for this pattern is that developmental psychology focuses on children and life courses, two areas in which researchers face unique difficulties in collecting large samples under controllable circumstances.”
It also revealed that between 2010 and 2019, average replication scores increased.
Manual replications are still relied upon to determine whether a paper replicates. The team’s work should help replicability scholars speed up the manual replication process and allocate resources more effectively.
Contact: Yang Yang, 574-631-6253, Yyang1@nd.edu