Artificial intelligence can write poems, help detect diseases early, and develop business ideas. It is also increasingly used in science: machine learning can recognize patterns in huge data sets and build statistical models. But how exactly the AI arrives at its results usually remains unclear, which endangers the credibility and reproducibility of the research built on it. To counteract this problem, an interdisciplinary research team has developed guidelines for the responsible use of artificial intelligence in science.
A basic principle of good science is that results can be understood and replicated by other researchers. This allows the quality of the findings to be checked and enables other research groups to build on them. But the spread of artificial intelligence is shaking this principle. Machine learning models are now used across most scientific disciplines for a wide range of questions, opening up entirely new data sources that were not accessible with classical statistical methods. Yet AI is often a black box: exactly how it arrives at its results remains unclear, and in many cases those results cannot be reproduced.
Cross-disciplinary checklist
A team led by Sayash Kapoor from Princeton University in New Jersey has now addressed this problem. “When we move from traditional statistical methods to machine learning methods, there are far more ways to shoot yourself in the foot,” says Kapoor’s colleague Arvind Narayanan. “If we don’t intervene to improve our scientific standards and reporting standards for machine learning-based science, we risk not just one discipline but many different scientific disciplines running into these problems, one after the other.”
To counteract the emerging credibility crisis in science, Kapoor, together with an interdisciplinary team drawn from fields such as computer science, data science, mathematics, the social sciences and biomedicine, has drawn up guidelines intended to ensure good scientific practice when working with machine learning. “This is a systematic problem with systematic solutions,” says Kapoor. The result is a consensus-based checklist that researchers across disciplines can use as a guide when applying machine learning.
Transparency in the research process
The list comprises 32 items in eight categories. These include, for example, precisely describing the aim of the study and explaining why artificial intelligence is considered a suitable method for the question at hand. In addition, all training data used, as well as the model’s code and the hardware it ran on, should be disclosed so that other research teams can reproduce the results. The selection of training and test data, along with their limitations, should also be documented and justified, as illustrated in the sketch below.
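To give a sense of what such disclosure can look like in practice, here is a minimal sketch in Python. It is not taken from the paper itself: the file names, field names, and values are illustrative assumptions. It shows one way a research team might record dataset provenance, known limitations, and the compute environment alongside a study:

```python
import hashlib
import json
import platform
import sys
from pathlib import Path

def dataset_fingerprint(path: Path) -> str:
    """SHA-256 hash of a dataset file, so other teams can verify
    they are working with exactly the same data."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Hypothetical path to the training data; replace with the real file.
train_file = Path("train.csv")

# Illustrative reporting record touching a few of the checklist's themes:
# the study's aim, why ML is suitable, data provenance and limitations,
# and the compute environment. The field names are our own, not the paper's.
report = {
    "study_aim": "Placeholder: what the model is meant to predict",
    "why_ml": "Placeholder: why classical statistics is insufficient here",
    "training_data": {
        "source": "https://example.org/dataset",  # hypothetical URL
        "sha256": dataset_fingerprint(train_file) if train_file.exists() else None,
        "known_limitations": "Placeholder: e.g. sample covers 2010-2020 only",
    },
    "environment": {
        "python": sys.version,
        "platform": platform.platform(),
    },
    "random_seed": 42,  # fixed seed so training runs are repeatable
}

# Publishing this file alongside the code and results is one way to meet
# the transparency items described above.
with open("ml_reporting_record.json", "w") as f:
    json.dump(report, f, indent=2)
```

Such a machine-readable record, published next to the code, lets reviewers and other research groups check at a glance which data, seed, and environment produced a given result.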
“This checklist can serve as a resource for researchers when planning and conducting a study, reviewers when reviewing papers, and journals when enforcing standards of transparency and reproducibility,” the team writes. The recommendations are designed in such a way that they are applicable to many different disciplines and research areas, but at the same time are specific enough to precisely address the problems of machine learning.
Promoting scientific progress
From the perspective of Kapoor and his team, the standards they propose can help make artificial intelligence-based science more reliable. Adhering to all of them may increase the time required for each individual study. But because erroneous, non-reproducible, and ultimately useless or even harmful studies would be avoided, the overall pace of scientific progress could actually increase.
“By ensuring that published work is of high quality and provides a solid foundation for future work, we may be able to accelerate the pace of scientific progress,” says Kapoor’s colleague Emily Cantrell. “We should focus on scientific progress itself and not just on the individual papers being published.”
Source: Sayash Kapoor (Princeton University, New Jersey, USA) et al., Science Advances, doi: 10.1126/sciadv.adk3452