Google DeepMind’s new AI model AlphaGenome can predict the function of long DNA sequences of up to a million base pairs. It combines numerous capabilities that previously required specialized models: using the DNA sequence, it predicts the influence of different areas of the genome, how the RNA copied from the DNA is further processed and what the effects of genetic variants are. The deep learning model can therefore make an important contribution to basic research and also help to better understand genetic diseases and possibly develop new treatment methods.
Our genome provides the blueprint and instructions for every cell in our body. Protein-coding regions make up only a small part of it: around 98 percent consists of non-coding sections, which nevertheless play an important role in determining which genes are read. These regulatory sections are often located far away on the DNA strand from the regions they influence, which makes the connections difficult to uncover using conventional methods. Even previous AI models could capture only very limited sections of the genome at once.
Comprehensive analysis of genomic data
A team led by Žiga Avsec from Google DeepMind in London has now developed a new AI model called AlphaGenome that can analyze long stretches of DNA of up to a million base pairs, achieving resolution down to one base pair. It can determine eleven molecular properties at the same time: “AlphaGenome predicts, among other things, where genes begin and end in different cell types and tissues, where they are spliced, the amount of RNA produced and also which DNA bases are accessible, are close to each other or are bound to certain proteins,” explain the researchers.
The deep learning model was trained using publicly available genomic data from humans and mice. From this experimentally validated information, AlphaGenome learned how known DNA sequences influence various biological processes in a variety of cell types. On this basis, the model can also predict the impact of genetic variants, an important step towards better understanding hereditary diseases and possibly finding new approaches to treatment.
Avsec and his colleagues tested AlphaGenome against the leading AI models for predictions on individual DNA sequences and for regulatory effects, including specialized models that cover only a subset of functions. “In 25 of 26 tests predicting the impact of genetic variants, AlphaGenome matched or exceeded the performance of the best available external models,” the team reports. Because the new AI model combines the functions of numerous specialized individual models, it could help researchers generate and test hypotheses more quickly.
Free to use for the scientific community
For Christian Schaaf, Director of the Institute for Human Genetics at Heidelberg University Hospital, who was not involved in the study, the fact that a private company has developed such a central model cuts both ways: “On the one hand, it accelerates innovation, but on the other hand, it creates dependencies on proprietary models and access conditions,” he says. “The decisive factor will be the extent to which training data, model architecture and interfaces remain transparent and usable for academic research in the long term, and whether fair, open standards for evaluation, regulation and clinical application can be established.”
Google DeepMind already makes the model available free of charge for non-commercial research via an API. In the future, the entire model is to be published so that researchers can adapt and fine-tune it for their own purposes. “We believe that AlphaGenome can be a valuable resource for the scientific community, helping scientists better understand the function of the genome and the biology of disease, and ultimately driving new biological discoveries and the development of new treatments,” Avsec and his team write.
Source: Žiga Avsec (Google DeepMind, London, UK) et al., Nature, doi: 10.1038/s41586-025-10014-0