Certain mutations in our DNA sequence can cause proteins to be assembled differently than they should be. While many of these mutations are harmless, others can cause serious illnesses. For most variants, however, the effects are still unclear. Now the Google subsidiary DeepMind has published the program AlphaMissense, which uses artificial intelligence and protein structure information to predict which mutations are potentially harmful. This could help uncover the causes of rare genetic diseases.
The DNA sequence of our genome provides the blueprint for our proteins. Mutations can change individual DNA building blocks. In the case of so-called missense mutations, such a change in the blueprint results in a different amino acid being incorporated into the affected protein. “Of the more than four million missense variants observed, only an estimated two percent have been clinically classified as pathogenic or benign, while the vast majority of them are of unknown clinical significance,” explains a team led by Jun Cheng from Google DeepMind in London. “This limits the diagnosis of rare diseases and the development or use of clinical treatments that target the underlying genetic cause.”
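To make the term concrete, here is a minimal Python sketch of how a single changed DNA building block can alter the encoded amino acid. The codon assignments come from the standard genetic code, but the table is deliberately partial, and the function name `classify_substitution` is a hypothetical choice for illustration. It is not part of AlphaMissense.

```python
# Partial codon table from the standard genetic code.
# Only the codons needed for this illustration are listed.
CODON_TABLE = {
    "GAG": "Glu",   # glutamic acid
    "GAA": "Glu",   # glutamic acid (same amino acid, different codon)
    "GTG": "Val",   # valine
    "TAG": "Stop",  # stop codon
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # protein sequence is unchanged
    if alt_aa == "Stop":
        return "nonsense"    # protein is cut short
    return "missense"        # a different amino acid is incorporated


for alt in ("GTG", "GAA", "TAG"):
    print("GAG ->", alt, ":", classify_substitution("GAG", alt))
```

Each of the three alternative codons differs from GAG by a single letter, yet only the change to GTG counts as a missense mutation, because only it swaps one amino acid for another.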
Further development of AlphaFold
Until now, it has been difficult to predict the potential effects of a missense variant. New technologies make it possible to record thousands of variant effects simultaneously using cell cultures and DNA sequencing. But results from such experiments are currently available for only a tiny part of the human genome. Jun Cheng and his team have therefore chosen a different approach that can be used on a much larger scale. As a basis, they used the AI program AlphaFold, developed by DeepMind a few years ago, which uses a protein's amino acid sequence to predict how the protein will fold, i.e. what three-dimensional structure it will adopt.
“We adapted AlphaFold to predict the pathogenicity of missense variants,” Cheng and his colleagues write. To do this, they combined AlphaFold’s structure predictions with information from clinical databases that contain information about already known missense mutations and their effects. They also included the frequency of certain variants in humans. “Machine learning can be used to identify and exploit patterns in biological data to infer the impact of previously unexplored variants,” explain the authors.
Pathogenic or benign?
The researchers calculated all possible single amino acid changes for almost 20,000 human proteins, a total of 216 million possible substitutions. Of these, 71 million are missense variants that can arise from a change to a single DNA building block, and the team provides a prediction for each of them. “Using AlphaMissense, we classified 32 percent of these missense variants as potentially pathogenic and 57 percent as probably benign,” reports the team. AlphaMissense did not provide a clear assessment for 11 percent of the variants.
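For a sense of scale, the percentages can be converted into approximate absolute numbers. The following Python sketch uses only the rounded figures quoted above, so the results are rough estimates rather than the exact counts from the study. The variable and function names are hypothetical.

```python
# Rounded figures as quoted in the article (not exact study counts).
TOTAL_SUBSTITUTIONS = 216_000_000  # all possible single amino acid changes
TOTAL_PREDICTIONS = 71_000_000     # missense variant predictions
SHARES_PERCENT = {
    "likely pathogenic": 32,
    "likely benign": 57,
    "no clear assessment": 11,
}


def approximate_counts(total: int, shares_percent: dict) -> dict:
    """Convert percentage shares into approximate absolute counts."""
    return {label: total * pct // 100 for label, pct in shares_percent.items()}


counts = approximate_counts(TOTAL_PREDICTIONS, SHARES_PERCENT)
for label, n in counts.items():
    print(f"{label}: about {n / 1e6:.1f} million variants")

# Each amino acid position can be swapped for 19 other amino acids,
# so 216 million substitutions correspond to roughly this many positions:
positions = TOTAL_SUBSTITUTIONS // 19
print(f"amino acid positions covered: about {positions / 1e6:.1f} million")
```

By this estimate, roughly 22.7 million variants fall into the likely pathogenic group, about 40.5 million into the likely benign group, and about 7.8 million remain without a clear assessment.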
More detailed analyses showed that variants in proteins that have changed little over the course of evolution were especially frequently classified as pathogenic, as were variants that influence the stability of proteins. Comparisons of the results with missense mutations that have already been scientifically studied showed a high degree of agreement between the predictions of AlphaMissense and the effects actually observed. “AlphaMissense predictions have the potential to accelerate our understanding of the molecular effects of variants on protein function, contribute to the discovery of disease-causing genes, and increase the diagnostic yield of rare genetic diseases,” write Jun Cheng and his team.
So far only limited diagnostic use
In an accompanying commentary, also published in the journal Science, Joseph Marsh of the University of Edinburgh and Sarah Teichmann of the University of Cambridge, who were not involved in the study, write that AlphaMissense marks the beginning of a new phase in the prediction of variant effects. However, they also point out that it is still unclear to what extent one can rely on purely computational predictions when diagnosing diseases.
“Although AlphaMissense’s classifications of likely pathogenic or likely benign are undoubtedly helpful in interpreting and prioritizing variants, these designations should not be confused with the very specific clinical definitions of these terms, which are based on multiple lines of evidence,” they write. It should also be noted that the effects of mutations are very complex in practice. Even if a person carries a pathogenic variant, it does not necessarily lead to illness. “But although we cannot currently rely solely on predictive models such as AlphaMissense for genetic diagnostics, their usefulness will continue to increase in the future as both the computational approaches and the strategies for their interpretation improve.”
Source: Jun Cheng (Google DeepMind, London, UK) et al., Science, doi: 10.1126/science.adg7492