Exactly 20 years ago, the Human Genome Project published the first sequence of the human genome. An international research consortium has now taken an important next step: it has created 64 human reference genomes, which for the first time reveal larger structural variants as well as the differences between a person's maternal and paternal genetic material. Because the DNA samples for these reference genomes come from 25 populations on different continents, they also open up a completely new view of the genetic diversity of humankind.
On February 12, 2001, the time had come: after more than ten years of work, the scientists of the Human Genome Project published the first decoded version of the human genome. Assembled from fragments of various DNA samples, this sequence showed the order of the DNA bases in the human genetic blueprint, which is around 3.2 billion “letters” long. Since then, DNA analyses and comparisons have become a standard tool in many areas of science. They help to track down disease genes, reconstruct the origins and migrations of different populations, and develop new gene therapies.
Long sections instead of short fragments
However, this research has had a gap so far: because of methodological limitations, genomes had to be broken into short fragments for such comparisons. These fragments were amplified, analyzed individually and then put back in the correct order by comparing them with reference genomes. Although these methods are well suited to detecting differences in a single DNA base or in short sections, they are of little use for investigating larger structural features of the genome. Highly repetitive base sequences drop out because they are difficult to place, and changes affecting longer sections, so-called structural variants, can hardly be identified with classic short-read sequencing. Yet these variants in particular have a significant influence on gene function and can also help to characterize genetic differences between individuals and populations. “The first human genome sequence was a big step forward, but it was incomplete,” said co-author Charles Lee of the Jackson Laboratory for Genomic Medicine in the United States. “In addition to the variation of individual bases, we now know that structural variants also contribute significantly to the genomic differences between individuals.”
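Why repeats are so hard to place with short fragments can be illustrated with a toy example. The sketch below is only an illustration, not the consortium's method: the sequences, the names `LEFT`, `REPEAT`, `RIGHT` and the helper `find_all` are invented, and real read mappers tolerate sequencing errors and work on billions of bases. It shows that a short read lying entirely inside a repeat matches several positions, while a longer read that spans the repeat and its flanks matches only one.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every start position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy "genome": unique flank + six copies of a repeat + unique flank.
LEFT = "ACGTTGCA"
REPEAT = "CAG" * 6
RIGHT = "TTAGGCAT"
genome = LEFT + REPEAT + RIGHT

short_read = "CAGCAG"      # lies entirely inside the repeat
long_read = genome[4:30]   # spans the whole repeat plus part of both flanks

print(find_all(genome, short_read))  # [8, 11, 14, 17, 20]: five possible placements
print(find_all(genome, long_read))   # [4]: exactly one placement
```

The short read cannot be assigned to a single position, so any variation inside the repeat stays hidden, whereas the long read is anchored by the unique sequence on either side.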
For this reason, an international research consortium led by Peter Ebert from Heinrich Heine University Düsseldorf has now used the latest advances in sequencing technology to create new reference genomes. The DNA comes from 32 people from different parts of the world who belong to 25 population groups from Africa, North America, East and South Asia, and Europe. For their project, the researchers subjected these genomes to so-called long-read sequencing, which reads much longer stretches of DNA in one piece. In addition, they sequenced each person's paternal and maternal genetic material separately, because every cell carries 23 pairs of chromosomes, and in each pair one chromosome comes from the father and one from the mother. “For each human individual who took part in the study, we identified not one, but two genomes, one for each set of chromosomes,” explains Jan Korbel from the European Molecular Biology Laboratory (EMBL).
More than 100,000 structural variants
The result is 64 reference genomes, which for the first time allow a more comprehensive view of the genetic differences between human populations, between individuals and even between the maternal and paternal genetic material within one person. “With these new reference data, genetic differences can be investigated against the background of global genetic variation with previously unattainable accuracy,” says Ebert. In initial comparisons, the team identified just over 107,500 structural variants, 68 percent of which had not been detected before. They also found 316 inversions (sections in which the base sequence is reversed), 2.3 million sites where pieces of DNA were missing or inserted, and around 15.8 million single nucleotide variants (SNVs), DNA positions at which a mutation has replaced one base with another. This is the most diverse set of human reference genomes compiled to date, and it captures the genetic diversity of the human species better than any before it.
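The headline numbers can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in this article, so the results are approximations; the variable names are chosen for illustration.

```python
# Figures as quoted in the article (rounded); names are illustrative.
individuals = 32
haplotypes_per_individual = 2          # one maternal and one paternal chromosome set
reference_genomes = individuals * haplotypes_per_individual

structural_variants = 107_500          # "just over 107,500", rounded down
novel_percent = 68                     # share not detected before
novel_variants = structural_variants * novel_percent // 100

print(reference_genomes)   # 64
print(novel_variants)      # 73100
```

On these figures, roughly 73,000 of the structural variants were new.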
In concrete terms, the new data can significantly improve genome-wide association studies, a method often used to search for disease genes. It compares genetic variants across the entire genome to find out whether particular variants are associated with particular traits or diseases. “Capturing the full range of structural variations found in human genomes is critical for clinical applications,” said co-author Qihui Zhu of the Jackson Laboratory. “These variants affect how genes function and can contribute to disease, differences in drug response, and more. Knowing how they differ in individuals and in different populations is necessary in order to implement more effective genomic medicine.”
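The basic idea behind such an association test can be sketched for a single variant. The example below is a simplified illustration with invented counts, not the analysis used in the study: it compares how often a variant occurs in people with and without a disease using a chi-square test on a 2x2 table. The function name `chi_square_2x2` is hypothetical, and real genome-wide studies repeat such a test for millions of variants, usually with regression models, covariates and a correction for multiple testing.

```python
import math


def chi_square_2x2(case_carriers, case_total, control_carriers, control_total):
    """Chi-square test (1 degree of freedom) for carrier status vs. disease status."""
    a = case_carriers                     # cases carrying the variant
    b = case_total - case_carriers        # cases without the variant
    c = control_carriers                  # controls carrying the variant
    d = control_total - control_carriers  # controls without the variant
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 60 of 100 patients carry the variant, 40 of 100 controls do.
chi2, p_value = chi_square_2x2(60, 100, 40, 100)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")  # chi2 = 8.0, p = 0.0047
```

A small p-value like this would flag the variant as a candidate, but only if it survives the much stricter thresholds applied when the whole genome is scanned.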
The reference genomes also make it easier to carry out population-specific comparisons and to draw more precise conclusions about the history and development of different populations. “These genomes will pave the way for a new wave of scientific discoveries about the biology of the human genome and the relationship between genetic variation and disease,” says co-author Bernardo Rodriguez-Martin from EMBL.
Source: Peter Ebert (Heinrich Heine University Düsseldorf) et al., Science, doi: 10.1126/science.abf7117