Twenty-one years ago, the Human Genome Project decoded the human genome for the first time, but some crucial regions remained unmapped. An international research consortium has now closed those gaps: for the first time, the DNA of all human chromosomes has been sequenced from tip to tip. As a result, the highly repetitive stretches in the centers of the chromosomes and at their ends can now be read as well. This opens up new insights into gene regulation and cell division, into the range of variation in the human genome and into the causes of diseases.
The first sequencing of the human genome in 2001 was groundbreaking because, for the first time, it provided a reference for the roughly three billion base pairs and around 25,000 protein-coding genes in our genome. However, this first reference genome was incomplete: it covered only about 92 percent of the entire DNA sequence. In this version, millions of base positions are marked only with the letter “N” instead of one of the letters for the four DNA bases (A, C, G and T). Most of these undeciphered regions lie in the so-called centromeres, the constricted central points of the chromosomes that are crucial for cell division. But there are also undeciphered areas at the ends of the chromosomes, the telomeres.
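To make the role of the “N” placeholder concrete, here is a minimal Python sketch that measures how much of a sequence is resolved and where its gaps lie. The toy sequence and the function names `completeness` and `gap_runs` are hypothetical illustrations. The share of resolved letters is only a simplified stand-in for a completeness figure like 92 percent, not the consortium's actual accounting.

```python
import re

def completeness(seq: str) -> float:
    """Share of positions resolved to A, C, G or T; 'N' marks an unknown base."""
    seq = seq.upper()
    if not seq:
        return 0.0
    resolved = sum(1 for base in seq if base in "ACGT")
    return resolved / len(seq)

def gap_runs(seq: str) -> list[tuple[int, int]]:
    """(start, length) of every run of consecutive 'N' characters."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer("N+", seq.upper())]

# Hypothetical toy sequence: two resolved blocks, each followed by a gap.
toy = "ACGTAC" + "N" * 4 + "GGTTCA" + "NN"
print(f"{completeness(toy):.1%}")  # 66.7%
print(gap_runs(toy))               # [(6, 4), (16, 2)]
```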
New sequencing technology closes gaps
One reason for these gaps was the limitations of the sequencing technologies used at the time. They broke the genome into countless DNA fragments, each only a few hundred bases long, which then had to be reassembled in the correct order. But this is impossible when hundreds or thousands of these pieces are almost identical, and that is exactly the case in the centromeres and telomeres of the chromosomes. The DNA there consists of innumerable, often repeated sequences. Reconstructing it from short DNA fragments is like trying to assemble a jigsaw puzzle from thousands of identically colored pieces: “It’s as if, for example, you only have pieces of sky,” explains Winston Timp from Johns Hopkins University, one of the participants in the Telomere-to-Telomere (T2T) consortium.
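The jigsaw problem can be illustrated with a minimal Python sketch. It builds a hypothetical toy “genome” in which a short repeat unit is copied ten times between two unique flanks, then lists every position where a read matches exactly. Real assemblers tolerate sequencing errors and use far more elaborate overlap logic. The function name `placements` and all sequences here are assumptions for illustration only.

```python
def placements(genome: str, read: str) -> list[int]:
    """All start positions where `read` occurs exactly in `genome` (overlaps allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits

# Hypothetical toy genome: unique flanks around a tenfold tandem repeat.
left, unit, right = "GATTACAGCT", "ACGGT", "TTAGCCATGA"
genome = left + unit * 10 + right  # repeat occupies positions 10..59

short_read = genome[20:30]  # 10 bases lying entirely inside the repeat
long_read = genome[5:65]    # 60 bases spanning the whole repeat plus both flanks

print(len(placements(genome, short_read)))  # 9
print(placements(genome, long_read))        # [5]
```

The short read fits in nine places, so its true origin is ambiguous. The long read reaches unique sequence on both sides of the repeat and therefore fits in only one place.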
Since then, however, sequencing technology has advanced. Two new methods now make it possible to read the genome in much longer sections. Oxford Nanopore sequencing can read stretches of DNA up to a million bases long, albeit with moderate accuracy. A second system, from Pacific Biosciences, produces reads around 20,000 bases long but with 99 percent accuracy. The scientists of the T2T consortium combined both methods to decode the missing sections of the human genome completely for the first time. The DNA for this came from a human cell line in which, by a fortunate coincidence, all of the genetic material comes from only one parent. As a result, the two copies of each chromosome (the homologous chromosomes) are also identical, which makes sequencing easier.
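As a rough worked example with the figures above, assume that each base is read correctly with probability p, independently of its neighbors. This is a simplification, since real sequencing errors can cluster. For a read of length L, the expected number of misread bases is then:

$$
\mathbb{E}[\text{errors per read}] = L\,(1 - p) = 20\,000 \times (1 - 0.99) = 200
$$

Under this assumption, even a “99 percent” read still carries around 200 errors. Because such scattered errors fall at different positions in different reads, many overlapping reads of the same region can outvote them. This is one plausible reason why accurate reads of moderate length and very long but noisier reads complement each other.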
New genes, new variants and the first look into the centromere
The result of the T2T project is the first completely decoded human genome. The roughly 200 million bases that were previously missing, about as many as an entire chromosome contains, have now been decoded. Among them are 99 previously unknown protein-coding genes and almost 2,000 other candidate genes. The genome, dubbed T2T-CHM13, also corrects thousands of structural errors in the previous reference genome. “We are now seeing chapters in the book of life that we have never been able to read before,” says Evan Eichler of the University of Washington. “The complete blueprint of our genome will revolutionize our ideas of genetic variation, diseases and human evolution.” For example, many of the newly added DNA sections fill gaps in gene regions whose variants are considered possible causes of disease. “Now we can identify them because we have a more complete and accurate reference genome,” says Karen Miga of the University of California at Santa Cruz.
Also significant are the new insights into the structure of the centromeres, the junctions that hold the two halves of a chromosome, the sister chromatids, together. They play a crucial role in cell division, including meiosis, when these halves are pulled apart. “If this step of meiosis goes wrong, chromosomal abnormalities can occur that cause miscarriage or genetic diseases,” explains Nicolas Altemose of the University of California at Berkeley. Cancer can also result from such dysregulated division. That makes it all the more important to know the sequence of the centromeres in order to identify the causes of such anomalies. This is exactly what the new T2T reference genome now makes possible. “Before, all we had was an extremely blurry picture of what was hiding there. But now it’s clear down to the individual DNA base,” says Altemose.
The first truly complete decoding of the human genome is a significant milestone, comments Bob Waterston from the University of Washington, one of the collaborators on the original Human Genome Project. “We would have loved to have done this 20 years ago, but the technology just wasn’t ready at the time.” But the work of the T2T consortium doesn’t end there: its members are already working on decoding a normal genome, that is, the full set of chromosomes inherited from both parents. They also want to sequence the genomes of people from different populations with similar accuracy and completeness, which could provide new insights into the similarities, differences and evolution of human populations.
Source: T2T Consortium, Science, doi: 10.1126/science.abp8653; doi: 10.1126/science.abj6987, among others