A first human pangenome

A new reference pangenome combines the individual genomes of 47 people. © Darryl Leja/ NHGRI

A good 22 years ago, researchers decoded the human genome for the first time. But this DNA sequence was still incomplete and came from only a handful of individuals. Now the Human Pangenome Reference Consortium is presenting humanity’s first pangenome – a reference genome consisting of 47 complete individual genomes from people from different populations. The pangenome thus reflects for the first time the genetic diversity of the human species and also reveals millions of previously unknown structural variants. This milestone in genetics now makes it possible to research and recognize individual differences as well as pathological changes in the genetic material better than ever before.

A first important milestone in genetics was reached in 2001: scientists from the international Human Genome Project (HGP) presented the first sequenced version of large parts of the human genome. The DNA for this decoding effort, which took more than ten years, came from samples from just a handful of people, with most of the genetic material coming from a single person. Nonetheless, this first reference genome ushered in a new era in medicine and genetics, because individual genetic analyses could now be compared with this reference and examined for anomalies.

However, this official reference genome, which has been supplemented several times since then, has some fundamental shortcomings. It is incomplete: around 210 million DNA base pairs are missing or could not be correctly assigned. This is mainly because many sections of the genome consist of numerous almost identical copies and repeats. The sequencing methods that had long been in use split the genome into short fragments just a few hundred bases long, which then had to be correctly reassembled. When many fragments are nearly identical, it is difficult to place them correctly. In addition, most of the reference genome, dubbed GRCh38, comes from just one haplotype, that is, half the chromosome set of a single human. It doesn’t even begin to represent the genetic diversity of human populations. It is also a mosaic genome assembled from snippets of genes from multiple people, so there is probably not a single human cell on the planet that has exactly that DNA sequence.
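The placement problem with short fragments can be illustrated with a toy example. The following is only a minimal sketch, assuming exact string matching on an invented sequence; the names `find_placements`, `REPEAT` and `GENOME` are hypothetical, and real assemblers work very differently in detail. It shows that a read lying entirely inside a repeat fits in more than one place, while a longer read that also covers the unique flanking bases fits in only one.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every start position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Invented toy sequence: the same 8-base repeat occurs twice,
# each copy surrounded by different unique bases.
REPEAT = "ACGTACGT"
GENOME = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTAG"

short_read = REPEAT[:6]              # lies entirely inside the repeat
long_read = "GCA" + REPEAT + "GGA"   # repeat plus unique bases on both sides

short_hits = find_placements(GENOME, short_read)
long_hits = find_placements(GENOME, long_read)

print("short read fits at:", short_hits)  # more than one candidate position
print("long read fits at:", long_hits)    # a single position
```

The short read cannot be assigned to one copy of the repeat, which is the kind of ambiguity that left parts of the earlier reference unresolved.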

It was only in 2022 that researchers from the Telomere-to-Telomere (T2T) consortium succeeded for the first time in sequencing the human genome completely, from one end of each chromosome to the other. The reference genome, dubbed T2T-CHM13, revealed the sequence of around 200 million previously unreadable bases and corrected thousands of structural errors in the previous reference genome. This was made possible by modern sequencing technologies that can decode much longer sections of DNA in one go, together with advances in the computational assembly of the complete genome. However, the genetic material for this reference genome comes from a single cell line and thus from a single person. “But a genome is not enough to represent human diversity,” says Benedict Paten of the University of California at Santa Cruz.

Similarities and differences at a glance

The international team of the Human Pangenome Reference Consortium (HPRC) has therefore set itself the goal of creating a so-called pangenome – a reference genome that shows the different structural variants of the human genome for every point along the DNA. Where all humans share the same sequence and differ little, this reference genome resembles a single strand. In places with many variants, on the other hand, it fans out into parallel strands, one per variant. The consortium has now presented the first such pangenome, based on 47 individuals of various origins. Unlike the previous reference genome, it contains the genetic information of all 46 chromosomes of the cell and distinguishes between the chromosome copies carrying the paternal and maternal gene sets. This is important, for example, for researching the influence of different alleles of a gene. According to the consortium, the accuracy of the pangenome, which was compiled in parallel using several methods, is more than 99 percent and its completeness is also over 99 percent.
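The picture of a strand that fans out where people differ can be sketched as a small graph. This is only an illustration under assumptions: the segment names, sequences and the `spell_paths` helper are invented for this example and are not the consortium's data format or method. Shared stretches are single nodes, a variant site is a pair of alternative nodes, and each path through the graph spells one possible sequence.

```python
# Hypothetical toy graph: node name -> DNA segment.
SEGMENTS = {
    "s1": "ACGT",    # shared by everyone
    "s2a": "G",      # variant A at this site
    "s2b": "GTTTT",  # variant B: the same site with an insertion
    "s3": "CCA",     # shared by everyone
}

# Directed edges: which segments may follow which.
EDGES = {
    "s1": ["s2a", "s2b"],  # the graph fans out here
    "s2a": ["s3"],
    "s2b": ["s3"],         # and merges again here
    "s3": [],
}


def spell_paths(node: str, segments: dict, edges: dict) -> list[str]:
    """Return every sequence spelled by walking from node to the end of the graph."""
    if not edges[node]:
        return [segments[node]]
    return [
        segments[node] + tail
        for nxt in edges[node]
        for tail in spell_paths(nxt, segments, edges)
    ]


haplotypes = spell_paths("s1", SEGMENTS, EDGES)
for sequence in haplotypes:
    print(sequence)
```

In this toy graph the two printed sequences differ only at the variant site, while the shared segments are stored once.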

The resulting reference genome adds 1115 gene duplications and 119 million base pairs to the previously decoded genome. Around 90 million of these 119 million DNA base pairs are due to structural variants – sections of DNA that have been modified by repeated sequences, insertions, missing parts or reversals of DNA sequences. This increases the number of known structural variants by 104 percent. “Previous genomic studies gave the impression that our human genome is ‘flat’ and very similar in every human being, distinguished only by a handful of point mutations,” says Erik Garrison of the Memphis College of Medicine. “But the human pangenome now shows that each of us carries a little bit of DNA that is unusual or even unique. It has the potential to transform our view of the genetic diversity of our species.”

New look at genetic diseases

At the same time, the pangenome opens up new opportunities to better understand diseases and their genetic roots. “Since a large part of human diversity is to be mapped in the pangenome, it becomes easier to discover sequence segments in a person’s genome that are potentially linked to diseases,” explains Siegfried Schloissnig from the Research Institute of Molecular Pathology in Vienna, who is not involved in the project. “Sequence sections that occur in many individuals can be excluded in this way. Everything that is not yet contained in the pangenome then either represents the genomic individuality of a person or could be associated with a disease.”
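The exclusion principle described in the quote amounts to a simple comparison. The following sketch assumes variants can be written as (chromosome, position, reference base, alternative base) tuples; the function name `novel_variants` and all variant data are invented for illustration and do not come from the study.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def novel_variants(sample: set[Variant], pangenome: set[Variant]) -> list[Variant]:
    """Return the sample's variants that are not already present in the pangenome."""
    return sorted(sample - pangenome)


# Invented example data, not real variants.
pangenome_variants = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2500, "C", "T"),
    ("chr7", 410, "G", "GTT"),
}
patient_variants = {
    ("chr1", 1000, "A", "G"),   # also seen in the pangenome: can be excluded
    ("chr7", 410, "G", "GTT"),  # also seen in the pangenome: can be excluded
    ("chr9", 77, "T", "C"),     # not in the pangenome: remains a candidate
}

candidates = novel_variants(patient_variants, pangenome_variants)
print(candidates)
```

What remains after the comparison is either individual variation or a possible disease link, which is why such candidates would still need further investigation.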

But this milestone is just the beginning: The Human Pangenome Reference Consortium plans to sequence the genomes of a total of 350 people in this way and insert them into the pangenome. This new, even more comprehensive pangenome should be ready by 2024. “This is not the end of a project, but the beginning of a new era in which human diversity will be more fully integrated into the biological, biomedical and clinical sciences,” says consortium member Ting Wang from Washington University in St. Louis. “The new genome reference will continue to grow, expand and be tweaked to reveal the genetic blueprint of our species.”

Source: Benedict Paten (University of California, Santa Cruz) et al., Nature, doi: 10.1038/s41586-023-05896-x
