First complete sequencing of the Y chromosome

The Y chromosome is the last human chromosome whose DNA code has been fully sequenced. © Darryl Leja/ National Human Genome Research Institute (NHGRI)

The male sex chromosome is not only one of the smallest and most gene-poor human chromosomes. Until now, around half of its DNA code had also remained unsequenced, because the Y chromosome contains a particularly large number of repeated sections that are difficult to assemble. Now, for the first time, an international research consortium has succeeded in completely decoding the Y chromosome from one end to the other. The sequence comprises all 62.46 million base pairs of the male sex chromosome; around 30 million of these have only now been decoded. The scientists also identified 41 new protein-coding genes. With the complete sequencing of the Y chromosome, the entire human genome has now been mapped without gaps.

The male Y chromosome is rather puny compared to its counterpart, the X chromosome: in the course of evolution it has lost almost 90 percent of its genetic information. It is only about a third the size of the X chromosome and contains a fifth as many genes. Nevertheless, this chromosome is important because its genes determine sex and influence the development of male sexual characteristics and sperm production. Scientists have also recently found that the age-related loss of the Y chromosome from some cells in older men increases the risk of heart disease and inflammation. Despite its importance, however, the Y chromosome was until now the only human chromosome that had not been fully sequenced. The X chromosome was completely decoded in 2020, and the rest of the human genome was completely sequenced in 2022; only the Y chromosome was still missing. Although its essential genes were known, around half of its DNA code remained undeciphered.

62.46 million base pairs of the Y chromosome sequenced

The reason for the gaps in the Y chromosome map is the complex structure of the male sex chromosome, which is made up of an unusually large number of repeating sections. These include numerous sequences in which the same stretch of bases is copied many times in a row, but also palindromes: long stretches of DNA code that are present as exact mirror-image copies of one another. The sequencing methods commonly used until now could not read such frequently copied or very similar sections correctly, because they divide the genome into short fragments only a few hundred bases long. These must then be reassembled in the correct order. But this is impossible when hundreds or thousands of these pieces are almost identical. Thanks to advances in sequencing technology, however, the devices can now read segments tens of thousands to a million base pairs long. This makes it possible to decode even parts of the genome with many identical repetitions. Artificial intelligence is now also used to assemble the DNA fragments.
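Why read length matters can be illustrated with a toy example. The following Python sketch is not the consortium's assembly method; the miniature "genome", the repeat unit and the function name `find_placements` are all invented for illustration. It simply counts how many positions a read could have come from.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every start position at which `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy "chromosome" (invented): unique flanks around five identical repeat units.
REPEAT_UNIT = "GATTACA"
GENOME = "CCGT" + REPEAT_UNIT * 5 + "TTAGC"

# A short read that lies entirely inside the repeat block.
short_read = REPEAT_UNIT
# A long read that spans the whole repeat block plus part of both flanks.
long_read = GENOME[1:42]

print("short read fits at:", find_placements(GENOME, short_read))
print("long read fits at: ", find_placements(GENOME, long_read))
```

The short read fits at five different positions, so its true origin cannot be determined. The long read reaches into the unique flanking sequence on both sides and therefore fits at exactly one position.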

The scientists of the Telomere-to-Telomere (T2T) Consortium have now succeeded in completely sequencing the male sex chromosome using these methods. "The resulting map includes all 62.46 million base pairs of the Y chromosome with no gaps or model-added sequences," report Arang Rhie of the National Human Genome Research Institute (NHGRI) in Bethesda and colleagues. The sequencing thus adds around 30 million base pairs to the previously known DNA sections of this chromosome. Most of these are repeated sequences. "The biggest surprise was how ordered these repeats are," says co-author Adam Phillippy of NHGRI, leader of the T2T Consortium. "We didn't know beforehand exactly how the missing sections are structured; they could have been very chaotic. Instead, nearly half of this chromosome is made up of alternating blocks of two specific, repeating sequences, also known as satellite DNA. They create a nice, quilt-like pattern."

Sperm production and a surprise

The complete sequencing of the Y chromosome also reveals some peculiarities in medically important sections. One of them concerns the so-called azoospermia factor region, which contains several genes important for sperm production. Here, the sequencing revealed several palindromes. "This structure is very important because such palindromes can sometimes create loops in the DNA strand," explains Rhie. "If these loops are accidentally severed, it can lead to genetic defects in the genome." Such DNA losses in the azoospermia factor region are known to disrupt sperm production in affected men and cause infertility. Now that the sequences of this region are known, such losses can be identified more easily in the future.
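In DNA, a palindrome is a sequence that reads the same as its own reverse complement, meaning the second arm is the inverted, base-paired counterpart of the first. Because the two arms can pair with each other, such a stretch can fold back into a loop. The Python sketch below shows this definition on an invented toy sequence; the sequences and the function names are illustrative assumptions, not data from the study.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in its own direction."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """A DNA palindrome is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


# Toy palindrome (invented): an arm, a short self-complementary spacer,
# and the reverse complement of the arm.
left_arm = "GATTACCGT"
spacer = "AATT"
toy_palindrome = left_arm + spacer + reverse_complement(left_arm)

print(toy_palindrome, is_dna_palindrome(toy_palindrome))
print(left_arm, is_dna_palindrome(left_arm))
```

The assembled toy sequence passes the check, while the arm on its own does not. The palindromes on the Y chromosome follow the same principle but are vastly longer.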

Another medically important part of the Y chromosome is the so-called TSPY array. It includes the TSPY gene, which is also important for sperm production, and numerous copies of it. Together they form the second-largest gene array in the entire human genome. The new sequencing now shows for the first time the exact base sequence of this TSPY array and also that the number of copies can vary between 10 and 40 in different men. "When you find such previously unknown variants, there's always the hope that this will help to better understand human health," says Phillippy. "Medically relevant genome variants could also help us to develop better diagnostic methods in the future." Co-author Dylan Taylor from Johns Hopkins University in Baltimore takes a similar view: "Now that we have this 100 percent complete sequence of the Y chromosome, we can identify and explore numerous genetic variations that have previously eluded us that could influence human traits and diseases."

However, the complete decoding of the Y chromosome also brought another, completely unexpected discovery: when the researchers compared their newly created DNA sequences with those in a large database of bacterial genomes, they found a surprising number of matches. More than 5,100 of the bacterial genomes in the database contained short sections of the human Y chromosome code. "That was surprising," Rhie says. After decoding, bacterial genomes are routinely checked for possible contamination by human DNA. However, such tests can only detect human DNA snippets whose sequence is known, and that was not the case for large parts of the male sex chromosome. Only now that the gaps in the DNA map of the Y chromosome have been filled can this contamination be identified.
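The logic behind this blind spot can be sketched in a few lines of Python. This is a simplified illustration, not the screening pipeline actually used for genome databases: all sequences are invented, and the helper names `kmers` and `shares_kmer` as well as the word length of 8 bases are assumptions. A sample is flagged if it shares at least one short "word" of DNA with the human reference.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(sample: str, reference: str, k: int = 8) -> bool:
    """Flag `sample` if it has at least one k-mer in common with `reference`."""
    return not kmers(sample, k).isdisjoint(kmers(reference, k))


# Invented toy data: the part of the human reference that was already known,
# and a repetitive part that was missing from the map.
KNOWN_PART = "ATGGCGTACGTTAGCCTAGGATCCGATTGCA"
MISSING_PART = "TTCCATTCCATTCCATTCCATTCCA"

OLD_REFERENCE = KNOWN_PART
COMPLETE_REFERENCE = KNOWN_PART + MISSING_PART

# A "bacterial" assembly contaminated with a snippet from the missing part.
bacterial_assembly = "GCGCGCAAAGGGTTTCCC" + MISSING_PART[:15] + "AGAGAGCTCTCTGACGT"

print("flagged with old reference:     ", shares_kmer(bacterial_assembly, OLD_REFERENCE))
print("flagged with complete reference:", shares_kmer(bacterial_assembly, COMPLETE_REFERENCE))
```

With the incomplete reference the contaminated assembly passes unnoticed; only the complete reference exposes the human snippet.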

Source: Telomere-to-Telomere Consortium, Nature, doi: 10.1038/s41586-023-06457-y
