Finally, we now have a complete picture of the human genome. A milestone!
Genomes contain a wealth of information, so scientists are trying to crack the genetic code of animals and plants, and of course of humans too. About twenty years ago we were almost there: scientists managed to decipher about 92 percent of the human genome. Now researchers have big news: the last eight percent has been mapped as well.
You can think of the genome as a thick book written in a DNA alphabet of four letters: A, T, G and C. The chromosomes are the chapters and the genes are the words. Among the functional genes you will also find a lot of nonsense, as if a cat had walked over the keyboard while the book was being written. The goal of genome scientists is to put all the words and nonsense in the right order. In addition, each paragraph must be linked to the correct chapter (chromosome). This would not seem like such a difficult task, were it not for the fact that the text, in the case of the human genome, is about 6 billion letters long. Good luck with that.
Deciphering the human genome is a hell of a job. That’s because the human genome consists of just over six billion individual DNA letters, spread over 23 pairs of chromosomes (see box). To be able to read a genome, scientists first cut all that DNA into pieces that are hundreds to thousands of letters long. Sequencing machines then read the individual letters in each piece. Scientists then try to put the pieces together in the correct order, like putting together a complicated puzzle.
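The cut, read and reassemble idea can be tried out on a toy scale. The Python sketch below is only an illustration under strong assumptions: the 24-letter `GENOME` string is made up, the "reads" are error-free and overlap by exactly four letters, and the `fragment`, `overlap` and `greedy_assemble` helpers are hypothetical names, not the software used in the study. It repeatedly glues together the two pieces with the longest matching end and start.

```python
# Toy "genome": invented for this example, chosen so that no
# four-letter stretch occurs twice (so every overlap is unambiguous).
GENOME = "ATGCGTACCTGAAGTCCGATTAGC"


def fragment(genome: str, size: int, step: int) -> list[str]:
    """Cut the genome into overlapping pieces of `size` letters."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, step)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest end of `a` that equals the start of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain.

    Returns the remaining stretches (one stretch if assembly succeeded).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join: gaps remain
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    pieces = fragment(GENOME, size=8, step=4)
    print(pieces)
    print(greedy_assemble(pieces[::-1]))  # order of the pieces does not matter
```

This only works so neatly because the toy text contains no repeated stretches. Real genomes are not that kind.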
Repetition
A major challenge, however, is that some parts of the genome repeat the same letters over and over. Repetitive regions include the centromeres (the parts that hold the two strands of a chromosome together and play a critical role in cell division) and ribosomal DNA (which codes for components of the ribosomes, the molecular complexes that drive protein synthesis). Still other repeating parts include new genes that can help species adapt. In the human genome we find millions of repetitive DNA sequences of about 300 letters each, scattered throughout the genome. In the past, all that repetition made it impossible to put some of the chopped-up pieces together in the correct order. It's like having identical puzzle pieces: scientists didn't know which went where, leaving large gaps in the genome picture.
Eight percent
Still, researchers had come a long way by 2001. As mentioned, the original human genome sequence left out only eight percent of the DNA. There was a good reason why this eight percent could not be deciphered: scientists knew that these missing chunks contained nearly identical duplications and highly repetitive sequences. “These are important regions, but difficult to put in order,” said researcher Megan Dennis.
Leap forward
However, new techniques now make it possible to decipher that last eight percent as well, a stretch comparable in size to an entire chromosome. Previous DNA sequencing technology could only read relatively short sequences. “Newer generations of sequencers can decode much longer stretches, up to a million base pairs or ‘letters’ of DNA,” Dennis said. “That means that the chunks are much larger and therefore easier to assemble in the original order. This is a real game changer.”
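Why longer reads help can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the `GENOME` string is invented, with the same made-up 21-letter `REPEAT` block placed at two spots, and `placements` is a hypothetical helper that simply lists every position where a piece fits exactly.

```python
def placements(genome: str, read: str) -> list[int]:
    """All start positions where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Invented toy sequence: unique flanks around two copies of one repeat.
REPEAT = "GATTACA" * 3
GENOME = "CCGTAAGC" + REPEAT + "TTGCACGG" + REPEAT + "AGGCTTCA"

# A short piece that falls entirely inside the first repeat copy.
short_read = GENOME[10:18]

# A long piece that covers the first repeat copy plus some of the
# unique letters on either side of it.
long_read = GENOME[4:34]

if __name__ == "__main__":
    print("short read fits at:", placements(GENOME, short_read))
    print("long read fits at: ", placements(GENOME, long_read))
```

The short piece fits in several places, so there is no way to tell where it belongs. The long piece reaches into the unique letters next to the repeat and therefore fits in only one place, which is the advantage that long-read sequencers bring to repetitive regions.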
Complete human genome
Researchers have finally finished the job. In the new study, they describe the entire human genome, including the tricky bits that scientists previously did not dare to touch. The new reference genome comes from a single human sample, but not from a real person. Researchers generated the entire genome sequence using a special cell line that has two identical copies of each chromosome, unlike most human cells, which carry two slightly different copies. Most of the newly added DNA sequences are located near the repetitive telomeres (long, trailing ends of each chromosome) and the aforementioned centromeres (dense mid-sections of each chromosome). “In the genetic manuscript, we see chapters that have never been read before,” said study researcher Evan Eichler. Or, as geneticist Robert Waterston of the University of Washington puts it, “hidden or unknown bits no longer exist.”
New insights
What is striking about the human genome, among other things, is that a considerable amount of human genetic material consists of long, repeating sections that occur again and again. While every human has some repeats, not everyone has the same number. And the difference in the number of repeats is where most of the human genetic variation is found. That is just one of the project's many important findings. “It was worth the wait,” said Francis Collins, an American physician and geneticist. “The study reveals a rich array of surprising architectural features, with major implications for understanding human evolution, variation and biological function.”
All in all, the findings are a huge milestone. “Since we had the first draft sequence of the human genome, it has been a challenge to determine the exact order of complex genomic regions,” says Eichler. “I'm really glad we got the job done. This complete view of the human genome will revolutionize the way we think about human genomic variation, disease and evolution.”
Source material:
“New human reference genome opens unexplored regions” – University of California – Davis (via EurekAlert)
“Complete human genome deciphered for the first time” – Howard Hughes Medical Institute (via EurekAlert)
“Repeats are key to understanding humanity's genome” – University of Connecticut (via EurekAlert)
Image at the top of this article: Pete Linforth via Pixabay