All files, photos and data of everyone in the world can theoretically be stored in a coffee mug full of DNA.
There are currently about 10 trillion gigabytes of digital data on Earth. Every day, an additional 2.5 million gigabytes of data are added in the form of e-mails, photos, tweets and other digital files. Much of this data is stored in huge, energy-consuming and very expensive data centers. But many scientists see an alternative, and it lies in the molecule that contains our genetic information: DNA.
DNA as a storage medium
DNA holds great promise as a future storage medium. It is a lot more efficient than today’s data centers, which can take up an acre of land and cost about $1 billion to build and maintain. DNA also lasts much longer and can store information millions of times more compactly. “We need new solutions for storing the vast amounts of data the world collects, especially the archive data,” said researcher Mark Bathe in an interview with Scientias.nl. “DNA is the most natural choice because it already stores all the information on the planet about our plants, microbes and animals. What’s more, DNA is a thousand times more compact than flash memory, lasts forever if stored properly and, most importantly, it consumes no energy. In that respect, it is also a great solution for the climate, as current data centers consume enormous amounts of energy to store all our photos, films and data. And all that even though we never look back at the vast majority of this data.”
How exactly does it work?
According to Bathe, all files, photos and data of everyone in the world could theoretically be stored in a coffee mug full of DNA. A bizarre thought. But how exactly does that work? “In the same way that DNA stores information about our genes (such as our ancestry and the color of our eyes and hair), it can also be used to store other information such as text, movies, images and sound,” Bathe explains. The process basically involves converting the strings of ones and zeros in digital data into the four bases that make up DNA sequences: adenine (A), guanine (G), cytosine (C) and thymine (T). “For example, A and T could be used for the zeros, while Gs and Cs account for the ones,” Bathe says.
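To make Bathe’s example concrete, here is a minimal Python sketch of a one-bit-per-base code in which A or T stands for 0 and G or C stands for 1. The function names and the rule for choosing between the two bases of each pair (always pick the one that differs from the previous base, so the same letter never appears twice in a row) are hypothetical choices made for illustration, not the researchers’ actual encoding scheme.

```python
# Hypothetical sketch of Bathe's example: A/T encode 0, G/C encode 1.
# Assumption (not from the article): within each pair we alternate bases
# so the strand never contains the same base twice in a row.

def encode(data: bytes) -> str:
    """Map each bit of `data` to one DNA base (1 bit per nucleotide)."""
    bases = []
    prev = ""
    for byte in data:
        for i in range(7, -1, -1):          # most significant bit first
            bit = (byte >> i) & 1
            pair = "GC" if bit else "AT"    # 1 -> G or C, 0 -> A or T
            # choose the base of the pair that differs from the previous base
            base = pair[1] if prev == pair[0] else pair[0]
            bases.append(base)
            prev = base
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Reverse the mapping: A/T -> 0, G/C -> 1, then regroup bits into bytes."""
    bits = [1 if b in "GC" else 0 for b in strand]
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


strand = encode(b"cat")
print(strand)
print(decode(strand))  # b'cat'
```

Because each base carries only one bit here, this sketch uses half of DNA’s theoretical capacity of two bits per nucleotide; the freedom to pick either base of a pair is spent on avoiding repeated letters instead.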
Needle in a haystack
It sounds like a great alternative, and one that is also safe and feasible. DNA is extremely stable and fairly easy to synthesize and sequence. Because of its high density (each nucleotide, which can encode up to two bits, takes up about 1 cubic nanometer), an exabyte of data stored as DNA could fit in the palm of your hand. However, a number of challenges remain. “First, it’s incredibly expensive because the technology is quite old and was never developed to fabricate massive amounts of DNA,” Bathe says. Currently, it would cost about $1 trillion to store one petabyte of data (1 million gigabytes) in DNA. “In addition, retrieving stored files from DNA is not easy,” Bathe continues. “It’s literally like looking for a needle in a haystack.”
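The palm-of-the-hand claim can be checked with some back-of-the-envelope arithmetic. This Python sketch uses the article’s figures (up to two bits per nucleotide, about 1 cubic nanometer each) and assumes decimal units and perfectly packed DNA with no overhead for error correction or packaging, so it gives a lower bound on the real volume.

```python
# Back-of-the-envelope check of the "exabyte in the palm of your hand" claim.
# Figures from the article: up to 2 bits per nucleotide, ~1 nm^3 per nucleotide.
# Assumptions: decimal units (1 EB = 10**18 bytes) and perfectly packed DNA,
# ignoring water, salts, redundancy and error-correction overhead.

BITS_PER_NUCLEOTIDE = 2
NM3_PER_NUCLEOTIDE = 1.0
NM3_PER_CM3 = 1e21  # (10**7 nm per cm) ** 3

exabyte_bits = 8 * 10**18
nucleotides = exabyte_bits / BITS_PER_NUCLEOTIDE   # 4e18 nucleotides
volume_cm3 = nucleotides * NM3_PER_NUCLEOTIDE / NM3_PER_CM3

print(f"{nucleotides:.0e} nucleotides, about {volume_cm3 * 1000:.0f} mm^3")
```

Under these idealized assumptions an exabyte occupies only about 4 cubic millimeters, which leaves a wide margin for real-world overhead before the palm of a hand is filled.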
Red car
Scientists have already managed to encode images and pages of text as DNA. But what if you only want to see a photo with a red car on it? How do you find that one photo among a billion or more other images, the way we search Google for images of red cars? “When you want to find a needle in a haystack, you can sort through all the hay by sifting through it with your hands,” Bathe begins. “That’s like leafing through every book in the library, assuming the entire collection of books is in one huge pile. But you could also organize the hay and books into categories and use index cards to look things up. Or, even better, you could tag each file the way we do HTML pages. Meta tags then tell us the contents of each file. This is very quick and easy. Moreover, this is also how Internet searches work, with the help of sophisticated algorithms.”
Tagging
In a new study, Bathe and his colleagues devised an effective method for picking out the desired file from a mixture of many pieces of DNA. They did this by encapsulating each data file in a 6-micrometer silica particle, which is labeled with short DNA sequences that reveal its contents. Each capsule thus carries a ‘barcode’ that corresponds to the contents of the file, such as ‘red car’ or ‘cat’. Using this approach, the researchers showed that they can accurately extract individual images stored as DNA sequences from a set of 20 images. How does that work? When the researchers wanted to retrieve a specific image, they took a sample of the DNA and added primers that matched the labels they were looking for: for example, “cat,” “orange” and “wild” for an image of a tiger. The primers are tagged with fluorescent or magnetic particles, which makes it easy to identify any matches and pull them out of the sample. In this way, the desired file can be extracted from the pool, while the rest of the DNA remains intact.
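The selection logic itself is easy to model in software, even though in the lab it happens chemically. In the hypothetical Python sketch below, each capsule is represented by a file name and its set of labels, and a query returns only the capsules that carry every requested label, just as the tiger image is found with “cat,” “orange” and “wild.” The file names, labels and the `select` function are invented for illustration.

```python
# Toy model of selecting capsules by their DNA "barcode" labels.
# Hypothetical data: each capsule is a file name mapped to its set of labels.
# A query returns every capsule that carries ALL requested labels, mimicking
# primers for "cat", "orange" and "wild" all binding to the tiger's capsule.
# In the lab this pull-down is chemical (fluorescent or magnetic tags).

from typing import Dict, List, Set

Pool = Dict[str, Set[str]]


def select(pool: Pool, query: Set[str]) -> List[str]:
    """Return names of capsules whose label set contains every query label."""
    return sorted(name for name, labels in pool.items() if query <= labels)


pool: Pool = {
    "tiger.jpg": {"cat", "orange", "wild"},
    "housecat.jpg": {"cat", "orange", "domestic"},
    "red_car.jpg": {"car", "red"},
    "blue_car.jpg": {"car", "blue"},
}

print(select(pool, {"cat", "orange", "wild"}))  # ['tiger.jpg']
print(select(pool, {"cat", "orange"}))          # ['housecat.jpg', 'tiger.jpg']
print(select(pool, {"car", "red"}))             # ['red_car.jpg']
```

Note that `select` only reads the pool and never removes anything from it, mirroring the point that the rest of the DNA remains intact.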
For the barcode, the researchers used single-stranded DNA sequences from a library of 100,000 sequences, each about 25 nucleotides in length. If you put two of these labels on each file, you can uniquely label 10 billion different files. With four labels on each file, you can uniquely label 10^20 files.
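The label arithmetic can be checked in a few lines. The figures in the article match the simple product 100,000^k for k labels per file, which counts ordered label combinations. For comparison, this Python sketch also computes the number of unordered sets of k different labels, which is smaller by roughly a factor of k! (2 for two labels, 24 for four), and the total number of possible 25-nucleotide sequences.

```python
# Counting how many files the barcode library can label, using the article's
# figures: a library of 100,000 distinct single-stranded sequences (~25 nt).
# The article's numbers equal LIBRARY_SIZE ** k (ordered combinations);
# math.comb gives the count of unordered sets of k distinct labels.

import math

LIBRARY_SIZE = 100_000

for k in (2, 4):
    ordered = LIBRARY_SIZE ** k
    unordered = math.comb(LIBRARY_SIZE, k)
    print(f"{k} labels: {ordered:.1e} ordered, {unordered:.1e} unordered")

# For comparison: every possible 25-nucleotide sequence (4 bases per position)
print(f"{4 ** 25:.1e} possible 25-mers")
```

Either way of counting, two labels already suffice for billions of files, and the 100,000-sequence library is only a tiny subset of the roughly 10^15 possible 25-nucleotide sequences.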
At the moment, the scientists achieve a search speed of about 1 kilobyte per second. This speed is determined by the amount of data per capsule, which is currently limited by the prohibitive cost of storing even 100 megabytes of data. “To compete with Blu-ray discs or magnetic tapes, the cost of DNA synthesis has to drop by about six orders of magnitude (a factor of 10^6, ed.),” says Bathe. “Many companies and labs are currently working on making it cheaper. In any case, once the methods are cheap enough, we’ve come up with a system that allows you to retrieve any file you want from a huge storage database that can be basically a petabyte or even an exabyte in size. The file itself can also be any reasonable size, such as a gigabyte, a megabyte or just a few kilobytes, because it’s a very general procedure.”
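The cost figures can likewise be turned into per-gigabyte numbers. This Python sketch assumes decimal units (1 petabyte = 1 million gigabytes) and, as a simplification, that the total storage cost falls in step with the cost of DNA synthesis.

```python
# Rough cost arithmetic from the article's numbers: about $1 trillion per
# petabyte today, and a required drop of about six orders of magnitude.
# Assumptions: decimal units (1 PB = 1,000,000 GB) and that the total
# storage cost scales directly with the cost of DNA synthesis.

COST_PER_PB_TODAY = 1e12   # US dollars
GB_PER_PB = 1_000_000
REQUIRED_DROP = 10**6      # "about six orders of magnitude"

today_per_gb = COST_PER_PB_TODAY / GB_PER_PB
target_per_pb = COST_PER_PB_TODAY / REQUIRED_DROP
target_per_gb = target_per_pb / GB_PER_PB

print(f"today:  ${today_per_gb:,.0f} per GB")
print(f"target: ${target_per_pb:,.0f} per PB, ${target_per_gb:,.2f} per GB")
```

On these assumptions, today’s price works out to about $1 million per gigabyte, and a millionfold drop would bring it to roughly $1 per gigabyte.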
Bathe says the invention may be particularly useful for storing so-called ‘cold’ data: data that is kept in an archive but rarely consulted. But that is still a long way off. First, DNA synthesis will have to become about a million times cheaper before we can really start using the system for storing and retrieving data. However, according to the researcher, we will not have to wait long for innovations in the field. “Within a decade or two, costs will have fallen, similar to how the cost of storing information on flash drives has fallen dramatically over the past few decades,” he says. “And then I hope our solution will be a big step forward so that we never have to delete anything again.”
Source material:
“Could all your digital photos be stored as DNA?” – Massachusetts Institute of Technology
Interview with Mark Bathe
Image at the top of this article: Massachusetts Institute of Technology