DNA Data Storage Is Closer Than You Think

Life’s information-storage system is being adapted to handle massive amounts of information

Every minute in 2018, Google conducted 3.88 million searches, and people watched 4.33 million videos on YouTube, sent 159,362,760 e-mails, tweeted 473,000 times and posted 49,000 photos on Instagram, according to software company Domo. By 2020 an estimated 1.7 megabytes of data will be created per second per person globally, which translates to about 418 zettabytes in a single year (418 billion one-terabyte hard drives’ worth of information), assuming a world population of 7.8 billion. The magnetic or optical data-storage systems that currently hold this volume of 0s and 1s typically cannot last for more than a century, if that. Further, running data centers takes huge amounts of energy. In short, we are about to have a serious data-storage problem that will only become more severe over time.
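
The 418-zettabyte figure follows from the stated rate and population. Here is a minimal sketch of that arithmetic, assuming a 365-day year and decimal units (1 zettabyte = 10^21 bytes); neither assumption is stated in the text, and the variable names are illustrative only.

```python
# Back-of-the-envelope check of the annual data estimate.
# Assumptions (not stated in the article): 365-day year, decimal units.

BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21

rate_mb_per_second_per_person = 1.7   # figure quoted in the article
world_population = 7.8e9              # figure quoted in the article
seconds_per_year = 365 * 24 * 60 * 60


def annual_bytes(rate_mb, population, seconds):
    """Total bytes created in one year at a constant per-person rate."""
    return rate_mb * BYTES_PER_MB * population * seconds


total = annual_bytes(rate_mb_per_second_per_person, world_population, seconds_per_year)
zettabytes = total / BYTES_PER_ZB
terabyte_drives = total / BYTES_PER_TB

print(f"{zettabytes:.0f} ZB per year")
print(f"{terabyte_drives / 1e9:.0f} billion one-terabyte drives")
```

Both printed values round to 418, matching the estimate in the text.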

An alternative to hard drives is progressing: DNA-based data storage. DNA—which consists of long chains of the nucleotides A, T, C and G—is life’s information-storage material. Data can be stored in the sequence of these letters, turning DNA into a new form of information technology. It is already routinely sequenced (read), synthesized (written to) and accurately copied with ease. DNA is also incredibly stable, as has been demonstrated by the complete genome sequencing of a fossil horse that lived more than 500,000 years ago. And storing it does not require much energy.
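
One simple way to picture data being stored in the sequence of these letters is to map every two bits to one nucleotide. The sketch below does that; the particular mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are arbitrary choices for illustration, not a scheme described in the article, and practical encodings typically add error correction on top.

```python
# Illustrative two-bits-per-base encoding.
# The mapping below is a hypothetical choice made for this example only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotides, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read two bits per base and regroup them into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)
print(strand)           # CACACATGCAAC
print(decode(strand))   # b'DNA'
```

At two bits per base, each byte of data becomes four nucleotides, so the three-byte message above becomes a 12-letter strand.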

But it is the storage capacity that shines. DNA can accurately stow massive amounts of data at a density far exceeding that of electronic devices. The simple bacterium Escherichia coli, for instance, has a storage density of about 10¹⁹ bits per cubic centimeter, according to calculations published in 2016 in Nature Materials by George Church of Harvard University and his colleagues. At that density, all the world’s current storage needs for a year could be well met by a cube of DNA measuring about one meter on a side.
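
The one-meter cube can be checked against the density figure. This is a rough sketch, assuming 8 bits per byte, decimal zettabytes and the 418-zettabyte annual estimate quoted earlier; the variable names are illustrative.

```python
# Rough check of the "one-meter cube" claim from the quoted density.
# Assumptions: 8 bits per byte, 1 ZB = 10**21 bytes, 418 ZB needed per year.

BITS_PER_CM3 = 1e19          # density quoted in the article
CM3_PER_M3 = 100**3          # cubic centimeters in one cubic meter
BITS_PER_ZB = 8 * 1e21

annual_need_zb = 418         # estimate quoted earlier in the article

# Capacity of a cube one meter on a side at the quoted density.
cube_capacity_zb = BITS_PER_CM3 * CM3_PER_M3 / BITS_PER_ZB

# Side length of the smallest cube that would hold one year of data.
needed_cm3 = annual_need_zb * BITS_PER_ZB / BITS_PER_CM3
side_cm = needed_cm3 ** (1 / 3)

print(f"1 m cube holds about {cube_capacity_zb:.0f} ZB")
print(f"418 ZB fits in a cube about {side_cm:.0f} cm on a side")
```

Under these assumptions a one-meter cube holds roughly 1,250 zettabytes, about three times the annual estimate, which is consistent with the needs being "well met."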

The prospect of DNA data storage is not merely theoretical. In 2017, for instance, Church’s group at Harvard adopted CRISPR DNA-editing technology to record images of a human hand into the genome of E. coli; the images were then read out with higher than 90 percent accuracy. And researchers at the University of Washington and Microsoft Research have developed a fully automated system for writing, storing and reading data encoded in DNA. A number of companies, including Microsoft and Twist Bioscience, are working to advance DNA-storage technology.

Meanwhile DNA is already being used to manage data in a different way, by researchers who grapple with making sense of tremendous volumes of data. Recent advancements in next-generation sequencing techniques allow for billions of DNA sequences to be read easily and simultaneously. With this ability, investigators can employ bar coding—use of DNA sequences as molecular identification “tags”—to keep track of experimental results. DNA bar coding is now being used to dramatically accelerate the pace of research in fields such as chemical engineering, materials science and nanotechnology. At the Georgia Institute of Technology, for example, James E. Dahlman’s laboratory is rapidly identifying safer gene therapies; others are figuring out how to combat drug resistance and prevent cancer metastasis.
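
The bookkeeping behind bar coding can be sketched in a few lines: each sequencing read begins with a tag, and counting tags shows how often each tagged item turned up. Everything in this example is hypothetical, including the six-letter barcodes, the sample names and the reads; it illustrates the idea only, not any laboratory's actual pipeline.

```python
from collections import Counter

# Hypothetical barcode-to-sample table, invented for illustration.
BARCODES = {
    "ACGTAC": "formulation_1",
    "TGCATG": "formulation_2",
    "GATTCA": "formulation_3",
}
BARCODE_LENGTH = 6


def tally_reads(reads):
    """Count reads per sample by looking up the barcode at the start of each read."""
    counts = Counter()
    for read in reads:
        tag = read[:BARCODE_LENGTH]
        counts[BARCODES.get(tag, "unassigned")] += 1
    return counts


# Made-up sequencing reads: barcode followed by arbitrary sequence.
reads = ["ACGTACGGGT", "TGCATGAAAC", "ACGTACTTTA", "CCCCCCAGTA"]
counts = tally_reads(reads)

for sample, n in counts.most_common():
    print(sample, n)
```

Because every read carries its own tag, many tagged items can be pooled in one experiment and sorted out afterward from the sequencing data.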

Among the challenges to making DNA data storage commonplace are the cost and speed of reading and writing DNA: costs need to drop even further, and speeds need to rise, if the approach is to compete with electronic storage. Even if DNA does not become a ubiquitous storage material, it will almost certainly be used for generating information at entirely new scales and preserving certain types of data over the long term.