
“Punch Card” DNA Could Mean Cheaper High-Capacity Data Storage

The new method may be faster and easier than other genetic storage attempts

Living cells already store information in DNA.

If everyone had to rely on flash memory—the data-storage system used in memory cards and thumb drives—the amount of information that the world is estimated to produce by 2040 would exceed the planet’s expected supply of microchip-grade silicon by up to 100 times. To prevent such a crisis, researchers have been exploring a storage material that life itself relies on: DNA.

In theory, this substance can hold a vast amount of information—up to one exabyte (one billion gigabytes) per cubic millimeter of DNA—for millennia. (The magnetic tape that serves as the foundation of most digital archives has a maximum life span of about 30 years, but DNA in 700,000-year-old fossils can still be sequenced.) One obstacle to making DNA data storage a reality, however, is the slow, expensive and error-prone process of creating, or synthesizing, new DNA sequences that fit a desired code.

“Synthesizing DNA is a major bottleneck with respect to recording cost, accuracy and writing speed,” says Olgica Milenkovic, a coding theorist at the University of Illinois at Urbana-Champaign and co-senior author of a new study on the topic. She and her colleagues have suggested a novel solution: instead of custom-synthesizing DNA from scratch, mark existing DNA molecules with patterns of “nicks” to encode data. This method was inspired by punch cards—strips of stiff paper that were punched with holes in specific positions to store information for many early computers, including the World War II–era ENIAC. The researchers detailed their technique on Wednesday in Nature Communications.



Previous DNA storage approaches treated the four key DNA components known as bases—adenine, thymine, cytosine and guanine—like electronic bits, the 1s and 0s that encode digital data. For instance, each base might be assigned to represent the pair 00, 01, 10 or 11. But instead of translating a series of bits into DNA code and synthesizing corresponding strings of bases, the new method treats existing genetic material a little like the paper of those early punch cards. It applies enzymes as “the device that makes holes,” says lead study author S. Kasra Tabatabaei, a synthetic biologist at Urbana-Champaign. In this case, the “holes” are severed bonds between the molecules that make up the backbone of the DNA. The presence of this mark means 1, and its absence symbolizes 0.
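
To make the contrast concrete, here is a minimal Python sketch of the two encoding ideas described above. It is an illustration under stated assumptions, not the researchers' software: the particular bit-pair-to-base table, the function names and the example nick-site coordinates are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical assignment of bit pairs to bases; the article says only that
# each base "might be assigned" to 00, 01, 10 or 11, not which one.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_bases(bits: str) -> str:
    """Synthesis-style encoding: each pair of bits becomes one base."""
    if len(bits) % 2 != 0:
        raise ValueError("need an even number of bits")
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bits_to_nicks(bits: str, sites: Sequence[int]) -> List[int]:
    """Punch-card-style encoding: nick a candidate site if its bit is 1."""
    if len(bits) > len(sites):
        raise ValueError("more bits than candidate nick sites")
    return [site for bit, site in zip(bits, sites) if bit == "1"]


def nicks_to_bits(nicked: Sequence[int], sites: Sequence[int]) -> str:
    """Read back: a nick present at a site means 1, absent means 0."""
    nicked_set = set(nicked)
    return "".join("1" if site in nicked_set else "0" for site in sites)


# Hypothetical positions (in base pairs) where a nick could be placed.
example_sites = [40, 120, 200, 280, 360]
example_nicks = bits_to_nicks("10110", example_sites)
```

In the first function the data determine which sequence must be built from scratch. In the second the sequence is fixed in advance, and the data determine only which of the predefined sites get cut.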

The most interesting aspect of this research is how it relies on nature, says Brenda Rubenstein, a theoretical chemist at Brown University, who did not participate in the study. The researchers “let these enzymes create nicks—do what’s most natural to them—to store information,” she says.

To place the nicks precisely, the team heated double-stranded DNA molecules—picture each as a twisted ladder with rungs made of pairs of bases, and vertical rails of sugars and phosphates—until they unwound a bit in the middle. This process essentially formed bubbles that left the bases exposed. Next the scientists deployed single-stranded DNA molecules, each only 16 bases long, that latched onto corresponding sequences of bases within those bubbles. The ends of these single-stranded molecules served as guides, telling enzymes exactly where to go. In DNA, each base connects to a sugar molecule and a phosphate group to form a compound known as a nucleotide. The enzymes used in the new technique sever the bond linking one nucleotide to another to create a nick in the sugar-phosphate rails.

Because this method does not require synthesizing precise sequences of DNA, the researchers say one of its key advantages is that they can treat virtually any DNA molecule like a punch card. For instance, they experimented with genetic material harvested cheaply from readily available strains of Escherichia coli bacteria, whose sequences researchers know with great precision. Using bacterial DNA strands with 450 base pairs, each containing five to 10 nicks, the scientists encoded the 272 words of Abraham Lincoln’s Gettysburg Address—and a 14-kilobyte image of the Lincoln Memorial. After they placed this information on the DNA, they used commercial sequencing techniques to read the files with perfect accuracy.
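
For a rough sense of scale, the back-of-the-envelope Python sketch below estimates how many strands an image of that size would occupy. It rests on assumptions the article does not state: that each strand offers 10 candidate nick sites (taken from the upper end of the "five to 10 nicks" figure), that each site stores one bit, that a kilobyte is 1,000 bytes, and that no extra bits are spent on addressing or error correction. The function name is hypothetical.

```python
def strands_needed(payload_bytes: int, sites_per_strand: int) -> int:
    """Strands required if every candidate nick site stores exactly one bit."""
    if sites_per_strand <= 0:
        raise ValueError("sites_per_strand must be positive")
    total_bits = payload_bytes * 8
    # Ceiling division: a partly filled strand still counts as one strand.
    return -(-total_bits // sites_per_strand)


# Assumed figures: 14 kilobytes = 14,000 bytes, 10 one-bit sites per strand.
image_strands = strands_needed(14_000, 10)
print(image_strands)  # 11200
```

Under those assumptions the image alone would span about 11,200 strands. The real count depends on how many sites the researchers actually used per strand and on any overhead in their scheme.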

“For many years, people thought molecular computing involved taking what we do in silicon and mapping that onto molecules, which resulted in these elaborate Rube Goldberg devices,” Rubenstein says. “Instead this new work trusted in how enzymes evolved over millions and millions of years to be incredibly efficient at what they do.”

The scientists hope their process may prove far cheaper and faster than those that rely on synthesizing DNA. They say, however, that DNA data-holding strategies proposed in the past still offer some advantages, such as roughly 12 to 50 times greater storage density than the punch-card technique. Still, “the biggest problem with DNA data storage right now isn’t density; it’s cost,” Milenkovic says. “And our costs are really low and can be made even lower.” Moreover, she adds, older DNA storage systems have had to include redundant sequences, which serve as insurance against the error-prone nature of conventional DNA synthesis. This requirement reduces the amount of data they can actually hold, shrinking the storage-density gap between them and the new technique.