I am a scientist specializing in human genetics and cell metabolism.
Current data storage tech can't keep up with all the data we generate today, so how about using DNA?
In 10 seconds? In today’s information era, smartphones, social media and big data are generating more information than ever before, raising the question of how to store it durably and efficiently. Surprisingly, DNA may be the answer.
Don’t believe it? According to former Google CEO Eric Schmidt, every two days we create as much data as we did from the dawn of civilization up until 2003, and worldwide data is predicted to exceed 40 trillion gigabytes by 2020. Handling all this data is therefore an increasingly serious problem, and several research groups are trying to solve it by developing methods to store information efficiently in DNA.
But why DNA, you ask? Firstly, because of durability. While memory cards and chips last only 5 years, and magnetic tapes 10 to 30 years, DNA can be kept intact for thousands of years under the right conditions. Indeed, more than 80% of the mammoth’s genome remains readable even though this species disappeared from the planet more than 10,000 years ago.
Secondly, because of density. Instead of a binary code with two possibilities per position (1 or 0), DNA has four (A, G, C and T), so each position can carry up to two bits. Also, its 3D structure offers more memory space than the linear structure used in current data storage media. Additionally, epigenetic modifications allow multiple layers of information to be stored in a single DNA template. In fact, the latest method, DNA Fountain, achieves a density of 215 petabytes per gram of DNA with perfect retrieval of the information.
And finally, because of secrecy. What better way to hide information than encoding it into the DNA of living organisms?
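To make the density argument concrete, here is a minimal Python sketch, assuming a naive mapping of two bits to each base (A=00, C=01, G=10, T=11). The mapping and the function names are illustrative assumptions, not the DNA Fountain method; real schemes add error correction and avoid sequences that are hard to synthesize or read, so they store less than two bits per base. The last function does the back-of-the-envelope arithmetic for the figures quoted above: 40 trillion gigabytes at 215 petabytes per gram, using decimal units.

```python
# Hypothetical 2-bit mapping: index in BASES is the bit pair (A=00, C=01, G=10, T=11).
# The assignment is arbitrary and chosen only for illustration.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # BASES.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def grams_needed(total_bytes: float, bytes_per_gram: float = 215e15) -> float:
    """Grams of DNA needed at a given density (default: 215 petabytes per gram)."""
    return total_bytes / bytes_per_gram


if __name__ == "__main__":
    message = b"DNA"
    strand = bytes_to_dna(message)
    print(strand)                          # 4 bases per byte
    assert dna_to_bytes(strand) == message

    # 40 trillion gigabytes = 40e12 * 1e9 bytes = 40e21 bytes
    print(round(grams_needed(40e21) / 1000), "kg")
```

Under these assumptions, the 40 trillion gigabytes predicted for 2020 would fit in roughly 186 kg of DNA.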
Abstract: DNA is an attractive medium to store digital information. Here we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10^6 bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
Pub.: 04 Mar '17, Pinned: 20 Apr '17
Abstract: All synthetic DNA materials require prior programming of the building blocks of the oligonucleotide sequences. The development of a programmable microarray platform provides cost-effective and time-efficient solutions in the field of data storage using DNA. However, the scalability of the synthesis is not on par with the accelerating sequencing capacity. Here, we report on a new paradigm of generating genetic material (writing) using a degenerate oligonucleotide and optomechanical retrieval method that leverages sequencing (reading) throughput to generate the desired number of oligonucleotides. As a proof of concept, we demonstrate the feasibility of our concept in digital information storage in DNA. In simulation, the ability to store data is expected to exponentially increase with increase in degenerate space. The present study highlights the major framework change in conventional DNA writing paradigm as a sequencer itself can become a potential source of making genetic materials.
Pub.: 24 Nov '16, Pinned: 20 Apr '17
Abstract: With the exponential growth in the capacity of information generated and the emerging need for data to be stored for prolonged period of time, there emerges a need for a storage medium with high capacity, high storage density, and possibility to withstand extreme environmental conditions. DNA emerges as the prospective medium for data storage with its striking features. Diverse encoding models for reading and writing data onto DNA, codes for encrypting data which addresses issues of error generation, and approaches for developing codons and storage styles have been developed over the recent past. DNA has been identified as a potential medium for secret writing, which achieves the way towards DNA cryptography and steganography. DNA utilized as an organic memory device along with big data storage and analytics in DNA has paved the way towards DNA computing for solving computational problems. This paper critically analyzes the various methods used for encoding and encrypting data onto DNA while identifying the advantages and capability of every scheme to overcome the drawbacks identified previously. Cryptography and steganography techniques have been analyzed in a critical approach while identifying the limitations of each method. This paper also identifies the advantages and limitations of DNA as a memory device and memory applications.
Pub.: 01 Oct '16, Pinned: 20 Apr '17
Abstract: Biopolymers are an attractive alternative to store and circulate information. DNA, for example, combines remarkable longevity with high data storage densities and has been demonstrated as a means for preserving digital information. Inspired by the dynamic, biological regulation of (epi)genetic information, we herein present how binary data can undergo controlled changes when encoded in synthetic DNA strands. By exploiting differential kinetics of hydrolytic deamination reactions of cytosine and its naturally occurring derivatives, we demonstrate how multiple layers of information can be stored in a single DNA template. Moreover, we show that controlled redox reactions allow for interconversion of these DNA‐encoded layers of information. Overall, such interlacing of multiple messages on synthetic DNA libraries showcases the potential of chemical reactions to manipulate digital information on (bio)polymers.
Pub.: 21 Jul '16, Pinned: 20 Apr '17
Abstract: We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile media suitable for both ultrahigh density archival and rewritable storage applications.
Pub.: 19 Sep '15, Pinned: 20 Apr '17
Abstract: Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in an offspring as well as in genetically modified organism. However, the main concerns regarding data-hiding in DNA sequences are the survival of organism and successful extraction of watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at http://188.8.131.52/DNA-LCEB.
Pub.: 10 Sep '14, Pinned: 20 Apr '17
Abstract: Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10^6 bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
Pub.: 29 Jan '13, Pinned: 21 Apr '17
Abstract: With world wide data predicted to exceed 40 trillion gigabytes by 2020, big data storage is a very real and escalating problem. Herein, we discuss the utility of synthetic DNA as a robust and eco-friendly archival data storage solution of the future.
Pub.: 22 Mar '13, Pinned: 20 Apr '17
Abstract: Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing.
Pub.: 21 Aug '12, Pinned: 20 Apr '17
Abstract: The use of synthetic biological systems in research, healthcare, and manufacturing often requires autonomous history-dependent behavior and therefore some form of engineered biological memory. For example, the study or reprogramming of aging, cancer, or development would benefit from genetically encoded counters capable of recording up to several hundred cell division or differentiation events. Although genetic material itself provides a natural data storage medium, tools that allow researchers to reliably and reversibly write information to DNA in vivo are lacking. Here, we demonstrate a rewriteable recombinase addressable data (RAD) module that reliably stores digital information within a chromosome. RAD modules use serine integrase and excisionase functions adapted from bacteriophage to invert and restore specific DNA sequences. Our core RAD memory element is capable of passive information storage in the absence of heterologous gene expression for over 100 cell divisions and can be switched repeatedly without performance degradation, as is required to support combinatorial data storage. We also demonstrate how programmed stochasticity in RAD system performance arising from bidirectional recombination can be achieved and tuned by varying the synthesis and degradation rates of recombinase proteins. The serine recombinase functions used here do not require cell-specific cofactors and should be useful in extending computing and control methods to the study and engineering of many biological systems.
Pub.: 23 May '12, Pinned: 20 Apr '17