I am a scientist specializing in human genetics and cell metabolism.


Current data storage tech can't keep up with all the data we generate today, so how about using DNA?

In 10 seconds? In today’s information era, smartphones, social media and big data are generating more information than ever before, raising the question of how to store it durably and efficiently. Surprisingly, DNA may be the answer.

Don’t believe it? According to Eric Schmidt, Google’s former CEO, every two days we create as much data as we did from the dawn of civilization up until 2003, and worldwide data is predicted to exceed 40 trillion gigabytes by 2020. Handling all this data is therefore an increasingly serious problem. Several research groups are trying to solve it by developing methods to store information efficiently in DNA.

But why DNA, you ask? Firstly, because of durability. While memory cards and chips last only about 5 years, and magnetic tapes 10 to 30 years, DNA can remain intact for thousands of years under the right conditions. Indeed, more than 80% of the mammoth’s genome remains readable even though the species disappeared from the planet more than 10,000 years ago.

Secondly, because of density. Instead of a binary code with two possibilities per position (1 or 0), DNA has four (A, G, C and T), so each position can carry up to two bits. Also, its 3D structure offers more storage space than the linear structure used in current data storage media. Additionally, epigenetic modifications allow multiple layers of information to be stored in a single DNA template. In fact, the most recent method, DNA Fountain, achieves a density of 215 petabytes per gram of DNA with perfect retrieval of the information.
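To make the "four possibilities per position" point concrete, here is a minimal Python sketch that packs two bits into each base. The bit-to-base mapping and the function names are my own assumptions for illustration. Real schemes such as DNA Fountain add sequence constraints and error correction that are not shown here.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read two bits back from every nucleotide."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # four bases per byte
print(decode(strand))  # round-trips to the original bytes
```

Each byte becomes four bases, which is half the number of symbols a binary alphabet would need for the same data.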

And finally, because of secrecy. What better way to hide information than encoding it into the DNA of living organisms?


DNA-LCEB: a high-capacity and mutation-resistant DNA data-hiding approach by employing encryption, error correcting codes, and hybrid twofold and fourfold codon-based strategy for synonymous substitution in amino acids.

Abstract: Data-hiding in deoxyribonucleic acid (DNA) sequences can be used to develop an organic memory and to track parent genes in an offspring as well as in genetically modified organisms. However, the main concerns regarding data-hiding in DNA sequences are the survival of the organism and successful extraction of the watermark from DNA. This implies that the organism should live and reproduce without any functional disorder even in the presence of the embedded data. Consequently, performing synonymous substitution in amino acids for watermarking becomes a primary option. In this regard, a hybrid watermark embedding strategy that employs synonymous substitution in both twofold and fourfold codons of amino acids is proposed. This work thus presents a high-capacity and mutation-resistant watermarking technique, DNA-LCEB, for hiding secret information in DNA of living organisms. By employing the different types of synonymous codons of amino acids, the data storage capacity has been significantly increased. It is further observed that the proposed DNA-LCEB employing a combination of synonymous substitution, lossless compression, encryption, and Bose-Chaudhuri-Hocquenghem coding is secure and performs better in terms of both capacity and robustness compared to existing DNA data-hiding schemes. The proposed DNA-LCEB is tested against different mutations, including silent, missense, and nonsense mutations, and provides substantial improvement in terms of mutation detection/correction rate and bits per nucleotide. A web application for DNA-LCEB is available at

Pub.: 10 Sep '14, Pinned: 20 Apr '17
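The idea of hiding data through synonymous substitution can be sketched in a few lines of Python. This is a heavily simplified illustration, not the DNA-LCEB method itself. It uses only twofold codon pairs from the standard genetic code, stores one bit per pair, and leaves out the fourfold codons, compression, encryption and error-correcting code described in the abstract. The function names and the example gene are hypothetical.

```python
# Twofold synonymous codon pairs (standard genetic code); index 0 encodes bit 0,
# index 1 encodes bit 1. The choice of which codon means which bit is an assumption.
PAIRS = [
    ("TTT", "TTC"),  # Phe
    ("TAT", "TAC"),  # Tyr
    ("CAT", "CAC"),  # His
    ("CAA", "CAG"),  # Gln
    ("AAT", "AAC"),  # Asn
    ("AAA", "AAG"),  # Lys
    ("GAT", "GAC"),  # Asp
    ("GAA", "GAG"),  # Glu
    ("TGT", "TGC"),  # Cys
]
VARIANTS = {codon: pair for pair in PAIRS for codon in pair}


def embed(gene: str, bits: str) -> str:
    """Hide bits by picking a synonymous codon; the encoded protein is unchanged."""
    remaining = list(bits)
    out = []
    for i in range(0, len(gene), 3):
        codon = gene[i:i + 3]
        pair = VARIANTS.get(codon)
        if pair and remaining:
            out.append(pair[int(remaining.pop(0))])
        else:
            out.append(codon)  # codon cannot carry a bit, or nothing left to hide
    if remaining:
        raise ValueError("gene has too few twofold codons for this message")
    return "".join(out)


def extract(gene: str, n_bits: int) -> str:
    """Read back the first n_bits hidden bits from the twofold codons."""
    bits = []
    for i in range(0, len(gene), 3):
        codon = gene[i:i + 3]
        pair = VARIANTS.get(codon)
        if pair and len(bits) < n_bits:
            bits.append(str(pair.index(codon)))
    return "".join(bits)


marked = embed("ATGTTTAAAGATTAA", "101")  # Met-Phe-Lys-Asp-stop
print(marked, extract(marked, 3))
```

Because every substitution swaps a codon for another that codes for the same amino acid, the protein sequence stays Met-Phe-Lys-Asp before and after embedding, which is the property the abstract relies on for the organism's survival.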

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA.

Abstract: Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10^6 bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.

Pub.: 29 Jan '13, Pinned: 21 Apr '17

Rewritable digital data storage in live cells via engineered control of recombination directionality.

Abstract: The use of synthetic biological systems in research, healthcare, and manufacturing often requires autonomous history-dependent behavior and therefore some form of engineered biological memory. For example, the study or reprogramming of aging, cancer, or development would benefit from genetically encoded counters capable of recording up to several hundred cell division or differentiation events. Although genetic material itself provides a natural data storage medium, tools that allow researchers to reliably and reversibly write information to DNA in vivo are lacking. Here, we demonstrate a rewriteable recombinase addressable data (RAD) module that reliably stores digital information within a chromosome. RAD modules use serine integrase and excisionase functions adapted from bacteriophage to invert and restore specific DNA sequences. Our core RAD memory element is capable of passive information storage in the absence of heterologous gene expression for over 100 cell divisions and can be switched repeatedly without performance degradation, as is required to support combinatorial data storage. We also demonstrate how programmed stochasticity in RAD system performance arising from bidirectional recombination can be achieved and tuned by varying the synthesis and degradation rates of recombinase proteins. The serine recombinase functions used here do not require cell-specific cofactors and should be useful in extending computing and control methods to the study and engineering of many biological systems.

Pub.: 23 May '12, Pinned: 20 Apr '17