
    Capacity of DNA Data Embedding Under Substitution Mutations

    A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic acid (DNA), giving rise to the emerging area of DNA data embedding. Since a DNA sequence is conceptually equivalent to a sequence of quaternary symbols (bases), DNA data embedding (diversely called DNA watermarking or DNA steganography) can be seen as a digital communications problem where channel errors are tantamount to mutations of DNA bases. Depending on the use of coding or noncoding DNA hosts, which, respectively, denote DNA segments that can or cannot be translated into proteins, DNA data embedding is essentially a problem of communications with or without side information at the encoder. In this paper the Shannon capacity of DNA data embedding is obtained for the case in which DNA sequences are subject to substitution mutations modelled using the Kimura model from molecular evolution studies. Inferences are also drawn with respect to the biological implications of some of the results presented. Comment: 22 pages, 13 figures; preliminary versions of this work were presented at the SPIE Media Forensics and Security XII conference (January 2010) and at the IEEE ICASSP conference (March 2010)
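
As a rough illustration of the channel view in this abstract, the sketch below computes the per-base capacity of a memoryless quaternary substitution channel with a Kimura-style two-parameter structure. It is not the paper's result: the function name and the parameters `p_ts` (probability of the one transition) and `p_tv` (probability of each of the two transversions) are assumptions made here, and the sketch covers only a single channel use with no side information. Because such a channel is symmetric, its capacity is 2 bits minus the entropy of one row of the substitution matrix.

```python
import math


def kimura_capacity_bits(p_ts: float, p_tv: float) -> float:
    """Capacity in bits per base of a symmetric quaternary substitution channel.

    Hypothetical parameterisation (not taken from the paper):
      p_ts -- probability that a base mutates to its transition partner
      p_tv -- probability that a base mutates to each of its two transversions
    """
    p_same = 1.0 - p_ts - 2.0 * p_tv
    row = [p_same, p_ts, p_tv, p_tv]  # one row of the 4x4 substitution matrix
    if any(p < 0.0 for p in row):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Entropy of a single row; terms with p == 0 contribute nothing.
    entropy = -sum(p * math.log2(p) for p in row if p > 0.0)
    # Symmetric channel: capacity = log2(4) - H(row), reached by uniform inputs.
    return 2.0 - entropy
```

With no mutations the function returns the full 2 bits per base, and when all four outcomes are equally likely it returns 0.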

    A Mathematical Formulation of DNA Computation

    DNA computation uses DNA molecules to store and process information. The task is accomplished by encoding and interpreting DNA molecules in suspended solutions before and after the complementary binding reactions. DNA computation is attractive due to its fast parallel information processing, remarkable energy efficiency, and high storage capacity. Challenges currently faced by DNA computation are (1) a lack of theoretical computational models for applications, and (2) a high error rate in implementation. This paper attempts to address these problems from mathematical modeling and genetic coding aspects. The first part of this paper presents a mathematical formulation of DNA computation. The model may serve as a theoretical framework for DNA computation. In the second part, a genetic-code-based DNA computation approach is presented to reduce the error rate in implementation, which has been a major concern for DNA computation. The method provides a promising alternative for reducing the error rate of DNA computation.

    Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage

    DNA strands serve as a storage medium for 4-ary data over the alphabet {A,T,G,C}. DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology might be corrupted. Experiments have revealed that DNA sequences with long homopolymers and/or with low GC-content are notably more subject to errors upon storage. This paper investigates the utilization of the recently-introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage. This paper introduces DNA LOCO (D-LOCO) codes, over the alphabet {A,T,G,C} with limited runs of identical symbols. These codes come with an encoding-decoding rule we derive, which provides affordable encoding-decoding algorithms. In terms of storage overhead, the proposed encoding-decoding algorithms outperform those in the existing literature. Our algorithms are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balancing over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single substitution error detection per codeword. We examine the probability of undetecting errors. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates at moderate lengths. Comment: 14 pages (double column), 3 figures, submitted to the IEEE Transactions on Molecular, Biological and Multi-scale Communications (TMBMC)
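
The two strand properties named in this abstract, homopolymer run length and GC-content, can be checked with a few lines of code. The sketch below is only a constraint checker, not the D-LOCO encoding or decoding rule; the function names and the thresholds `max_run`, `gc_min`, and `gc_max` are hypothetical values chosen for illustration.

```python
from itertools import groupby


def max_homopolymer_run(strand: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((sum(1 for _ in group) for _, group in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, max_run: int = 3,
                          gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check a strand against illustrative (assumed) run and GC limits."""
    return (max_homopolymer_run(strand) <= max_run
            and gc_min <= gc_content(strand) <= gc_max)
```

For example, `satisfies_constraints("ACGTACGT")` holds under these assumed limits, while `"AAAAACGT"` fails on both the run length and the GC fraction.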

    On Optimal Family of Codes for Archival DNA Storage

    DNA-based storage systems have received attention from many researchers. These include archival and rewritable random-access DNA-based storage systems. In this work, we have developed an efficient technique to encode data into DNA sequences by using non-linear families of ternary codes. In particular, we propose an algorithm to encode data into DNA with high information storage density and better error correction using a subcode of the Golay code. Theoretically, 115 exabytes (EB) of data can be stored in one gram of DNA by our method. Comment: Supplementary file and the software DNA Cloud 2.0 are available at http://www.guptalab.org/dnacloud This is the preliminary version of the paper that appeared in Proceedings of IWSDA 2015, pp. 143--14

    Coding for Optimized Writing Rate in DNA Storage

    A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied here takes into account the existence of multiple copies of the DNA sequence, which are independently distorted. Finally, quantizers for various run-length distributions are designed.

    Molecular access to multi-dimensionally encoded information

    Polymer scientists have only recently realized that information storage on the molecular level is not restricted to DNA-based systems. Similar encoding and decoding of data have been demonstrated on synthetic polymers that could overcome some of the drawbacks associated with DNA, for example by making use of a larger monomer alphabet. This feature article describes some of the recent data storage strategies that were investigated, ranging from writing information on linear sequence-defined macromolecules up to layer-by-layer casted surfaces and QR codes. In addition, some strategies to increase storage density are elaborated and some trends regarding future perspectives on molecular data storage from the literature are critically evaluated. This work ends by highlighting the demand for new strategies that set up reliable solutions for future data management technologies.

    DNA multi-bit non-volatile memory and bit-shifting operations using addressable electrode arrays and electric field-induced hybridization.

    DNA has been employed to either store digital information or to perform parallel molecular computing. Relatively unexplored is the ability to combine DNA-based memory and logical operations in a single platform. Here, we show a DNA tri-level cell non-volatile memory system capable of parallel random-access writing of memory and bit shifting operations. A microchip with an array of individually addressable electrodes was employed to enable random access of the memory cells using electric fields. Three segments on a DNA template molecule were used to encode three data bits. Rapid writing of data bits was enabled by electric field-induced hybridization of fluorescently labeled complementary probes and the data bits were read by fluorescence imaging. We demonstrated the rapid parallel writing and reading of 8 (2^3) combinations of 3-bit memory data and bit shifting operations by electric field-induced strand displacement. Our system may find potential applications in DNA-based memory and computations.

    Coding for Optimized Writing Rate in DNA Storage

    A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied here takes into account the existence of multiple copies of the DNA sequence, which are independently distorted. Finally, quantizers for various run-length distributions are designed. Comment: To appear in ISIT 202