
    On DNA Codes Over the Non-Chain Ring $\mathbb{Z}_4+u\mathbb{Z}_4+u^2\mathbb{Z}_4$ with $u^3=1$

    In this paper, we present a novel design strategy of DNA codes with length $3n$ over the non-chain ring $R=\mathbb{Z}_4+u\mathbb{Z}_4+u^2\mathbb{Z}_4$ with $64$ elements and $u^3=1$, where $n$ denotes the length of a code over $R$. We first study and analyze a distance conserving map defined over the ring $R$ into the length-$3$ DNA sequences. Then, we derive some conditions on the generator matrix of a linear code over $R$, which leads to a DNA code with reversible, reversible-complement, homopolymer $2$-run-length, and $\frac{w}{3n}$-GC-content constraints for integer $w$ ($0\leq w\leq 3n$). Finally, we propose a new construction of DNA codes using Reed-Muller type generator matrices. This allows us to obtain DNA codes with reversible, reversible-complement, homopolymer $2$-run-length, and $\frac{2}{3}$-GC-content constraints. Comment: This paper has been presented in IEEE Information Theory Workshop (ITW) 2022, Mumbai, INDI
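
    The constraints named in this abstract can be checked on any finite set of DNA strings. The sketch below is only such a checker, not the paper's ring-to-DNA map or its generator-matrix construction. The function names, the toy code, and the reading of the run-length constraint as "no run longer than `max_run_len`" are assumptions made for illustration.

```python
from itertools import groupby

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Reverse the strand, then swap A<->T and C<->G."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def gc_count(seq):
    """Number of G or C symbols in the strand."""
    return sum(b in "GC" for b in seq)


def max_run(seq):
    """Length of the longest run of identical symbols (0 for an empty strand)."""
    return max((len(list(g)) for _, g in groupby(seq)), default=0)


def is_reversible(code):
    """True if the reverse of every codeword is also a codeword."""
    return all(c[::-1] in code for c in code)


def is_reversible_complement(code):
    """True if the reverse complement of every codeword is also a codeword."""
    return all(reverse_complement(c) in code for c in code)


def satisfies_constraints(code, w, max_run_len):
    """Check all four constraints; w is the required GC count per codeword."""
    code = set(code)
    return (
        is_reversible(code)
        and is_reversible_complement(code)
        and all(gc_count(c) == w for c in code)
        and all(max_run(c) <= max_run_len for c in code)
    )


# Hypothetical toy code of length 6 (3n with n = 2) and GC count 4, i.e. GC-content 2/3.
toy_code = {"ACGTGC", "CGTGCA", "GCACGT", "TGCACG"}
print(satisfies_constraints(toy_code, w=4, max_run_len=2))
```

    The toy set was chosen by hand so that it is closed under both reversal and reverse complement; it is not taken from the paper.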

    Achievable Rates of Concatenated Codes in DNA Storage under Substitution Errors

    In this paper, we study achievable rates of concatenated coding schemes over a deoxyribonucleic acid (DNA) storage channel. Our channel model incorporates the main features of DNA-based data storage. First, information is stored on many short DNA strands. Second, the strands are stored in an unordered fashion inside the storage medium and each strand is replicated many times. Third, the data is accessed in an uncontrollable manner, i.e., random strands are drawn from the medium and received, possibly with errors. As one of our results, we show that there is a significant gap between the channel capacity and the achievable rate of a standard concatenated code in which one strand corresponds to an inner block. This is in fact surprising, as for other channels, such as $q$-ary symmetric channels, concatenated codes are known to achieve the capacity. We further propose a modified concatenated coding scheme that combines several strands into one inner block, which allows us to narrow the gap and achieve rates that are close to the capacity. Comment: Extended version of a paper submitted to International Symposium on Information Theory and Its Applications (ISITA) 202
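
    The three channel features listed in this abstract (many short strands, unordered storage, random draws with possible errors) can be mimicked by a small simulation. This is a rough sketch under assumed simplifications: reads are drawn uniformly with replacement, and each base is independently replaced by a different base with probability `p`. The function names and parameters are hypothetical and do not come from the paper.

```python
import random

ALPHABET = "ACGT"


def substitute(strand, p, rng):
    """Replace each base, independently with probability p, by a different base."""
    out = []
    for base in strand:
        if rng.random() < p:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def read_pool(strands, num_reads, p, seed=0):
    """Draw num_reads strands uniformly at random (order is lost) and corrupt them."""
    rng = random.Random(seed)
    return [substitute(rng.choice(strands), p, rng) for _ in range(num_reads)]


# Hypothetical stored pool of two short strands, read ten times with 5% substitutions.
stored = ["ACGTAC", "TTGACA"]
reads = read_pool(stored, num_reads=10, p=0.05)
print(reads)
```

    Because the reads carry no index, a decoder has to work out which stored strand each read came from; that loss of ordering is the feature the abstract emphasizes.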

    On Conflict Free DNA Codes

    DNA storage has emerged as an important area of research. The reliability of a DNA storage system depends on designing DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy a special constraint. Each codeword of the DNA code has the property that no two consecutive sub-strings of the codeword are the same (a generalization of the homopolymer constraint). This is in addition to the usual constraints such as Hamming, reverse, reverse-complement and $GC$-content. We believe that the new constraint will further help in reducing errors while reading and writing data in synthetic DNA strings. We also present a construction (based on a variant of the stochastic local search algorithm) to calculate the size of DNA codes with all the above constraints, which improves the lower bounds from the existing literature for some specific cases. Moreover, a recursive isometric map between binary vectors and DNA strings is proposed. Using the map and well-known binary codes, we obtain a few classes of DNA codes with all the constraints, including the property that the constructed DNA codewords are free from hairpin-like secondary structures. Comment: 12 pages, Draft (Table VI and Table VII are updated
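
    One way to read the conflict-free property is that a codeword contains no block immediately followed by an identical copy of itself. The sketch below tests for that by direct comparison. The function name and the optional `max_block` cap on the block length are assumptions for illustration; the paper's exact definition may bound the block length differently.

```python
def has_adjacent_repeat(seq, max_block=None):
    """Return True if some block of length k is immediately followed by itself.

    Block length 1 is the ordinary homopolymer case (e.g. "AA").
    max_block optionally caps the block lengths that are examined.
    """
    n = len(seq)
    limit = n // 2 if max_block is None else min(max_block, n // 2)
    for k in range(1, limit + 1):
        for i in range(n - 2 * k + 1):
            if seq[i:i + k] == seq[i + k:i + 2 * k]:
                return True
    return False


def is_conflict_free(seq, max_block=None):
    """A strand is conflict free here if it has no adjacent repeated block."""
    return not has_adjacent_repeat(seq, max_block)


for strand in ["ACGACT", "AACGTC", "ACACGT"]:
    print(strand, is_conflict_free(strand))
```

    The check is quadratic in the strand length for each block size, which is adequate for the short strands typical of DNA storage.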

    In-Vitro Validated Methods for Encoding Digital Data in Deoxyribonucleic Acid (DNA)

    Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods that have been validated by storing and retrieving real-world data via in-vitro experiments.

    Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage

    DNA strands serve as a storage medium for $4$-ary data over the alphabet $\{A,T,G,C\}$. DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology might be corrupted. Experiments have revealed that DNA sequences with long homopolymers and/or with low $GC$-content are notably more subject to errors upon storage. This paper investigates the utilization of the recently-introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage. This paper introduces DNA LOCO (D-LOCO) codes over the alphabet $\{A,T,G,C\}$ with limited runs of identical symbols. These codes come with an encoding-decoding rule we derive, which provides affordable encoding-decoding algorithms. In terms of storage overhead, the proposed encoding-decoding algorithms outperform those in the existing literature. Our algorithms are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balancing over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single substitution error detection per codeword. We examine the probability of undetected errors. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates at moderate lengths. Comment: 14 pages (double column), 3 figures, submitted to the IEEE Transactions on Molecular, Biological and Multi-scale Communications (TMBMC
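
    The rate cost of limiting runs of identical symbols can be estimated by counting the allowed sequences. The sketch below is not the D-LOCO encoding-decoding rule; it is a standard counting recursion for $q$-ary sequences whose runs have length at most `max_run`, based on the length of the final run. The function names are hypothetical.

```python
from math import log2


def count_run_limited(m, max_run, q=4):
    """Count length-m sequences over q symbols with no run longer than max_run.

    For length <= max_run every sequence is allowed. Otherwise the final run has
    length i in 1..max_run and its symbol differs from the one before it, giving
    N(length) = (q - 1) * sum of N(length - i) over i = 1..max_run.
    """
    counts = [1]  # counts[0] is the empty sequence (placeholder, never summed)
    for length in range(1, m + 1):
        if length <= max_run:
            counts.append(q ** length)
        else:
            counts.append(
                (q - 1) * sum(counts[length - i] for i in range(1, max_run + 1))
            )
    return counts[m]


def rate(m, max_run, q=4):
    """Bits per symbol carried by the constrained set at length m."""
    return log2(count_run_limited(m, max_run, q)) / m


for m in (8, 16, 32):
    print(m, count_run_limited(m, max_run=3), round(rate(m, max_run=3), 4))
```

    For the DNA alphabet the unconstrained rate is 2 bits per nucleotide, so the gap between `rate(m, max_run)` and 2 is the price of the run-length limit alone, before any balancing or bridging overhead.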