    On Optimal Family of Codes for Archival DNA Storage

    DNA-based storage systems have received attention from many researchers. These include archival and rewritable random-access DNA-based storage systems. In this work, we develop an efficient technique to encode data into DNA sequences using non-linear families of ternary codes. In particular, we propose an algorithm that encodes data into DNA with high information storage density and better error correction using a subcode of the Golay code. Theoretically, 115 exabytes (EB) of data can be stored in one gram of DNA by our method. Comment: Supplementary file and the software DNA Cloud 2.0 are available at http://www.guptalab.org/dnacloud. This is the preliminary version of the paper that appeared in Proceedings of IWSDA 2015, pp. 143--14

    Mutually Uncorrelated Primers for DNA-Based Data Storage

    We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and for synchronization of communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for primer design in DNA-based data storage systems are also required to be at large mutual Hamming distance from each other, have balanced compositions of symbols, and avoid primer-dimer byproducts. We derive bounds on the size of WMU and various constrained WMU codes and present a number of constructions for balanced, error-correcting, primer-dimer free WMU codes using Dyck paths, prefix-synchronized and cyclic codes. Comment: 14 pages, 3 figures, 1 table. arXiv admin note: text overlap with arXiv:1601.0817
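
The WMU property stated in this abstract can be checked by brute force. The following is a minimal sketch, not the paper's construction: the function name and the threshold parameter `kappa` (standing in for "sufficiently long") are assumptions made for illustration, and only proper overlaps (shorter than the full sequence) are tested.

```python
from itertools import product


def is_weakly_mutually_uncorrelated(codewords, kappa):
    """Return True if no proper suffix of length >= kappa of any codeword
    equals the same-length prefix of any codeword (itself included).

    `kappa` is a hypothetical name for the overlap-length threshold.
    """
    if kappa < 1:
        raise ValueError("kappa must be at least 1")
    # Ordered pairs, including (a, a), so self-overlaps are also checked.
    for a, b in product(codewords, repeat=2):
        longest_proper_overlap = min(len(a), len(b)) - 1
        for length in range(kappa, longest_proper_overlap + 1):
            if a[-length:] == b[:length]:
                return False
    return True


# "ACGT" ends with "GT" and "GTAC" starts with "GT": an overlap of length 2.
primers = ["ACGT", "GTAC"]
print(is_weakly_mutually_uncorrelated(primers, kappa=2))
print(is_weakly_mutually_uncorrelated(primers, kappa=3))
```

The check is quadratic in the number of sequences and is meant only to make the definition concrete; the additional primer constraints mentioned in the abstract (Hamming distance, balance, primer-dimer avoidance) are not covered.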

    Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

    DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publicatio
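
To make the noise model named in this abstract concrete, here is a minimal sketch of a single tandem-duplication error, in which a substring of fixed length `k` is copied and inserted directly after itself. The function name and its parameters are illustrative assumptions and are not taken from the paper.

```python
def tandem_duplicate(sequence, start, k):
    """Duplicate the length-k substring beginning at `start`, placing the
    copy immediately after the original (one tandem-duplication error).

    With a fixed k for every error, this models the uniform case.
    """
    if k < 1 or start < 0 or start + k > len(sequence):
        raise ValueError("duplicated window must lie inside the sequence")
    window = sequence[start:start + k]
    return sequence[:start + k] + window + sequence[start + k:]


# Duplicating "CG" (start=1, k=2) in "ACGT" yields "ACGCGT".
print(tandem_duplicate("ACGT", start=1, k=2))
```

Each such error lengthens the sequence by exactly `k` symbols, which is the sense in which several noisy reads of the same stored string can differ from one another.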

    On Coding over Sliced Information

    Interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the number of redundant bits required to correct errors is order-wise equivalent to the number required in the classical error-correcting paradigm.