12 research outputs found

    Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

    DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publicatio
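
To make the noise model concrete, the following is a minimal sketch of a single tandem-duplication error. It assumes that "uniform" means every duplication copies a window of the same fixed length k; the function names are hypothetical and are not taken from the paper.

```python
def tandem_duplicate(seq: str, start: int, k: int) -> str:
    """Insert a copy of seq[start:start+k] immediately after that window."""
    if k <= 0 or start < 0 or start + k > len(seq):
        raise ValueError("duplication window must lie inside the sequence")
    window = seq[start:start + k]
    return seq[:start + k] + window + seq[start + k:]


def single_duplication_reads(seq: str, k: int) -> set[str]:
    """All distinct strings reachable from seq by one length-k duplication."""
    return {tandem_duplicate(seq, i, k) for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    print(tandem_duplicate("ACGT", 1, 2))
    print(sorted(single_duplication_reads("ACGT", 2)))
```

Different duplication positions can yield the same noisy string (for example, any length-2 duplication in "AAAA" gives "AAAAAA"), which is why the helper returns a set of distinct reads.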

    On Conflict Free DNA Codes

    DNA storage has emerged as an important area of research. The reliability of a DNA storage system depends on designing DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy a special constraint: no two consecutive substrings of a codeword are identical (a generalization of the homopolymer constraint). This is in addition to the usual constraints such as Hamming, reverse, reverse-complement and GC-content. We believe that the new constraint will further help reduce errors when reading and writing data in synthetic DNA strings. We also present a construction (based on a variant of a stochastic local search algorithm) to calculate the size of DNA codes satisfying all the above constraints, which improves the lower bounds from the existing literature for some specific cases. Moreover, a recursive isometric map between binary vectors and DNA strings is proposed. Using the map and well-known binary codes, we obtain a few classes of DNA codes satisfying all the constraints, including the property that the constructed DNA codewords are free from hairpin-like secondary structures. Comment: 12 pages, Draft (Table VI and Table VII are updated
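
The constraints named in this abstract can be checked mechanically. The sketch below is one possible reading: it treats a "conflict" as two identical adjacent blocks of the same length, with an optional cap `max_len` on the block length. The cap and all function names are assumptions for illustration, not the paper's definitions.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_adjacent_repeat(seq: str, max_len: Optional[int] = None) -> bool:
    """True if some block of length w is immediately followed by a copy of itself."""
    n = len(seq)
    limit = n // 2 if max_len is None else min(max_len, n // 2)
    for w in range(1, limit + 1):
        for i in range(n - 2 * w + 1):
            if seq[i:i + w] == seq[i + w:i + 2 * w]:
                return True
    return False


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

With `max_len=1` the repeat check reduces to the ordinary homopolymer test (no base repeated twice in a row), which is the sense in which the new constraint generalizes it.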

    Codes for Correcting Asymmetric Adjacent Transpositions and Deletions

    Codes in the Damerau-Levenshtein metric have been extensively studied recently owing to their applications in DNA-based data storage. In particular, Gabrys, Yaakobi, and Milenkovic (2017) designed a length-$n$ code correcting a single deletion and $s$ adjacent transpositions with at most $(1+2s)\log n$ bits of redundancy. In this work, we consider a new setting where both asymmetric adjacent transpositions (also known as right-shifts or left-shifts) and deletions may occur. We present several constructions of the codes correcting these errors in various cases. In particular, we design a code correcting a single deletion, $s^+$ right-shift, and $s^-$ left-shift errors with at most $(1+s)\log(n+s+1)+1$ bits of redundancy where $s=s^{+}+s^{-}$. In addition, we investigate codes correcting $t$ $0$-deletions, $s^+$ right-shift, and $s^-$ left-shift errors with both uniquely-decoding and list-decoding algorithms. Our main contribution here is the construction of a list-decodable code with list size $O(n^{\min\{s+1,t\}})$ and with at most $(\max\{t,s+1\})\log n+O(1)$ bits of redundancy, where $s=s^{+}+s^{-}$. Finally, we construct both non-systematic and systematic codes for correcting blocks of $0$-deletions with $\ell$-limited-magnitude and $s$ adjacent transpositions.
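
For a sense of scale, the two redundancy expressions quoted in this abstract can be evaluated numerically. The sketch below assumes base-2 logarithms (since redundancy is stated in bits) and uses hypothetical function names. The two expressions bound codes for different error models (symmetric versus asymmetric transpositions), so this only compares the formulas as written.

```python
import math


def redundancy_symmetric(n: int, s: int) -> float:
    """(1 + 2s) log n: single deletion plus s adjacent transpositions."""
    return (1 + 2 * s) * math.log2(n)


def redundancy_asymmetric(n: int, s_plus: int, s_minus: int) -> float:
    """(1 + s) log(n + s + 1) + 1 with s = s_plus + s_minus:
    single deletion plus s_plus right-shifts and s_minus left-shifts."""
    s = s_plus + s_minus
    return (1 + s) * math.log2(n + s + 1) + 1


if __name__ == "__main__":
    n = 1024
    print(redundancy_symmetric(n, 2))       # s = 2 transpositions
    print(redundancy_asymmetric(n, 1, 1))   # s = s+ + s- = 2 shifts
```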