Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Authors: Schwartz Moshe, Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019

DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters.

Comment: 11 pages, 2 figures, Latex; version accepted for publication
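The error model in this abstract can be illustrated with a minimal sketch, assuming that a uniform tandem duplication of length k copies a window of k symbols and inserts the copy immediately after the original. The function names `tandem_duplicate` and `descendants` are hypothetical and not taken from the paper; the sketch only enumerates noisy reads and does not implement the paper's codes.

```python
def tandem_duplicate(seq: str, start: int, k: int) -> str:
    """Insert a copy of seq[start:start+k] right after itself (assumed error model)."""
    if k <= 0 or start < 0 or start + k > len(seq):
        raise ValueError("duplication window must lie inside the sequence")
    window = seq[start:start + k]
    return seq[:start + k] + window + seq[start + k:]


def descendants(seq: str, k: int, errors: int) -> set[str]:
    """All strings reachable from seq by exactly `errors` duplications of length k."""
    current = {seq}
    for _ in range(errors):
        current = {
            tandem_duplicate(s, i, k)
            for s in current
            for i in range(len(s) - k + 1)
        }
    return current


if __name__ == "__main__":
    # Each element is one possible noisy read of the stored string.
    print(sorted(descendants("ACGT", k=2, errors=1)))
```

In the reconstruction setting, several distinct reads of the same stored string are available, so what matters is how many noisy reads two different stored strings can share; comparing `descendants(x, k, t) & descendants(y, k, t)` for two candidate strings gives a brute-force view of that overlap for small inputs.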

arXiv.org e-Print Archive

Crossref

Systematic Codes for Correcting Deletion/Insertion of One Zero in Each and Every Bucket of Zeros

Author: Vu Hoang
Publication venue: 'Oregon State University'

In this thesis, we propose a systematic code for correcting t = 1 insertion/deletion errors of the character "0" that can occur between any two consecutive 1’s in a binary string. The code requires balanced input strings, where each word of length n contains ⌈n/2⌉ 0’s and ⌊n/2⌋ 1’s. This error model is shown to be related to zero-error capacity-achieving codes for a limited-magnitude error channel. We prove that the inputs can be partitioned into different subsets and that the words in the same subset can be assigned a unique check for this error model. We deduce that the upper bound for the number of checks required is 2w, where w is the weight of the input. Efficient encoding and decoding algorithms are provided. Our algorithms return variable-length checks and may require up to r = 3w check bits. While the optimal rate for this error model is not known, we establish our rate to be between 0.4 and 0.666 and demonstrate potential avenues for improvement.
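The link to a limited-magnitude error channel can be made concrete with a short sketch. It assumes a binary word is viewed as the vector of lengths of its runs ("buckets") of 0's delimited by the 1's, so that inserting or deleting one 0 per bucket changes each coordinate by at most 1. The helper names are hypothetical, the leading and trailing runs are included here only for convenience, and the thesis's actual encoder and decoder are not reproduced.

```python
def zero_buckets(word: str) -> list[int]:
    """Lengths of the runs of 0s delimited by 1s (leading/trailing runs included)."""
    return [len(run) for run in word.split("1")]


def from_buckets(buckets: list[int]) -> str:
    """Inverse of zero_buckets: rebuild the binary word from its bucket lengths."""
    return "1".join("0" * b for b in buckets)


def is_balanced(word: str) -> bool:
    """True if the word has ceil(n/2) zeros and floor(n/2) ones."""
    n = len(word)
    return word.count("0") == (n + 1) // 2 and word.count("1") == n // 2


def max_bucket_error(sent: str, received: str) -> int:
    """Largest per-bucket change; assumes no 1 was inserted or deleted."""
    a, b = zero_buckets(sent), zero_buckets(received)
    if len(a) != len(b):
        raise ValueError("the number of 1s changed, outside the assumed error model")
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    sent = "0100110"
    received = from_buckets([1, 1, 0, 1])  # one 0 deleted from the second bucket
    print(zero_buckets(sent), is_balanced(sent), max_bucket_error(sent, received))
```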

ScholarsArchive@OSU

On Conflict Free DNA Codes

Authors: Benerjee Krishna Gopal, Deb Sourav, Gupta Manish K
Publication date: 08/07/2019

DNA storage has emerged as an important area of research. The reliability of a DNA storage system depends on designing DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy a special constraint. Each codeword of the DNA code has the property that no two consecutive sub-strings of the codeword are the same (a generalization of the homopolymer constraint). This is in addition to the usual constraints such as Hamming, reverse, reverse-complement and GC-content. We believe that the new constraint will further help in reducing errors when reading and writing data in synthetic DNA strings. We also present a construction (based on a variant of the stochastic local search algorithm) to calculate the size of DNA codes with all the above constraints, which improves the lower bounds from the existing literature for some specific cases. Moreover, a recursive isometric map between binary vectors and DNA strings is proposed. Using the map and well-known binary codes, we obtain a few classes of DNA codes with all the constraints, including the property that the constructed DNA codewords are free from hairpin-like secondary structures.

Comment: 12 pages, Draft (Table VI and Table VII are updated)
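The constraints named in this abstract can be checked on a single candidate codeword with a brute-force sketch. It assumes that "conflict free" means no substring is immediately followed by an identical copy of itself, for every substring length; the paper may restrict the lengths considered. The function names are hypothetical, and the stochastic local search construction is not reproduced.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def has_consecutive_repeat(s: str) -> bool:
    """True if some substring is immediately followed by an identical copy."""
    n = len(s)
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            if s[i:i + length] == s[i + length:i + 2 * length]:
                return True
    return False


def gc_content(s: str) -> float:
    """Fraction of symbols that are G or C."""
    return (s.count("G") + s.count("C")) / len(s)


def reverse_complement(s: str) -> str:
    """Reverse the string and swap A<->T, C<->G."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    word = "ACGTAC"
    print(has_consecutive_repeat(word), gc_content(word), reverse_complement(word))
```

With length 1 only, `has_consecutive_repeat` reduces to the homopolymer check, which is the sense in which the new constraint generalizes it.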

arXiv.org e-Print Archive

Codes for Correcting Asymmetric Adjacent Transpositions and Deletions

Authors: Tan Vincent Y. F., Vu Van Khu, Wang Shuche
Publication date: 29/06/2023

Codes in the Damerau--Levenshtein metric have been extensively studied recently owing to their applications in DNA-based data storage. In particular, Gabrys, Yaakobi, and Milenkovic (2017) designed a length-$n$ code correcting a single deletion and $s$ adjacent transpositions with at most $(1+2s)\log n$ bits of redundancy. In this work, we consider a new setting where both asymmetric adjacent transpositions (also known as right-shifts or left-shifts) and deletions may occur. We present several constructions of the codes correcting these errors in various cases. In particular, we design a code correcting a single deletion, $s^+$ right-shift, and $s^-$ left-shift errors with at most $(1+s)\log (n+s+1)+1$ bits of redundancy where $s=s^{+}+s^{-}$. In addition, we investigate codes correcting $t$ $0$-deletions, $s^+$ right-shift, and $s^-$ left-shift errors with both uniquely-decoding and list-decoding algorithms. Our main contribution here is the construction of a list-decodable code with list size $O(n^{\min\{s+1,t\}})$ and with at most $(\max \{t,s+1\}) \log n+O(1)$ bits of redundancy, where $s=s^{+}+s^{-}$. Finally, we construct both non-systematic and systematic codes for correcting blocks of $0$-deletions with $\ell$-limited-magnitude and $s$ adjacent transpositions.
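To get a feel for the redundancy expressions quoted in this abstract, the following sketch evaluates them numerically. It assumes the logarithms are base 2 (the abstract does not state the base), and the function names are hypothetical. The two bounds apply to different error models (symmetric versus asymmetric adjacent transpositions), so the numbers are an illustration of the formulas rather than a like-for-like comparison.

```python
import math


def prior_redundancy_bits(n: int, s: int) -> float:
    """(1 + 2s) log n: single deletion plus s adjacent transpositions."""
    return (1 + 2 * s) * math.log2(n)


def asymmetric_redundancy_bits(n: int, s_plus: int, s_minus: int) -> float:
    """(1 + s) log(n + s + 1) + 1 with s = s_plus + s_minus."""
    s = s_plus + s_minus
    return (1 + s) * math.log2(n + s + 1) + 1


def list_decoding_exponents(s: int, t: int) -> tuple[int, int]:
    """Exponent of n in the list size, and coefficient of log n in the redundancy."""
    return min(s + 1, t), max(t, s + 1)


if __name__ == "__main__":
    n = 1024
    print(prior_redundancy_bits(n, s=2))
    print(asymmetric_redundancy_bits(n, s_plus=1, s_minus=1))
    print(list_decoding_exponents(s=2, t=5))
```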

arXiv.org e-Print Archive