Asymmetric Lee Distance Codes for DNA-Based Storage

Authors: Gabrys Ryan; Kiah Han Mao; Milenkovic Olgica
Publication date: 14/12/2016

We consider a new family of codes, termed asymmetric Lee distance codes, that arise in the design and implementation of DNA-based storage systems and systems with parallel string transmission protocols. The codewords are defined over a quaternary alphabet, although the results carry over to other alphabet sizes; furthermore, symbol confusability is dictated by their underlying binary representation. Our contributions are two-fold. First, we demonstrate that the new distance represents a linear combination of the Lee and Hamming distance and derive upper bounds on the size of the codes under this metric based on linear programming techniques. Second, we propose a number of code constructions which imply lower bounds
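As a rough illustration of the two ingredients named in the abstract, the sketch below computes the standard Lee and Hamming distances between words over the integers modulo 4 and forms a weighted sum of them. The weights `alpha` and `beta` and the function `combined_distance` are hypothetical placeholders; the abstract does not give the coefficients of the asymmetric Lee distance, so this is not the paper's metric.

```python
def lee_distance(x, y, q=4):
    """Standard Lee distance between equal-length words over Z_q."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    # Per-symbol distance is the shorter way around the cycle Z_q.
    return sum(min((a - b) % q, (b - a) % q) for a, b in zip(x, y))


def hamming_distance(x, y):
    """Number of positions in which the two words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def combined_distance(x, y, alpha, beta, q=4):
    """Hypothetical weighted sum of Lee and Hamming distances.

    alpha and beta are illustrative assumptions, not values from the paper.
    """
    return alpha * lee_distance(x, y, q) + beta * hamming_distance(x, y)
```

For example, `lee_distance([0, 1, 3], [3, 1, 1])` counts 1 for the pair (0, 3), which are adjacent on the cycle, and 2 for the pair (3, 1).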

arXiv.org e-Print Archive

CiteSeerX

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Authors: Schwartz Moshe; Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019

DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, Latex; version accepted for publicatio
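To make the noise model concrete, here is a minimal sketch of a uniform tandem-duplication error, assuming the usual definition: a substring of fixed length k is copied and the copy is inserted immediately after the original. The function names are illustrative and do not come from the paper.

```python
def tandem_duplicate(s, i, k):
    """Duplicate the length-k substring of s starting at index i, in place."""
    if k <= 0 or i < 0 or i + k > len(s):
        raise ValueError("duplication window must lie inside the string")
    # s = u v w with |v| = k becomes u v v w.
    return s[:i] + s[i:i + k] * 2 + s[i + k:]


def single_duplication_descendants(s, k):
    """All distinct strings reachable from s by one length-k tandem duplication."""
    return {tandem_duplicate(s, i, k) for i in range(len(s) - k + 1)}
```

For instance, duplicating the window `CG` in `ACGT` gives `ACGCGT`. Multiple noisy reads of the same stored string can then be modeled as different descendants of one ancestor.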

arXiv.org e-Print Archive

Crossref

Mutually Uncorrelated Primers for DNA-Based Data Storage

Authors: Gabrys Ryan; Kiah Han Mao; Milenkovic Olgica; Yazdi S. M. Hossein Tabatabaei
Publication date: 13/09/2017

We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and for synchronization of communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for primer design in DNA-based data storage systems are also required to be at large mutual Hamming distance from each other, have balanced compositions of symbols, and avoid primer-dimer byproducts. We derive bounds on the size of WMU and various constrained WMU codes and present a number of constructions for balanced, error-correcting, primer-dimer free WMU codes using Dyck paths, prefix-synchronized and cyclic codes. Comment: 14 pages, 3 figures, 1 Table. arXiv admin note: text overlap with arXiv:1601.0817
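The defining property of WMU sequences can be checked directly by brute force. The sketch below assumes all sequences have the same length and treats "sufficiently long" as a threshold parameter, here called `kappa`; both the parameter name and the function `is_wmu` are illustrative assumptions rather than the authors' notation or code.

```python
def is_wmu(sequences, kappa):
    """Return True if no proper suffix of length >= kappa of any sequence
    equals a prefix of the same or another sequence.

    Assumes all sequences share one length; kappa is a hypothetical
    name for the overlap threshold.
    """
    for a in sequences:
        for b in sequences:
            # Only proper overlaps count, so stop one short of the full length.
            max_len = min(len(a), len(b)) - 1
            for length in range(kappa, max_len + 1):
                if a[-length:] == b[:length]:
                    return False
    return True
```

With `kappa = 2`, the pair `ACA`, `CAC` fails the check because the suffix `CA` of the first sequence is a prefix of the second. The additional primer constraints mentioned in the abstract (Hamming distance, balance, primer-dimer avoidance) are not covered by this check.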

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

On Coding over Sliced Information

Authors: Bruck Jehoshua; Raviv Netanel; Sima Jin
Publication date: 30/09/2018

The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is order-wise equivalent to the amount required in the classical error correcting paradigm

arXiv.org e-Print Archive

Crossref

Caltech Authors

Codes for Correcting Asymmetric Adjacent Transpositions and Deletions

Authors: Tan Vincent Y. F.; Vu Van Khu; Wang Shuche
Publication date: 29/06/2023

Codes in the Damerau–Levenshtein metric have been extensively studied recently owing to their applications in DNA-based data storage. In particular, Gabrys, Yaakobi, and Milenkovic (2017) designed a length-$n$ code correcting a single deletion and $s$ adjacent transpositions with at most $(1+2s)\log n$ bits of redundancy. In this work, we consider a new setting where both asymmetric adjacent transpositions (also known as right-shifts or left-shifts) and deletions may occur. We present several constructions of the codes correcting these errors in various cases. In particular, we design a code correcting a single deletion, $s^+$ right-shift, and $s^-$ left-shift errors with at most $(1+s)\log (n+s+1)+1$ bits of redundancy where $s=s^{+}+s^{-}$. In addition, we investigate codes correcting $t$ $0$-deletions, $s^+$ right-shift, and $s^-$ left-shift errors with both uniquely-decoding and list-decoding algorithms. Our main contribution here is the construction of a list-decodable code with list size $O(n^{\min\{s+1,t\}})$ and with at most $(\max \{t,s+1\}) \log n+O(1)$ bits of redundancy, where $s=s^{+}+s^{-}$. Finally, we construct both non-systematic and systematic codes for correcting blocks of $0$-deletions with $\ell$-limited-magnitude and $s$ adjacent transpositions

arXiv.org e-Print Archive

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Authors: Chang Zuling; Chrisnata Johan; Ezerman Martianus Frederic; Kiah Han Mao
Publication date: 08/07/2016

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q\ge 2$ and $n\le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors
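The abstract does not define a profile vector; the sketch below assumes the common definition in which the profile of a word records how many times each length-$\ell$ substring occurs in it. The function name `profile_vector` is illustrative, and only substrings that actually occur are stored, so absent ones implicitly have count zero.

```python
from collections import Counter


def profile_vector(word, ell):
    """Count occurrences of every length-ell substring of word.

    Returns a dict from substring to count (assumed definition of the profile).
    """
    if ell <= 0 or ell > len(word):
        raise ValueError("read length must be between 1 and len(word)")
    # Slide a window of width ell across the word.
    return dict(Counter(word[i:i + ell] for i in range(len(word) - ell + 1)))
```

For the word `ACGAC` with read length 2, the windows are `AC`, `CG`, `GA`, `AC`, so `AC` is counted twice. Distinct words can share a profile, which is why the number of distinct profile vectors, rather than the number of words, governs the achievable rate.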

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)