Asymmetric Lee Distance Codes for DNA-Based Storage
We consider a new family of codes, termed asymmetric Lee distance codes, that
arise in the design and implementation of DNA-based storage systems and systems
with parallel string transmission protocols. The codewords are defined over a
quaternary alphabet, although the results carry over to other alphabet sizes;
furthermore, symbol confusability is dictated by their underlying binary
representation. Our contributions are two-fold. First, we demonstrate that the
new distance represents a linear combination of the Lee and Hamming distances
and derive upper bounds on the size of the codes under this metric based on
linear programming techniques. Second, we propose a number of code
constructions which imply lower bounds.
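As a minimal sketch of the two ingredients named in the abstract above, the following computes the Hamming and Lee distances between quaternary words. The abstract does not give the weights of the linear combination, so none are assumed here; the function names and the representation of symbols as integers in {0, 1, 2, 3} are illustrative assumptions.

```python
def hamming_distance(x, y):
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(x, y))


def lee_distance(x, y, q=4):
    """Sum over positions of the cyclic distance between symbols mod q."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(min((a - b) % q, (b - a) % q) for a, b in zip(x, y))


# Symbols are integers mod 4 (an assumed encoding of the quaternary alphabet).
x = [0, 1, 2, 3]
y = [3, 1, 0, 3]
print(hamming_distance(x, y), lee_distance(x, y))
```

The two words differ in two positions, but the Lee distance also accounts for how far apart the differing symbols are on the cycle of residues mod 4.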
Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors
DNA as a data storage medium has several advantages, including far greater
data density compared to electronic media. We propose that schemes for data
storage in the DNA of living organisms may benefit from studying the
reconstruction problem, which is applicable whenever multiple reads of noisy
data are available. This strategy is uniquely suited to the medium, which
inherently replicates stored data in multiple distinct ways, caused by
mutations. We consider noise introduced solely by uniform tandem-duplication,
and utilize the relation to constant-weight integer codes in the Manhattan
metric. By bounding the intersection of the cross-polytope with hyperplanes, we
prove the existence of reconstruction codes with greater capacity than known
error-correcting codes, which we can determine analytically for any set of
parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publication
Mutually Uncorrelated Primers for DNA-Based Data Storage
We introduce the notion of weakly mutually uncorrelated (WMU) sequences,
motivated by applications in DNA-based data storage systems and for
synchronization of communication devices. WMU sequences are characterized by
the property that no sufficiently long suffix of one sequence is the prefix of
the same or another sequence. WMU sequences used for primer design in DNA-based
data storage systems are also required to be at large mutual Hamming distance
from each other, have balanced compositions of symbols, and avoid primer-dimer
byproducts. We derive bounds on the size of WMU and various constrained WMU
codes and present a number of constructions for balanced, error-correcting,
primer-dimer free WMU codes using Dyck paths, prefix-synchronized and cyclic
codes. Comment: 14 pages, 3 figures, 1 table. arXiv admin note: text overlap with
arXiv:1601.0817
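The overlap property described in the abstract above can be checked by brute force. This is a minimal sketch, not the authors' construction: the threshold `k` stands in for "sufficiently long", the function name `is_wmu` is hypothetical, and the trivial overlap of a sequence with its whole self is excluded by assumption.

```python
def is_wmu(seqs, k):
    """Return True if no suffix of length >= k of any sequence equals a
    prefix of the same or another sequence (whole-sequence self-overlap
    is ignored, since it always holds trivially)."""
    for i, a in enumerate(seqs):
        for j, b in enumerate(seqs):
            for length in range(k, min(len(a), len(b)) + 1):
                if i == j and length == len(a):
                    continue  # skip the trivial overlap of a sequence with itself
                if a[-length:] == b[:length]:
                    return False
    return True


primers = ["ACGT", "GTCA"]
print(is_wmu(primers, 2))  # "GT" ends the first primer and starts the second
print(is_wmu(primers, 3))
```

The check is quadratic in the number of sequences and is meant only to make the definition concrete; the additional constraints listed in the abstract (Hamming distance, balance, primer-dimer avoidance) are not covered.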
On Coding over Sliced Information
The interest in channel models in which the data is sent as an unordered set
of binary strings has increased lately, due to emerging applications in DNA
storage, among others. In this paper we analyze the minimal redundancy of
binary codes for this channel under substitution errors, and provide several
constructions, some of which are shown to be asymptotically optimal up to
constants. The surprising result in this paper is that while the information
vector is sliced into a set of unordered strings, the number of redundant bits
required to correct errors is order-wise equivalent to the number
required in the classical error-correcting paradigm.
Codes for Correcting Asymmetric Adjacent Transpositions and Deletions
Codes in the Damerau--Levenshtein metric have been extensively studied
recently owing to their applications in DNA-based data storage. In particular,
Gabrys, Yaakobi, and Milenkovic (2017) designed a length- code correcting a
single deletion and adjacent transpositions with at most
bits of redundancy. In this work, we consider a new setting where both
asymmetric adjacent transpositions (also known as right-shifts or left-shifts)
and deletions may occur. We present several constructions of the codes
correcting these errors in various cases. In particular, we design a code
correcting a single deletion, right-shift, and left-shift errors
with at most bits of redundancy where . In
addition, we investigate codes correcting -deletions, right-shift,
and left-shift errors with both uniquely-decoding and list-decoding
algorithms. Our main contribution here is the construction of a list-decodable
code with list size and with at most bits of redundancy, where . Finally, we construct
both non-systematic and systematic codes for correcting blocks of -deletions
with -limited-magnitude and adjacent transpositions
Rates of DNA Sequence Profiles for Practical Values of Read Lengths
A recent study by one of the authors has demonstrated the importance of
profile vectors in DNA-based data storage. We provide exact values and lower
bounds on the number of profile vectors for finite values of alphabet size ,
read length , and word length . Consequently, we demonstrate that for
and , the number of profile vectors is at least
with very close to one. In addition to enumeration
results, we provide a set of efficient encoding and decoding algorithms for
each of two particular families of profile vectors.
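To make the notion of a profile vector concrete, here is a minimal sketch under the usual definition, which the abstract above does not spell out: the profile vector of a word records how many times each possible substring of the read length occurs in it. The function name, the parameter `ell` for the read length, and the default alphabet are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def profile_vector(word, ell, alphabet="ACGT"):
    """Count occurrences of every length-ell string over `alphabet`
    among the contiguous substrings of `word`, in lexicographic order
    of the alphabet as given."""
    counts = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    return [counts["".join(gram)] for gram in product(alphabet, repeat=ell)]


# Binary alphabet, read length 2: entries are ordered AA, AB, BA, BB.
print(profile_vector("AAB", 2, alphabet="AB"))
```

The entries always sum to the word length minus the read length plus one, since every window of the word contributes to exactly one entry.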