On DNA Codes Over the Non-Chain Ring with
In this paper, we present a novel design strategy of DNA codes with length
over the non-chain ring
with elements and , where denotes the length of a code over
. We first study and analyze a distance conserving map defined over the ring
into the length- DNA sequences. Then, we derive some conditions on the
generator matrix of a linear code over , which leads to a DNA code with
reversible, reversible-complement, homopolymer -run-length, and
-GC-content constraints for integer ().
Finally, we propose a new construction of DNA codes using Reed-Muller type
generator matrices. This allows us to obtain DNA codes with reversible,
reversible-complement, homopolymer -run-length, and -GC-content
constraints.
Comment: This paper has been presented in IEEE Information Theory Workshop
(ITW) 2022, Mumbai, INDIA
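The constraints named in this abstract can be illustrated with a minimal Python sketch. It assumes the usual Watson-Crick complement (A with T, G with C) and treats a code as reversible, or reversible-complement, when it is closed under the corresponding operation. All function names are hypothetical, and the sketch does not reproduce the paper's ring, its distance conserving map, or its Reed-Muller type generator matrices.

```python
# Assumption: standard Watson-Crick pairing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse(seq):
    """Return the sequence read back to front."""
    return seq[::-1]


def reverse_complement(seq):
    """Reverse the sequence and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_content(seq):
    """Count the positions holding G or C."""
    return sum(base in "GC" for base in seq)


def max_run_length(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def is_reversible(code):
    """True if the reverse of every codeword is also a codeword."""
    words = set(code)
    return all(reverse(word) in words for word in words)


def is_reversible_complement(code):
    """True if the reverse-complement of every codeword is also a codeword."""
    words = set(code)
    return all(reverse_complement(word) in words for word in words)
```

A code could then be screened by checking, for every codeword, that `gc_content` equals the target weight and that `max_run_length` stays within the allowed run length.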
Achievable Rates of Concatenated Codes in DNA Storage under Substitution Errors
In this paper, we study achievable rates of concatenated coding schemes over
a deoxyribonucleic acid (DNA) storage channel. Our channel model incorporates
the main features of DNA-based data storage. First, information is stored on
many, short DNA strands. Second, the strands are stored in an unordered fashion
inside the storage medium and each strand is replicated many times. Third, the
data is accessed in an uncontrollable manner, i.e., random strands are drawn
from the medium and received, possibly with errors. As one of our results, we
show that there is a significant gap between the channel capacity and the
achievable rate of a standard concatenated code in which one strand corresponds
to an inner block. This is in fact surprising, as for other channels, such as
q-ary symmetric channels, concatenated codes are known to achieve the
capacity. We further propose a modified concatenated coding scheme by combining
several strands into one inner block, which allows us to narrow the gap and
achieve rates that are close to the capacity.
Comment: Extended version of a paper submitted to International Symposium on
Information Theory and Its Applications (ISITA) 202
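The three channel features listed in this abstract (many short strands, unordered storage with replication, random access with errors) can be mimicked with a small Python simulation. This is a sketch under stated assumptions, not the paper's channel model: it assumes reads are drawn uniformly with replacement and that each base is independently replaced by one of the other three bases with probability `p`. The names `substitute` and `read_channel` are hypothetical.

```python
import random

BASES = "ACGT"


def substitute(strand, p, rng):
    """Replace each base, independently with probability p, by a different base."""
    out = []
    for base in strand:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def read_channel(strands, num_reads, p, rng):
    """Draw num_reads strands uniformly at random (with replacement) and corrupt them.

    The receiver gets no index telling which stored strand a read came from,
    which models the unordered nature of the storage medium.
    """
    return [substitute(rng.choice(strands), p, rng) for _ in range(num_reads)]
```

For example, `read_channel(["ACGTAC", "TTGGCC"], 10, 0.01, random.Random(0))` returns ten noisy reads; some stored strands may appear several times and others not at all.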
On Conflict Free DNA Codes
DNA storage has emerged as an important area of research. The reliability of
a DNA storage system depends on designing DNA strings (called DNA codes) that
are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy
a special constraint. Each codeword of the DNA code has a specific property
that any two consecutive sub-strings of the DNA codeword will not be the same
(a generalization of the homopolymer constraint). This is in addition to the
usual constraints such as Hamming, reverse, reverse-complement, and
GC-content. We believe that the new constraint will further help reduce
errors when writing data to and reading it from synthetic DNA strings. We
also present a construction (based on a variant of the stochastic local search
algorithm) to calculate the size of the DNA codes with all the above
constraints, which improves the lower bounds from the existing literature for
some specific cases. Moreover, a recursive isometric map between binary vectors
and DNA strings is proposed. Using the map and well-known binary codes, we
obtain a few classes of DNA codes with all the constraints, including the property
that the constructed DNA codewords are free from hairpin-like secondary
structures.
Comment: 12 pages, Draft (Table VI and Table VII are updated)
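The constraint described in this abstract, that no two consecutive sub-strings of a codeword are the same, can be checked with a short Python sketch. The bound `max_len` on the sub-string length is an assumption of this sketch, as is the function name; with `max_len = 1` the check reduces to forbidding two identical adjacent bases, which is why the constraint generalizes the homopolymer constraint.

```python
def is_conflict_free(seq, max_len):
    """True if no block of length 1..max_len is immediately followed by a copy of itself."""
    n = len(seq)
    for block in range(1, max_len + 1):
        # Compare every block with the block that directly follows it.
        for start in range(n - 2 * block + 1):
            if seq[start:start + block] == seq[start + block:start + 2 * block]:
                return False
    return True
```

For instance, `ACACGT` passes with `max_len = 1` but fails with `max_len = 2`, because the block `AC` is immediately repeated.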
In-Vitro Validated Methods for Encoding Digital Data in Deoxyribonucleic Acid (DNA)
Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods which have been validated by storing and retrieving real-world data via in-vitro experiments.
Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage
DNA strands serve as a storage medium for 4-ary data over the alphabet
{A, T, G, C}. DNA data storage promises formidable information density,
long-term durability, and ease of replicability. However, information in this
intriguing storage technology might be corrupted. Experiments have revealed
that DNA sequences with long homopolymers and/or with low GC-content are
notably more subject to errors upon storage.
This paper investigates the utilization of the recently-introduced method for
designing lexicographically-ordered constrained (LOCO) codes in DNA data
storage. This paper introduces DNA LOCO (D-LOCO) codes, over the alphabet
{A, T, G, C} with limited runs of identical symbols. These codes come with an
encoding-decoding rule we derive, which provides affordable encoding-decoding
algorithms. In terms of storage overhead, the proposed encoding-decoding
algorithms outperform those in the existing literature. Our algorithms are
readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows
us to achieve balancing over the entire DNA strand with minimal rate penalty.
Moreover, we propose four schemes to bridge consecutive codewords, three of
which guarantee single substitution error detection per codeword. We examine
the probability of undetected errors. We also show that D-LOCO codes are
capacity-achieving and that they offer remarkably high rates at moderate
lengths.
Comment: 14 pages (double column), 3 figures, submitted to the IEEE
Transactions on Molecular, Biological and Multi-scale Communications (TMBMC)
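To give a feel for the rate penalty of limiting runs of identical symbols, the following Python sketch counts the length-n sequences over a 4-letter alphabet whose runs never exceed a given length, and converts the count into bits per symbol. It is a plain dynamic-programming count, not the D-LOCO encoding-decoding rule from the paper; the function names and the parameter `max_run` are hypothetical.

```python
import math


def count_run_limited(n, max_run, q=4):
    """Count length-n sequences over q symbols with every run of length <= max_run."""
    if n == 0:
        return 1
    # ends[r] = number of sequences of the current length ending in a run of length r.
    ends = [0] * (max_run + 1)
    ends[1] = q
    for _ in range(n - 1):
        nxt = [0] * (max_run + 1)
        # Appending a different symbol starts a new run of length 1.
        nxt[1] = (q - 1) * sum(ends)
        # Appending the same symbol extends the run, if the limit allows it.
        for r in range(1, max_run):
            nxt[r + 1] = ends[r]
        ends = nxt
    return sum(ends)


def rate_bits_per_symbol(n, max_run, q=4):
    """Information rate (bits per symbol) of the full run-limited set, for n >= 1."""
    return math.log2(count_run_limited(n, max_run, q)) / n
```

An unconstrained 4-ary sequence carries 2 bits per symbol, so the gap between 2 and `rate_bits_per_symbol(n, max_run)` indicates how much rate the run-length limit costs at length n.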