Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors
DNA as a data storage medium has several advantages, including far greater
data density compared to electronic media. We propose that schemes for data
storage in the DNA of living organisms may benefit from studying the
reconstruction problem, which is applicable whenever multiple reads of noisy
data are available. This strategy is uniquely suited to the medium, which
inherently replicates stored data in multiple distinct ways, caused by
mutations. We consider noise introduced solely by uniform tandem-duplication,
and utilize the relation to constant-weight integer codes in the Manhattan
metric. By bounding the intersection of the cross-polytope with hyperplanes, we
prove the existence of reconstruction codes with greater capacity than known
error-correcting codes, which we can determine analytically for any set of
parameters.
Comment: 11 pages, 2 figures, LaTeX; version accepted for publication
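To make the error model above concrete, here is a small Python sketch (function and parameter names are mine, not the paper's) of a uniform tandem-duplication channel producing multiple noisy reads, the setting in which reconstruction codes operate:

```python
import random

def tandem_duplicate(seq: str, k: int, rng: random.Random) -> str:
    """Apply one uniform tandem duplication of fixed length k: pick a
    random position, copy the k-symbol block starting there, and insert
    the copy immediately after the original block."""
    if len(seq) < k:
        return seq
    i = rng.randrange(len(seq) - k + 1)  # start of the duplicated block
    return seq[:i + k] + seq[i:i + k] + seq[i + k:]

def noisy_reads(seq: str, k: int, t: int, n_reads: int, seed: int = 0):
    """Generate independent reads of seq, each corrupted by t uniform
    tandem duplications; the reconstruction problem asks to recover seq
    from such a set of distinct noisy reads."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        s = seq
        for _ in range(t):
            s = tandem_duplicate(s, k, rng)
        reads.append(s)
    return reads
```

Since each duplication inserts exactly k symbols, every read produced by noisy_reads has length len(seq) + t*k, and each read contains at least one adjacent repeated k-block by construction.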
Information-theoretic interpretation of quantum error-correcting codes
Quantum error-correcting codes are analyzed from an information-theoretic
perspective centered on quantum conditional and mutual entropies. This approach
parallels the description of classical error correction in Shannon theory,
while clarifying the differences between classical and quantum codes. More
specifically, it is shown how quantum information theory accounts for the fact
that "redundant" information can be distributed over quantum bits even though
this does not violate the quantum "no-cloning" theorem. Such a remarkable
feature, which has no counterpart for classical codes, is related to the
property that the ternary mutual entropy vanishes for a tripartite system in a
pure state. This information-theoretic description of quantum coding is used to
derive the quantum analogue of the Singleton bound on the number of logical
bits that can be preserved by a code of fixed length which can recover a given
number of errors.
Comment: 14 pages RevTeX, 8 Postscript figures. Added appendix. To appear in
Phys. Rev.
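For reference, the vanishing of the ternary mutual entropy invoked above follows directly from the standard definitions (a textbook identity, not taken from the paper itself):

```latex
% Ternary mutual entropy, by inclusion-exclusion over subsystems:
S(A{:}B{:}C) = S(A) + S(B) + S(C) - S(AB) - S(AC) - S(BC) + S(ABC).
% For a pure state on ABC, S(ABC) = 0, and complementary subsystems
% carry equal entropy: S(AB) = S(C), S(AC) = S(B), S(BC) = S(A).  Hence
S(A{:}B{:}C) = S(A) + S(B) + S(C) - S(C) - S(B) - S(A) + 0 = 0.
```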
Low-redundancy codes for correcting multiple short-duplication and edit errors
Due to its higher data density, longevity, energy efficiency, and ease of
generating copies, DNA is considered a promising storage technology for
satisfying future needs. However, a diverse set of errors including deletions,
insertions, duplications, and substitutions may arise in DNA at different
stages of data storage and retrieval. The current paper constructs
error-correcting codes for simultaneously correcting short (tandem)
duplications and at most p edits, where a short duplication generates a copy
of a substring of length at most 3 and inserts the copy following the original
substring, and an edit is a substitution, deletion, or insertion. Compared to
the state-of-the-art codes for duplications only, the proposed codes correct
up to p edits (in addition to duplications) at an additional cost of
O(p log_q n) symbols of redundancy, thus achieving the same asymptotic rate,
where n is the code length, q is the alphabet size, and p is a constant.
Furthermore, the time complexities of both the encoding and decoding
processes are polynomial in the code length when p is a constant.
Comment: 21 pages. The paper has been submitted to IEEE Transactions on
Information Theory. Furthermore, the paper was presented in part at
ISIT 2021 and ISIT 202
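The two error types described above can be illustrated with a short Python sketch (function names and the length bound used here are my own illustration, not the paper's construction):

```python
def short_duplication(seq: str, i: int, ell: int) -> str:
    """Short tandem duplication: copy seq[i:i+ell] (ell at most 3) and
    insert the copy immediately after the original substring."""
    assert 1 <= ell <= 3 and i + ell <= len(seq)
    return seq[:i + ell] + seq[i:i + ell] + seq[i + ell:]

def edit(seq: str, kind: str, i: int, c: str = "") -> str:
    """Apply one edit at position i: a substitution ('sub'),
    an insertion ('ins'), or a deletion ('del')."""
    if kind == "sub":
        return seq[:i] + c + seq[i + 1:]
    if kind == "ins":
        return seq[:i] + c + seq[i:]
    if kind == "del":
        return seq[:i] + seq[i + 1:]
    raise ValueError(kind)
```

For example, short_duplication("ACGT", 1, 2) duplicates "CG" in place, giving "ACGCGT"; a code for this channel must undo any mixture of such duplications plus a bounded number of edits.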
Statistical Mechanics of Broadcast Channels Using Low Density Parity Check Codes
We investigate the use of Gallager's low-density parity-check (LDPC) codes in
a broadcast channel, one of the fundamental models in network information
theory. Combining linear codes is a standard technique in practical network
communication schemes and is known to provide better performance than simple
timesharing methods when algebraic codes are used. The statistical physics
based analysis shows that the practical performance of the suggested method,
achieved by employing the belief propagation algorithm, is superior to that of
LDPC based timesharing codes while the best performance, when received
transmissions are optimally decoded, is bounded by the timesharing limit.
Comment: 14 pages, 4 figures
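As background for the abstract above (a generic toy example of my own, not the paper's construction), an LDPC code is defined by a sparse binary parity-check matrix H, and a word is a codeword exactly when all parity checks are satisfied over GF(2):

```python
# Toy sparse parity-check matrix (3 checks on 6 bits); real LDPC codes
# use much larger, still sparse, matrices.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def syndrome(H, x):
    """Evaluate every parity check of x over GF(2); the all-zero
    syndrome means x satisfies H x^T = 0, i.e. x is a codeword."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

def is_codeword(H, x):
    return all(s == 0 for s in syndrome(H, x))
```

Belief propagation, as used in the paper, decodes by passing probabilistic messages along the sparse bipartite graph defined by the nonzero entries of H.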
Noise and Uncertainty in String-Duplication Systems
Duplication mutations play a critical role in the generation of biological
sequences. Simultaneously, they have a deleterious effect on data stored
using in-vivo DNA data storage. While duplications have been studied both as
a sequence-generation mechanism and in the context of error correction, for
simplicity these studies have not taken into account the presence of other
types of mutations. In this work, we consider the capacity of duplication
mutations in the presence of point-mutation noise, and so quantify the
generation power of these mutations. We show that if the number of point
mutations is vanishingly small compared to the number of duplication
mutations of a constant length, the generation capacity of these mutations is
zero. However, if the number of point mutations increases to a constant
fraction of the number of duplications, then the capacity is nonzero. Lower
and upper bounds for this capacity are also presented. Another problem that
we study is concerned with the mismatch between code design and channel in
data storage in the DNA of living organisms with respect to duplication
mutations. In this context, we consider the uncertainty of such a mismatched
coding scheme, measured as the maximum number of input codewords that can
lead to the same output.
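The generation process studied above can be sketched as a toy model (parameter names are mine, not the paper's): a seed string evolves by tandem duplications of a constant length k, interleaved with point mutations occurring at some rate beta.

```python
import random

def evolve(seed: str, k: int, steps: int, beta: float, rng_seed: int = 0) -> str:
    """Evolve seed (assumed at least k symbols long) for the given number
    of steps; each step is a point mutation (random substitution) with
    probability beta, else a tandem duplication of length k."""
    rng = random.Random(rng_seed)
    s = seed
    for _ in range(steps):
        if rng.random() < beta:
            # point mutation: substitute one symbol
            i = rng.randrange(len(s))
            s = s[:i] + rng.choice("ACGT") + s[i + 1:]
        else:
            # tandem duplication: copy a k-block in place
            i = rng.randrange(len(s) - k + 1)
            s = s[:i + k] + s[i:i + k] + s[i + k:]
    return s
```

With beta = 0 only duplications occur (zero generation capacity in the regime described above), while a constant beta mixes in the point mutations that make the capacity nonzero.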