On the Coding Capacity of Reverse-Complement and Palindromic Duplication-Correcting Codes
We derive the coding capacity for duplication-correcting codes capable of
correcting any number of duplications. We do so both for reverse-complement
duplications, as well as palindromic (reverse) duplications. We show that
except for duplication-length , the coding capacity is . When the
duplication length is , the coding capacity depends on the alphabet size,
and we construct optimal codes
Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors
DNA as a data storage medium has several advantages, including far greater
data density compared to electronic media. We propose that schemes for data
storage in the DNA of living organisms may benefit from studying the
reconstruction problem, which is applicable whenever multiple reads of noisy
data are available. This strategy is uniquely suited to the medium, which
inherently replicates stored data in multiple distinct ways, caused by
mutations. We consider noise introduced solely by uniform tandem-duplication,
and utilize the relation to constant-weight integer codes in the Manhattan
metric. By bounding the intersection of the cross-polytope with hyperplanes, we
prove the existence of reconstruction codes with greater capacity than known
error-correcting codes, which we can determine analytically for any set of
parameters.
Comment: 11 pages, 2 figures, Latex; version accepted for publicatio
Integer codes correcting sparse byte errors
In public optical networks, the data are scrambled with x^u + 1 self-synchronous scramblers (SSSs). The reason for this is to avoid long strings of ones or zeros, which might affect the receiver synchronization. Unfortunately, the use of SSSs always entails the duplication of channel errors. More precisely, each error occurring during the transmission will be duplicated u bits later. In this paper, we present a low-cost solution to this problem based on integer codes capable of correcting sparse byte errors.
Radonjic, A., Vujicic, V., 2019. Integer codes correcting sparse byte errors. Cryptogr. Commun. 11, 1069–1077. https://doi.org/10.1007/s12095-019-0350-9
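The error duplication described in this abstract can be illustrated with a small sketch. It assumes the usual form of an x^u + 1 self-synchronous scrambler (each scrambled bit is the data bit XORed with the scrambled bit u positions earlier) and an all-zero initial state. The function names, the toy value u = 3, and the sample bit string are illustrative assumptions, not taken from the paper.

```python
def scramble(bits, u):
    """Scramble with x^u + 1: s[n] = d[n] XOR s[n - u] (assumed zero initial state)."""
    out = []
    for n, b in enumerate(bits):
        prev = out[n - u] if n >= u else 0
        out.append(b ^ prev)
    return out


def descramble(bits, u):
    """Descramble: d[n] = r[n] XOR r[n - u], so each received bit is used twice."""
    return [b ^ (bits[n - u] if n >= u else 0) for n, b in enumerate(bits)]


u = 3  # toy delay, chosen only to keep the example short
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
sent = scramble(data, u)

received = sent.copy()
received[2] ^= 1  # a single channel error at position 2

decoded = descramble(received, u)
errors = [n for n, (a, b) in enumerate(zip(data, decoded)) if a != b]
print(errors)  # [2, 5]: the error reappears u bits later
```

Because the descrambler reads every received bit once directly and once through the u-bit delay, one flipped channel bit corrupts two decoded bits that are exactly u positions apart.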
Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem-duplications of a fixed length; the first family can correct any number of errors while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3
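As a minimal illustration of the tandem-duplication error described in this abstract, the sketch below copies a block of a sequence and inserts the copy immediately after the original. The function name `tandem_duplicate` and its parameters are hypothetical, chosen for this example only; the sketch models the error itself, not the codes proposed in the paper.

```python
def tandem_duplicate(seq: str, start: int, length: int) -> str:
    """Return seq with the block seq[start:start+length] repeated right after itself."""
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("block must lie inside the sequence")
    end = start + length
    block = seq[start:end]
    # Insert the copy directly after the original block (in tandem).
    return seq[:end] + block + seq[end:]


print(tandem_duplicate("ACGT", 1, 2))  # ACGCGT
```

Each such error lengthens the sequence by exactly the duplication length, which is why the abstract distinguishes duplications of a fixed length from duplications of length up to k.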
Low-redundancy codes for correcting multiple short-duplication and edit errors
Due to its higher data density, longevity, energy efficiency, and ease of
generating copies, DNA is considered a promising storage technology for
satisfying future needs. However, a diverse set of errors including deletions,
insertions, duplications, and substitutions may arise in DNA at different
stages of data storage and retrieval. The current paper constructs
error-correcting codes for simultaneously correcting short (tandem)
duplications and at most edits, where a short duplication generates a copy
of a substring with length and inserts the copy following the original
substring, and an edit is a substitution, deletion, or insertion. Compared to
the state-of-the-art codes for duplications only, the proposed codes correct up
to edits (in addition to duplications) at the additional cost of roughly
symbols of redundancy, thus achieving the same
asymptotic rate, where is the alphabet size and is a constant.
Furthermore, the time complexities of both the encoding and decoding processes
are polynomial when is a constant with respect to the code length.
Comment: 21 pages. The paper has been submitted to IEEE Transactions on
Information Theory. Furthermore, the paper was presented in part at the
ISIT2021 and ISIT202