2,121 research outputs found

    Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

    DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publication

    Information-theoretic interpretation of quantum error-correcting codes

    Quantum error-correcting codes are analyzed from an information-theoretic perspective centered on quantum conditional and mutual entropies. This approach parallels the description of classical error correction in Shannon theory, while clarifying the differences between classical and quantum codes. More specifically, it is shown how quantum information theory accounts for the fact that "redundant" information can be distributed over quantum bits even though this does not violate the quantum "no-cloning" theorem. Such a remarkable feature, which has no counterpart for classical codes, is related to the property that the ternary mutual entropy vanishes for a tripartite system in a pure state. This information-theoretic description of quantum coding is used to derive the quantum analogue of the Singleton bound on the number of logical bits that can be preserved by a code of fixed length which can recover a given number of errors. Comment: 14 pages RevTeX, 8 Postscript figures. Added appendix. To appear in Phys. Rev.

    Low-redundancy codes for correcting multiple short-duplication and edit errors

    Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most $p$ edits, where a short duplication generates a copy of a substring with length $\leq 3$ and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to $p$ edits (in addition to duplications) at the additional cost of roughly $8p(\log_q n)(1+o(1))$ symbols of redundancy, thus achieving the same asymptotic rate, where $q \ge 4$ is the alphabet size and $p$ is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when $p$ is a constant with respect to the code length. Comment: 21 pages. The paper has been submitted to IEEE Transactions on Information Theory. Furthermore, the paper was presented in part at the ISIT2021 and ISIT202
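    As a rough illustration of the short (tandem) duplication defined in the abstract above, the following Python sketch copies a substring and inserts the copy directly after the original. The function name `tandem_duplicate` and its parameters are hypothetical, chosen here for illustration; this is not code from the paper and it does not implement the paper's codes.

```python
def tandem_duplicate(seq: str, start: int, length: int) -> str:
    """Copy seq[start:start+length] and insert the copy right after the original.

    Hypothetical helper for illustration only; the abstract restricts
    "short" duplications to length <= 3, which the caller can enforce.
    """
    if length < 1 or start < 0 or start + length > len(seq):
        raise ValueError("substring out of range")
    block = seq[start:start + length]
    # Original prefix (including the block), then the copy, then the rest.
    return seq[:start + length] + block + seq[start + length:]


# Duplicating the length-2 substring "CG" of "ACGT":
print(tandem_duplicate("ACGT", 1, 2))  # ACGCGT
```

    Under this definition every duplication lengthens the sequence by exactly the length of the copied block.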

    Statistical Mechanics of Broadcast Channels Using Low Density Parity Check Codes

    We investigate the use of Gallager's low-density parity-check (LDPC) codes in a broadcast channel, one of the fundamental models in network information theory. Combining linear codes is a standard technique in practical network communication schemes and is known to provide better performance than simple timesharing methods when algebraic codes are used. The statistical-physics-based analysis shows that the practical performance of the suggested method, achieved by employing the belief propagation algorithm, is superior to that of LDPC-based timesharing codes, while the best performance, when received transmissions are optimally decoded, is bounded by the timesharing limit. Comment: 14 pages, 4 figures

    Noise and Uncertainty in String-Duplication Systems

    Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output