16 research outputs found

    Coding over Sets for DNA Storage

    In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is represented by an unordered set of M sequences, each of length L. Errors within that model are losses of whole sequences and point errors inside the sequences, such as substitutions, insertions and deletions. We propose code constructions which can correct these errors with efficient encoders and decoders. By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, we show that many of our codes are close to optimal. Comment: 5 page

    Optimal Error-Detecting Codes for General Asymmetric Channels via Sperner Theory

    Several communication models that are of relevance in practice are asymmetric in the way they act on the transmitted "objects". Examples include channels in which the amplitudes of the transmitted pulses can only be decreased, channels in which the symbols can only be deleted, channels in which non-zero symbols can only be shifted to the right (e.g., timing channels), subspace channels in which the dimension of the transmitted vector space can only be reduced, unordered storage channels in which the cardinality of the stored (multi)set can only be reduced, etc. We introduce a formal definition of an asymmetric channel as a channel whose action induces a partial order on the set of all possible inputs, and show that this definition captures all the above examples. Such a general approach allows one to treat all these different models in a unified way, and to obtain a characterization of optimal error-detecting codes for many interesting asymmetric channels by using Sperner theory. Comment: To be presented at the IEEE Information Theory Workshop (ITW), Mumbai, India, Nov. 202

    Anchor-Based Correction of Substitutions in Indexed Sets

    Motivated by DNA-based data storage, we investigate a system where digital information is stored in an unordered set of several vectors over a finite alphabet. Each vector begins with a unique index that represents its position in the whole data set and does not contain data. This paper deals with the design of error-correcting codes for such indexed sets in the presence of substitution errors. We propose a construction that efficiently deals with the challenges that arise when designing codes for unordered sets. Using a novel mechanism, called anchoring, we show that it is possible to combat the ordering loss of sequences with only a small amount of redundancy, which allows the use of standard coding techniques, such as tensor-product codes, to correct errors within the sequences. We finally derive upper and lower bounds on the achievable redundancy of codes within the considered channel model and verify that our construction yields a redundancy that is close to the best achievable one. Our results surprisingly indicate that it requires less redundancy to correct errors in the indices than in the data part of vectors. Comment: 5 page

    Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

    DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publicatio

    On Coding over Sliced Information

    The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the number of redundant bits required to correct errors is order-wise equivalent to the number required in the classical error-correcting paradigm.