On Codes for the Noisy Substring Channel

Author: Polyanskii Nikita
Yehezkeally Yonatan
Publication date: 07/06/2021

We consider the problem of coding for the substring channel, in which information strings are observed only through their (multisets of) substrings. Owing to DNA sequencing techniques, this channel has applications to DNA-based data storage, and interest in it has been renewed in recent years. In contrast to existing literature, we consider a noisy channel model, where information is subject to noise before its substrings are sampled, motivated by in-vivo storage. We study two separate noise models: substitutions and deletions. In both cases, we examine families of codes which may be utilized for error-correction and present combinatorial bounds. Through a generalization of the concept of repeat-free strings, we show that the added redundancy required by this imperfect observation assumption is sublinear, either when the fraction of errors in the observed substring length is sufficiently small, or when that length is sufficiently long. This suggests that no asymptotic cost in rate is incurred by this channel model in these cases. Comment: ISIT 2021 version (including all proofs
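As an illustrative sketch of the channel model described in this abstract (not the authors' code construction), the following Python snippet applies substitution noise to a string first and only then collects the multiset of its fixed-length substrings. The function names, the binary alphabet, and the use of fixed error positions instead of random ones are assumptions made for the example.

```python
from collections import Counter

def substitute(word, positions, alphabet="01"):
    """Flip the symbols at the given positions (binary alphabet assumed)."""
    chars = list(word)
    for i in positions:
        # Replace the symbol with the other symbol of the binary alphabet.
        chars[i] = alphabet[1] if chars[i] == alphabet[0] else alphabet[0]
    return "".join(chars)

def substring_multiset(word, length):
    """Multiset of all contiguous substrings of the given length."""
    return Counter(word[i:i + length] for i in range(len(word) - length + 1))

clean = "0010110"
noisy = substitute(clean, positions=[3])        # noise acts before sampling
observed = substring_multiset(noisy, length=3)  # what the receiver sees
```

In this toy run a single substitution alters every observed substring whose window covers the flipped position, which can be up to as many substrings as the window length.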

arXiv.org e-Print Archive

On Coding over Sliced Information

Author: Bruck Jehoshua
Raviv Netanel
Sima Jin
Publication date: 30/09/2018

Interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal up to constants. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the number of redundant bits required to correct errors is order-wise equivalent to the number required in the classical error-correcting paradigm.

arXiv.org e-Print Archive

Crossref

Caltech Authors

Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors

Author: Schwartz Moshe
Yehezkeally Yonatan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/09/2019

DNA as a data storage medium has several advantages, including far greater data density than electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways as a result of mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters. Comment: 11 pages, 2 figures, LaTeX; version accepted for publicatio
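To make the noise model in this abstract concrete, here is a minimal Python sketch of a tandem duplication of fixed length k, in which a length-k segment is copied and inserted directly after itself. It illustrates only the error mechanism and the idea of several distinct noisy reads, not the reconstruction codes the abstract refers to. The function name, parameter names, and example sequence are assumptions.

```python
def tandem_duplicate(word, start, k):
    """Insert a copy of word[start:start+k] immediately after that segment."""
    segment = word[start:start + k]
    if start < 0 or len(segment) != k:
        raise ValueError("segment of length k does not fit inside the word")
    return word[:start + k] + segment + word[start + k:]

stored = "ACGTAC"
# Two noisy reads of the same stored word, each with one duplication of length 2.
reads = [tandem_duplicate(stored, s, 2) for s in (0, 2)]
```

Every duplication here has the same length k, which is the assumed meaning of "uniform" in this sketch.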

arXiv.org e-Print Archive

Crossref

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Author: Chang Zuling
Chrisnata Johan
Ezerman Martianus Frederic
Kiah Han Mao
Publication date: 08/07/2016

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q\ge 2$ and $n\le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.
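The abstract does not define a profile vector. The Python sketch below assumes the usual definition, in which the profile vector of a word records how many times each possible string of the read length occurs in it as a contiguous substring. The function name, the lexicographic indexing, and the binary example are assumptions for illustration, and this is not the paper's encoding or decoding algorithm.

```python
from itertools import product

def profile_vector(word, ell, alphabet):
    """Counts of every length-ell string over the alphabet, in lexicographic order."""
    grams = ["".join(g) for g in product(alphabet, repeat=ell)]
    counts = {g: 0 for g in grams}
    for i in range(len(word) - ell + 1):
        counts[word[i:i + ell]] += 1
    return [counts[g] for g in grams]

# Alphabet size 2, read length 2, word length 4; entries follow 00, 01, 10, 11.
vec = profile_vector("0110", 2, "01")
```

The vector has one entry per possible read (the alphabet size raised to the read length), and its entries sum to the word length minus the read length plus one.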

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)