On Codes for the Noisy Substring Channel
We consider the problem of coding for the substring channel, in which
information strings are observed only through their (multisets of) substrings.
Because of applications to DNA-based data storage, due to DNA sequencing
techniques, interest in this channel has renewed in recent years. In contrast
to existing literature, we consider a noisy channel model, where information is
subject to noise \emph{before} its substrings are sampled, motivated by in-vivo
storage.
We study two separate noise models, substitutions or deletions. In both
cases, we examine families of codes which may be utilized for error-correction
and present combinatorial bounds. Through a generalization of the concept of
repeat-free strings, we show that the added required redundancy due to this
imperfect observation assumption is sublinear, either when the fraction of
errors in the observed substring length is sufficiently small, or when that
length is sufficiently long. This suggests that no asymptotic cost in rate is
incurred by this channel model in these cases.
Comment: ISIT 2021 version (including all proofs
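As a rough illustration of the observation model in this abstract, the Python sketch below collects the multiset of all length-`ell` substrings of a string and tests whether the string is repeat-free. Here "repeat-free" is taken to mean that no length-`ell` substring occurs more than once. The function names and that working definition are assumptions made for illustration and are not taken from the paper, which studies a generalization of the concept.

```python
from collections import Counter


def substring_multiset(s: str, ell: int) -> Counter:
    """Return the multiset of all contiguous length-ell substrings of s."""
    return Counter(s[i:i + ell] for i in range(len(s) - ell + 1))


def is_repeat_free(s: str, ell: int) -> bool:
    """True if every length-ell substring of s occurs exactly once.

    Assumed working definition of an ell-repeat-free string.
    """
    return all(count == 1 for count in substring_multiset(s, ell).values())
```

For example, `is_repeat_free("ACGAC", 2)` is false because the substring `AC` appears twice, while `is_repeat_free("ACGT", 2)` is true.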
Snake-in-the-Box Codes for Rank Modulation
Motivated by the rank-modulation scheme with applications to flash memory, we
consider Gray codes capable of detecting a single error, also known as
snake-in-the-box codes. We study two error metrics: Kendall's $\tau$-metric,
which applies to charge-constrained errors, and the $\ell_\infty$-metric, which
is useful in the case of limited magnitude errors. In both cases we construct
snake-in-the-box codes with rate asymptotically tending to 1. We also provide
efficient successor-calculation functions, as well as ranking and unranking
functions. Finally, we also study bounds on the parameters of such codes.
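For readers unfamiliar with the two metrics on permutations, the following Python sketch computes them under their standard definitions: Kendall's tau distance as the number of element pairs that the two permutations order differently (equivalently, the minimum number of adjacent transpositions needed to turn one into the other), and the l-infinity distance as the largest coordinate-wise difference. The function names are hypothetical, and the code illustrates only the metrics, not the code constructions of the paper.

```python
from itertools import combinations


def kendall_tau_distance(p, q):
    """Number of pairs of elements that p and q place in opposite order."""
    # Position of each element in q, then rewrite p in terms of those positions.
    pos_in_q = {value: index for index, value in enumerate(q)}
    relabeled = [pos_in_q[value] for value in p]
    # Count inversions of the relabeled sequence (quadratic, for clarity).
    return sum(
        1
        for i, j in combinations(range(len(relabeled)), 2)
        if relabeled[i] > relabeled[j]
    )


def linf_distance(p, q):
    """Largest absolute difference between corresponding entries of p and q."""
    return max(abs(a - b) for a, b in zip(p, q))
```

With `p = [1, 2, 3]` and `q = [3, 2, 1]`, every one of the three pairs is reversed, so the Kendall tau distance is 3, and the l-infinity distance is 2.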
Generalized Unique Reconstruction from Substrings
This paper introduces a new family of reconstruction codes which is motivated
by applications in DNA data storage and sequencing. In such applications, DNA
strands are sequenced by reading some subset of their substrings. While
previous works considered two extreme cases in which all substrings of
pre-defined lengths are read or substrings are read with no overlap for the
single string case, this work studies two extensions of this paradigm. The
first extension considers the setup in which consecutive substrings are read
with some given minimum overlap. First, an upper bound is provided on the
attainable rates of codes that guarantee unique reconstruction. Then, efficient
constructions of codes that asymptotically meet that upper bound are presented.
In the second extension, we study the setup where multiple strings are
reconstructed together. Given the number of strings and their length, we first
derive a lower bound on the read substrings' length $\ell$ that is necessary
for the existence of multi-strand reconstruction codes with non-vanishing
rates. We then present two constructions of such codes and show that their
rates approach 1 for values of $\ell$ that asymptotically behave like the lower
bound.
Comment: arXiv admin note: text overlap with arXiv:2205.0393
Adversarial Torn-paper Codes
We study the adversarial torn-paper channel. This problem is motivated by
applications in DNA data storage where the DNA strands that carry information
may break into smaller pieces which are received out of order. Our model
extends the previously researched probabilistic setting to the worst-case. We
develop code constructions for any parameters of the channel for which
non-vanishing asymptotic rate is possible and show our constructions achieve
asymptotically optimal rate while allowing for efficient encoding and decoding.
Finally, we extend our results to related settings, including multi-strand
storage, presence of substitution errors, or incomplete coverage.
Comment: Journal submissio