Rank-Modulation Codes for DNA Storage With Shotgun Sequencing
Synthesis of DNA molecules offers unprecedented advances in storage technology. Yet, the microscopic world in which these molecules reside induces error patterns that are fundamentally different from their digital counterparts. Hence, to maintain reliability in reading and writing, new coding schemes must be developed. In a reading technique called shotgun sequencing, a long DNA string is read in a sliding window fashion, and a profile vector is produced. It was recently suggested by Kiah et al. that such a vector can represent the permutation which is induced by its entries, and hence a rank-modulation scheme arises. Although this interpretation suggests high error tolerance, it is unclear which permutations are feasible and how to produce a DNA string whose profile vector induces a given permutation. In this paper, by observing some necessary conditions, an upper bound for the number of feasible permutations is given. Furthermore, a technique for deciding the feasibility of a permutation is devised. By using insights from this technique, an algorithm for producing a considerable number of feasible permutations is given, which applies to any alphabet size and any window length.
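As a rough illustration of the objects named in this abstract, the sketch below computes a profile vector by sliding a window over a string and then derives the ranking induced by its entries. The function names, the lexicographic ordering of windows, and the choice to reject profiles with tied counts are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter
from itertools import product


def profile_vector(s, alphabet, window):
    """Count how often each length-`window` word occurs in `s`.

    Words are listed in lexicographic order of `alphabet` (an assumed
    convention for this sketch).
    """
    counts = Counter(s[i:i + window] for i in range(len(s) - window + 1))
    words = ["".join(w) for w in product(alphabet, repeat=window)]
    return words, [counts[w] for w in words]


def induced_ranking(profile):
    """Return word positions sorted by decreasing count.

    Returns None when two counts tie, since the ranking is then ambiguous
    (an assumption of this sketch).
    """
    if len(set(profile)) != len(profile):
        return None
    return sorted(range(len(profile)), key=lambda i: -profile[i])


words, profile = profile_vector("BBBBABA", "AB", 2)
print(words, profile, induced_ranking(profile))
```

Here the string "BBBBABA" with window length 2 has distinct counts for all four words, so a ranking exists; a string such as "AAAABABB" produces tied counts and is rejected by this sketch.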
Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors
DNA as a data storage medium has several advantages, including far greater
data density compared to electronic media. We propose that schemes for data
storage in the DNA of living organisms may benefit from studying the
reconstruction problem, which is applicable whenever multiple reads of noisy
data are available. This strategy is uniquely suited to the medium, which
inherently replicates stored data in multiple distinct ways, caused by
mutations. We consider noise introduced solely by uniform tandem-duplication,
and utilize the relation to constant-weight integer codes in the Manhattan
metric. By bounding the intersection of the cross-polytope with hyperplanes, we
prove the existence of reconstruction codes with greater capacity than known
error-correcting codes, which we can determine analytically for any set of
parameters.
Comment: 11 pages, 2 figures, LaTeX; version accepted for publication
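To make the noise model concrete, the following minimal sketch applies a single tandem duplication to a string, assuming that "uniform" means every duplication has the same fixed length k. The function name and the parameters `start` and `k` are hypothetical and chosen only for this example.

```python
def tandem_duplicate(s, start, k):
    """Insert a copy of s[start:start + k] immediately after itself.

    `k` is the fixed duplication length (the 'uniform' assumption of
    this sketch); `start` is the position of the duplicated block.
    """
    if start < 0 or k <= 0 or start + k > len(s):
        raise ValueError("duplicated block must lie inside the string")
    block = s[start:start + k]
    return s[:start + k] + block + s[start + k:]


# Several noisy reads of the same stored string, each with one duplication.
stored = "ACGT"
reads = [tandem_duplicate(stored, i, 2) for i in range(len(stored) - 1)]
print(reads)
```

Each read differs from the stored string by one length-2 duplication at a different position, which is the kind of multi-read setting the reconstruction problem addresses.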
On Coding over Sliced Information
The interest in channel models in which the data is sent as an unordered set
of binary strings has increased lately, due to emerging applications in DNA
storage, among others. In this paper we analyze the minimal redundancy of
binary codes for this channel under substitution errors, and provide several
constructions, some of which are shown to be asymptotically optimal up to
constants. The surprising result in this paper is that while the information
vector is sliced into a set of unordered strings, the amount of redundant bits
that are required to correct errors is order-wise equivalent to the amount
required in the classical error-correcting paradigm.
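The channel model can be pictured with a naive indexing scheme: cut the information vector into slices, prefix each slice with a fixed-width binary index, and let the receiver restore the order by sorting. This is only a baseline sketch of the unordered-set model with no error correction; it is not one of the constructions analyzed in the paper, and the names below are hypothetical.

```python
import random


def slice_with_index(bits, payload_len):
    """Cut `bits` into slices and prefix each with a fixed-width index."""
    chunks = [bits[i:i + payload_len] for i in range(0, len(bits), payload_len)]
    width = max(1, (len(chunks) - 1).bit_length())
    strings = {format(i, f"0{width}b") + c for i, c in enumerate(chunks)}
    return strings, width


def reassemble(strings, width):
    """Sort by the index prefix and concatenate the payloads."""
    return "".join(s[width:] for s in sorted(strings))


message = "110100101100"
sent, width = slice_with_index(message, 4)
received = list(sent)
random.shuffle(received)  # the channel delivers the strings in arbitrary order
print(reassemble(received, width) == message)
```

A single substitution in an index prefix would misplace or collide slices in this baseline, which is why the redundancy question studied in the abstract is not trivial.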
On Codes for the Noisy Substring Channel
We consider the problem of coding for the substring channel, in which
information strings are observed only through their (multisets of) substrings.
Because of applications to DNA-based data storage, due to DNA sequencing
techniques, interest in this channel has renewed in recent years. In contrast
to existing literature, we consider a noisy channel model, where information is
subject to noise before its substrings are sampled, motivated by in-vivo
storage.
We study two separate noise models, substitutions or deletions. In both
cases, we examine families of codes which may be utilized for error-correction
and present combinatorial bounds. Through a generalization of the concept of
repeat-free strings, we show that the added required redundancy due to this
imperfect observation assumption is sublinear, either when the fraction of
errors in the observed substring length is sufficiently small, or when that
length is sufficiently long. This suggests that no asymptotic cost in rate is
incurred by this channel model in these cases.
Comment: ISIT 2021 version (including all proofs)
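For readers unfamiliar with the terms, the sketch below shows the noiseless observation (the multiset of all substrings of a given length) and a simple repeat-free test, taking "repeat-free" to mean that no substring of that length occurs twice. This reading of the definition and the function names are assumptions of the example; the paper's generalized notion is not reproduced here.

```python
from collections import Counter


def substring_multiset(s, length):
    """Multiset of all length-`length` substrings of `s`."""
    return Counter(s[i:i + length] for i in range(len(s) - length + 1))


def is_repeat_free(s, length):
    """True if every length-`length` substring of `s` occurs exactly once."""
    return all(c == 1 for c in substring_multiset(s, length).values())


print(substring_multiset("ACGTAC", 2))
print(is_repeat_free("ACGTAC", 2), is_repeat_free("ACGTAC", 3))
```

The string "ACGTAC" repeats the substring "AC" at length 2 but has no repeated substring at length 3, so only the longer observation length passes the test.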
Robust Indexing for the Sliced Channel: Almost Optimal Codes for Substitutions and Deletions
Encoding data as a set of unordered strings is receiving great attention as
it captures one of the basic features of DNA storage systems. However, the
challenge of constructing optimal redundancy codes for this channel remained
elusive. In this paper, we address this problem and present an order-wise
optimal construction of codes that are capable of correcting multiple
substitution, deletion, and insertion errors for this channel model. The key
ingredient in the code construction is a technique we call robust indexing:
simultaneously assigning indices to unordered strings (hence, creating order)
and also embedding information in these indices.
The encoded indices are resilient to substitution, deletion, and insertion
errors, and therefore, so is the entire code.