Search CORE

7 research outputs found

Anchor-Based Correction of Substitutions in Indexed Sets

Author: Lenz Andreas
Siegel Paul H.
Wachter-Zeh Antonia
Yaakobi Eitan
Publication venue
Publication date: 21/01/2019
Field of study

Motivated by DNA-based data storage, we investigate a system where digital information is stored in an unordered set of several vectors over a finite alphabet. Each vector begins with a unique index that represents its position in the whole data set and does not contain data. This paper deals with the design of error-correcting codes for such indexed sets in the presence of substitution errors. We propose a construction that efficiently deals with the challenges that arise when designing codes for unordered sets. Using a novel mechanism, called anchoring, we show that it is possible to combat the ordering loss of sequences with only a small amount of redundancy, which allows to use standard coding techniques, such as tensor-product codes to correct errors within the sequences. We finally derive upper and lower bounds on the achievable redundancy of codes within the considered channel model and verify that our construction yields a redundancy that is close to the best possible achievable one. Our results surprisingly indicate that it requires less redundancy to correct errors in the indices than in the data part of vectors.Comment: 5 page

arXiv.org e-Print Archive

Crossref

On Codes for the Noisy Substring Channel

Author: Polyanskii Nikita
Yehezkeally Yonatan
Publication venue
Publication date: 07/06/2021
Field of study

We consider the problem of coding for the substring channel, in which information strings are observed only through their (multisets of) substrings. Because of applications to DNA-based data storage, due to DNA sequencing techniques, interest in this channel has renewed in recent years. In contrast to existing literature, we consider a noisy channel model, where information is subject to noise \emph{before} its substrings are sampled, motivated by in-vivo storage. We study two separate noise models, substitutions or deletions. In both cases, we examine families of codes which may be utilized for error-correction and present combinatorial bounds. Through a generalization of the concept of repeat-free strings, we show that the added required redundancy due to this imperfect observation assumption is sublinear, either when the fraction of errors in the observed substring length is sufficiently small, or when that length is sufficiently long. This suggests that no asymptotic cost in rate is incurred by this channel model in these cases.Comment: ISIT 2021 version (including all proofs

arXiv.org e-Print Archive

Robust Indexing for the Sliced Channel: Almost Optimal Codes for Substitutions and Deletions

Author: Bruck Jehoshua
Raviv Netanel
Sima Jin
Publication venue
Publication date: 15/08/2023
Field of study

Encoding data as a set of unordered strings is receiving great attention as it captures one of the basic features of DNA storage systems. However, the challenge of constructing optimal redundancy codes for this channel remained elusive. In this paper, we address this problem and present an order-wise optimal construction of codes that are capable of correcting multiple substitution, deletion, and insertion errors for this channel model. The key ingredient in the code construction is a technique we call robust indexing: simultaneously assigning indices to unordered strings (hence, creating order) and also embedding information in these indices. The encoded indices are resilient to substitution, deletion, and insertion errors, and therefore, so is the entire code

arXiv.org e-Print Archive

Tail-Erasure-Correcting Codes

Author: Gabrys Ryan
Moav Boaz
Yaakobi Eitan
Publication venue
Publication date: 06/02/2024
Field of study

The increasing demand for data storage has prompted the exploration of new techniques, with molecular data storage being a promising alternative. In this work, we develop coding schemes for a new storage paradigm that can be represented as a collection of two-dimensional arrays. Motivated by error patterns observed in recent prototype architectures, our study focuses on correcting erasures in the last few symbols of each row, and also correcting arbitrary deletions across rows. We present code constructions and explicit encoders and decoders that are shown to be nearly optimal in many scenarios. We show that the new coding schemes are capable of effectively mitigating these errors, making these emerging storage platforms potentially promising solutions

arXiv.org e-Print Archive