
6 research outputs found

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Author: Chang Zuling
Chrisnata Johan
Ezerman Martianus Frederic
Kiah Han Mao
Publication date: 08/07/2016

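A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q \ge 2$ and $n \le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to one. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particular families of profile vectors.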

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

On Codes for the Noisy Substring Channel

Author: Polyanskii Nikita
Yehezkeally Yonatan
Publication date: 07/06/2021

We consider the problem of coding for the substring channel, in which information strings are observed only through their (multisets of) substrings. Because of applications to DNA-based data storage, due to DNA sequencing techniques, interest in this channel has renewed in recent years. In contrast to existing literature, we consider a noisy channel model, where information is subject to noise \emph{before} its substrings are sampled, motivated by in-vivo storage. We study two separate noise models, substitutions or deletions. In both cases, we examine families of codes which may be utilized for error-correction and present combinatorial bounds. Through a generalization of the concept of repeat-free strings, we show that the added required redundancy due to this imperfect observation assumption is sublinear, either when the fraction of errors in the observed substring length is sufficiently small, or when that length is sufficiently long. This suggests that no asymptotic cost in rate is incurred by this channel model in these cases. Comment: ISIT 2021 version (including all proofs)
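As a rough illustration of the noiseless observation model mentioned in this abstract, the sketch below collects the multiset of fixed-length substrings of a string. The function name `substring_multiset` and the choice to observe every contiguous window of one fixed length are assumptions made for illustration; the paper's noisy model and its codes are not reproduced here.

```python
from collections import Counter


def substring_multiset(s: str, length: int) -> Counter:
    """Return the multiset of all contiguous substrings of `s` with the given length.

    Hypothetical helper: it assumes every window of exactly `length`
    symbols is observed once, with no noise.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return Counter(s[i:i + length] for i in range(len(s) - length + 1))


# Two different strings that look identical through their length-2 substrings.
x, y = "ACAGA", "AGACA"
same_at_2 = substring_multiset(x, 2) == substring_multiset(y, 2)
same_at_3 = substring_multiset(x, 3) == substring_multiset(y, 3)
print(same_at_2, same_at_3)
```

Here the two strings share the same length-2 multiset but differ at length 3, which hints at why the observed substring length matters for unique reconstruction.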

arXiv.org e-Print Archive

Repeat-Free Codes

Author: Elishco Ohad
Gabrys Ryan
Médard Muriel
Yaakobi Eitan
Publication date: 12/09/2019

In this paper we consider the problem of encoding data into repeat-free sequences in which sequences are imposed to contain any $k$-tuple at most once (for predefined $k$). First, the capacity and redundancy of the repeat-free constraint are calculated. Then, an efficient algorithm, which uses a single bit of redundancy, is presented to encode length-$n$ sequences for $k = 2 + 2\log(n)$. This algorithm is then improved to support any value of $k$ of the form $k = a\log(n)$, for $1 < a$, while its redundancy is $o(n)$. We also calculate the capacity of repeat-free sequences when combined with local constraints which are given by a constrained system, and the capacity of multi-dimensional repeat-free codes. Comment: 18 pages
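The repeat-free constraint described in this abstract (every $k$-tuple appears at most once) can be checked with a short sliding-window scan. The sketch below is only a membership test under that definition; the name `is_repeat_free` is hypothetical, and this is not the paper's encoding algorithm.

```python
def is_repeat_free(seq: str, k: int) -> bool:
    """Return True if no length-k window occurs more than once in `seq`.

    Hypothetical checker for the k-tuple constraint; it does not encode
    data, it only verifies a candidate sequence.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seen = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in seen:
            return False  # this k-tuple already appeared earlier
        seen.add(window)
    return True


print(is_repeat_free("0011", 2))   # windows: 00, 01, 11
print(is_repeat_free("01010", 2))  # windows: 01, 10, 01 (01 repeats)
```

The scan stores each window in a set, so it runs in time roughly proportional to the sequence length times `k`.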

arXiv.org e-Print Archive

DSpace@MIT

Rates of DNA sequence profiles for practical values of read lengths

Author: Chang Zuling
Chrisnata Johan
Ezerman Martianus Frederic
Kiah Han Mao
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size $q$, read length $\ell$, and word length $n$. Consequently, we demonstrate that for $q \ge 2$ and $n \le q^{\ell/2-1}$, the number of profile vectors is at least $q^{\kappa n}$ with $\kappa$ very close to 1. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for certain families of profile vectors. Accepted version
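To make the counting question in this abstract concrete, the sketch below computes a profile vector by brute force and counts the distinct vectors over all words of a given length. It assumes the usual definition, in which the profile vector of a word records how often each length-$\ell$ substring occurs; the abstract itself does not state this definition. The names `profile_vector` and `count_profile_vectors` are hypothetical, and exhaustive enumeration is only feasible for very small parameters.

```python
from collections import Counter
from itertools import product


def profile_vector(word: str, alphabet: str, ell: int) -> tuple:
    """Counts of every length-ell string over `alphabet` occurring in `word`.

    Assumed definition: entries are indexed by all q**ell possible
    substrings, listed in lexicographic order of `alphabet`.
    """
    counts = Counter(word[i:i + ell] for i in range(len(word) - ell + 1))
    return tuple(counts["".join(g)] for g in product(alphabet, repeat=ell))


def count_profile_vectors(alphabet: str, ell: int, n: int) -> int:
    """Number of distinct profile vectors over all words of length n (brute force)."""
    words = ("".join(w) for w in product(alphabet, repeat=n))
    return len({profile_vector(w, alphabet, ell) for w in words})


print(profile_vector("0010", "01", 2))    # entries ordered as 00, 01, 10, 11
print(count_profile_vectors("01", 2, 3))  # 8 binary words of length 3
```

For the binary alphabet with `ell = 2` and `n = 3`, the words `010` and `101` share a profile vector, so the eight words yield seven distinct vectors.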