146 research outputs found

Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions

Author: Gabrys Ryan
Guruswami Venkatesan
Wu Ke
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)
Publication date: 01/01/2022

We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions; a combination of errors that is usually observed in DNA-based data storage. One of the most basic versions of this problem was settled more than 50 years ago by Levenshtein, who proved that binary Varshamov-Tenengolts codes correct one arbitrary edit error, i.e., one deletion or one substitution, with nearly optimal redundancy. However, this approach fails to extend to many simple and natural variations of the binary single-edit error setting. In this work, we make progress on the code design problem above in three such variations:

- We construct linear-time encodable and decodable length-n non-binary codes correcting a single edit error with nearly optimal redundancy log n+O(log log n), providing an alternative simpler proof of a result by Cai, Chee, Gabrys, Kiah, and Nguyen (IEEE Trans. Inf. Theory 2021). This is achieved by employing what we call weighted VT sketches, a new notion that may be of independent interest.
- We show the existence of a binary code correcting one deletion or one adjacent transposition with nearly optimal redundancy log n+O(log log n).
- We construct linear-time encodable and list-decodable binary codes with list-size 2 for one deletion and one substitution with redundancy 4log n+O(log log n). This matches the existential bound up to an O(log log n) additive term.
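To make the Varshamov-Tenengolts idea mentioned in this abstract concrete, here is a minimal Python sketch of the classical binary VT syndrome together with Levenshtein's single-deletion decoder. The function names `vt_syndrome` and `vt_correct_deletion` are illustrative choices, not taken from the paper, and the sketch covers only the classical binary single-deletion case, not the weighted VT sketches the authors introduce.

```python
def vt_syndrome(bits):
    """Classical VT syndrome: sum of i * x_i over 1-indexed positions, mod (n + 1)."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_correct_deletion(received, n, syndrome):
    """Recover the length-n word with the given VT syndrome from a word
    that has lost exactly one bit (Levenshtein's decoding rule)."""
    if len(received) != n - 1:
        raise ValueError("expected exactly one deleted bit")
    weight = sum(received)
    partial = sum(i * b for i, b in enumerate(received, start=1))
    deficiency = (syndrome - partial) % (n + 1)

    if deficiency <= weight:
        # A 0 was deleted: reinsert it with exactly `deficiency` ones to its right.
        pos, ones = len(received), 0
        while ones < deficiency:
            pos -= 1
            ones += received[pos]
        return received[:pos] + [0] + received[pos:]

    # A 1 was deleted: reinsert it with exactly `deficiency - weight - 1` zeros to its left.
    zeros_needed = deficiency - weight - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - received[pos]
        pos += 1
    return received[:pos] + [1] + received[pos:]


if __name__ == "__main__":
    word = [1, 0, 1, 1, 0]
    a = vt_syndrome(word)
    damaged = word[:2] + word[3:]  # delete the bit at index 2
    print(vt_correct_deletion(damaged, len(word), a) == word)
```

The decoder only needs the received word, the original length, and the syndrome, which is why the redundancy of this classical scheme is about log(n+1) bits.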

Dagstuhl Research Online Publication Server

Codes for Correcting Asymmetric Adjacent Transpositions and Deletions

Author: Tan Vincent Y. F.
Vu Van Khu
Wang Shuche
Publication date: 29/06/2023

Codes in the Damerau-Levenshtein metric have been extensively studied recently owing to their applications in DNA-based data storage. In particular, Gabrys, Yaakobi, and Milenkovic (2017) designed a length-$n$ code correcting a single deletion and $s$ adjacent transpositions with at most $(1+2s)\log n$ bits of redundancy. In this work, we consider a new setting where both asymmetric adjacent transpositions (also known as right-shifts or left-shifts) and deletions may occur. We present several constructions of the codes correcting these errors in various cases. In particular, we design a code correcting a single deletion, $s^+$ right-shift, and $s^-$ left-shift errors with at most $(1+s)\log (n+s+1)+1$ bits of redundancy where $s=s^{+}+s^{-}$. In addition, we investigate codes correcting $t$ $0$-deletions, $s^+$ right-shift, and $s^-$ left-shift errors with both uniquely-decoding and list-decoding algorithms. Our main contribution here is the construction of a list-decodable code with list size $O(n^{\min\{s+1,t\}})$ and with at most $(\max \{t,s+1\}) \log n+O(1)$ bits of redundancy, where $s=s^{+}+s^{-}$. Finally, we construct both non-systematic and systematic codes for correcting blocks of $0$-deletions with $\ell$-limited-magnitude and $s$ adjacent transpositions.
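As a rough numerical illustration of the two redundancy bounds quoted in this abstract, the sketch below evaluates $(1+2s)\log n$ and $(1+s)\log(n+s+1)+1$ for a sample length. It assumes logarithms are base 2 (bits), and the function names are hypothetical. The two bounds apply to different error models (symmetric versus asymmetric transpositions), so the numbers are only indicative and not a like-for-like comparison.

```python
import math


def symmetric_redundancy_bound(n, s):
    """(1 + 2s) * log2(n): bound quoted for one deletion plus s adjacent transpositions."""
    return (1 + 2 * s) * math.log2(n)


def asymmetric_redundancy_bound(n, s_plus, s_minus):
    """(1 + s) * log2(n + s + 1) + 1 with s = s_plus + s_minus: bound quoted for
    one deletion, s_plus right-shifts and s_minus left-shifts."""
    s = s_plus + s_minus
    return (1 + s) * math.log2(n + s + 1) + 1


if __name__ == "__main__":
    n = 1024
    print(f"symmetric,  s=2:        {symmetric_redundancy_bound(n, 2):.2f} bits")
    print(f"asymmetric, s+=1, s-=1: {asymmetric_redundancy_bound(n, 1, 1):.2f} bits")
```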

arXiv.org e-Print Archive

t-Deletion-s-Insertion-Burst Correcting Codes

Author: Lu Ziyang
Zhang Yiwei
Publication date: 22/11/2022

Motivated by applications in DNA-based storage and communication systems, we study deletion and insertion errors simultaneously in a burst. In particular, we study a type of error named $t$-deletion-$s$-insertion-burst ($(t,s)$-burst for short) which is a generalization of the $(2,1)$-burst error proposed by Schoeny et al. Such an error deletes $t$ consecutive symbols and inserts an arbitrary sequence of length $s$ at the same coordinate. We provide a sphere-packing upper bound on the size of binary codes that can correct a $(t,s)$-burst error, showing that the redundancy of such codes is at least $\log n+t-1$. For $t\geq 2s$, an explicit construction of binary $(t,s)$-burst correcting codes with redundancy $\log n+(t-s-1)\log\log n+O(1)$ is given. In particular, we construct a binary $(3,1)$-burst correcting code with redundancy at most $\log n+9$, which is optimal up to a constant.

Comment: Part of this work (the $(t,1)$-burst model) was presented at ISIT2022. This full version has been submitted to IEEE-IT in August 202
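The $(t,s)$-burst error described in this abstract (delete $t$ consecutive symbols, then insert an arbitrary length-$s$ sequence at the same coordinate) can be sketched in a few lines of Python. The helper names `apply_burst` and `burst_ball` are hypothetical and only illustrate the error model on binary strings; they are not the paper's code construction.

```python
from itertools import product


def apply_burst(x, pos, t, inserted):
    """Delete t consecutive symbols of x starting at index pos (0-based)
    and insert the string `inserted` at the same coordinate."""
    if not 0 <= pos <= len(x) - t:
        raise ValueError("burst does not fit inside the word")
    return x[:pos] + inserted + x[pos + t:]


def burst_ball(x, t, s, alphabet="01"):
    """All words reachable from x by exactly one (t, s)-burst."""
    results = set()
    for pos in range(len(x) - t + 1):
        for symbols in product(alphabet, repeat=s):
            results.add(apply_burst(x, pos, t, "".join(symbols)))
    return results


if __name__ == "__main__":
    word = "0110100"
    print(apply_burst(word, 2, 3, "1"))   # one (3,1)-burst at index 2
    print(sorted(burst_ball(word, 3, 1)))  # every word a (3,1)-burst can produce
```

Every output has length $n - t + s$. A code corrects a $(t,s)$-burst exactly when the sets returned by `burst_ball` for distinct codewords are pairwise disjoint.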

arXiv.org e-Print Archive

Malleable coding: compressed palimpsests

Author: Goyal Vivek K.
Kusuma Julius
Varshney Lav R.
Publication date: 01/01/2018

A malleable coding scheme considers not only compression efficiency but also the ease of alteration, thus encouraging some form of recycling of an old compressed version in the formation of a new one. Malleability cost is the difficulty of synchronizing compressed versions, and malleable codes are of particular interest when representing information and modifying the representation are both expensive. We examine the trade-off between compression efficiency and malleability cost under a malleability metric defined with respect to a string edit distance. This problem introduces a metric topology to the compressed domain. We characterize the achievable rates and malleability as the solution of a subgraph isomorphism problem. This can be used to argue that allowing conditional entropy of the edited message given the original message to grow linearly with block length creates an exponential increase in code length.

First author draf

Boston University Institutional Repository (OpenBU)

Hidden Addressing Encoding for DNA Storage

Author: Bin Wang
Lijun Sun
Penghao Wang
Shuqing Si
Ziniu Mu
Publication venue: Frontiers Media SA
Publication date: 01/07/2022

DNA is a natural storage medium with the advantages of high storage density and long service life compared with traditional media. DNA storage can meet the current storage requirements for massive data. Owing to the limitations of DNA storage technology, the data need to be converted into short DNA sequences for storage. However, in the process, a large amount of physical redundancy is generated to index the short DNA sequences. To reduce this redundancy, this study proposes a DNA storage encoding scheme with hidden addressing. Using an improved fountain encoding scheme, the index replaces part of the data to realize hidden addresses, and a 10.1 MB file is then encoded with the hidden addressing. First, the Dottup dot plot generator and the Jaccard similarity coefficient are used to analyze the overall self-similarity of the encoded sequence indexes, and then the GC content of sequence fragments is used to verify the performance of the scheme. The final results show that the encoding scheme produces indexes with lower overall self-similarity and that the local thermodynamic properties of the sequences are better. The proposed hidden addressing encoding scheme can not only improve the utilization of bases but also ensure the accuracy of DNA storage during the sequencing and decoding processes.
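The two sequence statistics named in this abstract, GC content and the Jaccard similarity coefficient, can be illustrated with a short Python sketch. It assumes the Jaccard coefficient is taken over sets of k-mers (length-k substrings), which is one common choice; the abstract does not say how the authors build their sets, and the function names here are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k=3):
    """Jaccard coefficient |A & B| / |A | B| of the two k-mer sets
    (assumed set construction, not confirmed by the source)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(gc_content("ACGTACGG"))
    print(jaccard_similarity("ACGTAC", "ACGTTT", k=3))
```

Lower pairwise Jaccard values between index fragments correspond to the "lower self-similarity" the abstract reports, under the k-mer assumption stated above.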

Directory of Open Access Journals