Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

Author: An
Fast
Guruswami Venkatesan
Haeupler Bernhard
Hemenway Brett
Sherstov Alexander A
Publication date: 09/11/2017

This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to deal efficiently with insertions and deletions in various communication settings:

• We give a deterministic, linear-time synchronization string construction, improving over an $O(n^5)$ time randomized construction. Independently of this work, a deterministic $O(n\log^2\log n)$ time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear-time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.

• This paper also introduces a generalized notion we call long-distance synchronization strings, which allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols are required to decode any index.

We give several applications for these results:

• For any $\delta < 1$, $\epsilon > 0$ we provide an insdel correcting code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n\log^3 n)$ time. This near-linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from a $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $O(\delta/\log n)$ fraction of block transpositions and replications.

• We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near-linear-time interactive coding scheme for insdel errors.
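The remark about sub-quadratic time can be made concrete with the textbook quadratic dynamic program. The sketch below is not taken from the paper; it assumes the usual insertion-deletion distance (no substitutions), which equals len(a) + len(b) - 2·LCS(a, b), and the function name `insdel_distance` is illustrative.

```python
def insdel_distance(a: str, b: str) -> int:
    """Insertion-deletion distance between a and b.

    Computed as len(a) + len(b) - 2 * LCS(a, b) with the standard
    O(len(a) * len(b)) time, O(len(b)) space dynamic program.
    """
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]
```

The two nested loops are where the quadratic cost comes from, which is why a decoder running in $O(n\log^3 n)$ time cannot afford even one such comparison on length-$n$ strings.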

arXiv.org e-Print Archive

Crossref

Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

Author: Braverman Mark
Haeupler Bernhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/04/2017

We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than commonly considered half-errors, i.e., symbol corruptions and erasures. For every $\epsilon > 0$, synchronization strings allow one to index a sequence with an $\epsilon^{-O(1)}$ size alphabet such that one can efficiently transform $k$ synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new technique has many applications. In this paper, we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion-deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the latter has largely resisted progress. Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel codes for asymptotically large or small noise rates were given in 2016 by Guruswami et al., but these codes are still polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A direct application of our synchronization-string-based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with a slightly larger alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.

arXiv.org e-Print Archive

Crossref

Guess & Check Codes for Deletions and Synchronization

Author: Hanna Serge Kas
Rouayheb Salim El
Publication date: 27/04/2017

We consider the problem of constructing codes that can correct $\delta$ deletions occurring in an arbitrary binary string of length $n$ bits. Varshamov-Tenengolts (VT) codes can correct all possible single deletions $(\delta=1)$ with an asymptotically optimal redundancy. Finding similar codes for $\delta \geq 2$ deletions is an open problem. We propose a new family of codes, which we call Guess & Check (GC) codes, that can correct, with high probability, a constant number of deletions $\delta$ occurring at uniformly random positions within an arbitrary string. The GC codes are based on MDS codes and have an asymptotically optimal redundancy that is $\Theta(\delta \log n)$. We provide deterministic polynomial time encoding and decoding schemes for these codes. We also describe the applications of GC codes to file synchronization.

Comment: Accepted in ISIT 201
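For readers unfamiliar with the VT codes mentioned above, here is a minimal sketch of single-deletion correction. It assumes the standard definition, which the abstract does not spell out: VT_a(n) is the set of binary words x of length n with sum of i·x_i congruent to a modulo n+1. The decoder follows Levenshtein's classical rule; the function names are illustrative, and the GC codes themselves are not implemented here.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), positions counted from 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)


def vt_codewords(n, a=0):
    """Enumerate VT_a(n) by brute force (only sensible for small n)."""
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]


def vt_decode_single_deletion(received, n, a=0):
    """Recover the VT_a(n) codeword from a copy with exactly one bit deleted."""
    assert len(received) == n - 1
    weight = sum(received)
    checksum = sum(i * b for i, b in enumerate(received, start=1))
    deficiency = (a - checksum) % (n + 1)
    y = list(received)
    if deficiency <= weight:
        # A 0 was deleted: reinsert it with exactly `deficiency` ones to its right.
        pos, ones_right = len(y), 0
        while ones_right < deficiency:
            pos -= 1
            ones_right += y[pos]
        y.insert(pos, 0)
    else:
        # A 1 was deleted: reinsert it with deficiency - weight - 1 zeros to its left.
        zeros_needed = deficiency - weight - 1
        pos, zeros_left = 0, 0
        while zeros_left < zeros_needed:
            zeros_left += 1 - y[pos]
            pos += 1
        y.insert(pos, 1)
    return tuple(y)
```

The decoder needs only the weight and checksum of the received word, so it runs in linear time. The position it picks may differ from the true deletion position within a run of equal bits, but the reconstructed word is the same.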

arXiv.org e-Print Archive

Crossref

Spectrum of Sizes for Perfect Deletion-Correcting Codes

Author: Alan C. H. Ling
Bennett F. E.
Gennian Ge
Harms J. J.
Levenshteĭn V. I.
Ratzer E. A.
Rees R.
Sarvate D. G.
Schönheim J.
Sellers F. F.
Street D. J.
Varshamov R. R.
Yeow Meng Chee
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2010

One peculiarity of deletion-correcting codes is that perfect $t$-deletion-correcting codes of the same length over the same alphabet can have different numbers of codewords, because the balls of radius $t$ with respect to the Levenshteĭn distance may be of different sizes. There is interest, therefore, in determining all possible sizes of a perfect $t$-deletion-correcting code, given the length $n$ and the alphabet size $q$. In this paper, we determine completely the spectrum of possible sizes for perfect $q$-ary 1-deletion-correcting codes of length three for all $q$, and perfect $q$-ary 2-deletion-correcting codes of length four for almost all $q$, leaving only a small finite number of cases in doubt.

Comment: 23 page
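The observation that deletion balls differ in size is easy to check on small examples. The sketch below is illustrative only and assumes the usual definitions, which the abstract does not state: the deletion ball of a word is the set of words obtained by deleting exactly t symbols, and a code is perfect when these balls partition the set of all words of length n - t. The function names are hypothetical.

```python
from itertools import combinations, product


def deletion_ball(word, t):
    """All distinct words obtained from `word` by deleting exactly t symbols."""
    n = len(word)
    return {"".join(word[i] for i in keep)
            for keep in combinations(range(n), n - t)}


def is_perfect(code, alphabet, t):
    """True if the deletion balls of `code` partition all words of length n - t."""
    n = len(next(iter(code)))
    targets = {"".join(p) for p in product(alphabet, repeat=n - t)}
    covered = []
    for word in code:
        covered.extend(deletion_ball(word, t))
    no_overlap = len(covered) == len(set(covered))
    return no_overlap and set(covered) == targets
```

For instance, `deletion_ball("000", 1)` contains a single word while `deletion_ball("010", 1)` contains three, so a counting argument based on a fixed ball size does not apply to such codes.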

arXiv.org e-Print Archive

CiteSeerX

Crossref

Non-asymptotic Upper Bounds for Deletion Correcting Codes

Author: Kiyavash Negar
Kulkarni Ankur A.
Publication date: 13/11/2012

Explicit non-asymptotic upper bounds on the sizes of multiple-deletion correcting codes are presented. In particular, the largest single-deletion correcting code for a $q$-ary alphabet and string length $n$ is shown to be of size at most $\frac{q^n-q}{(q-1)(n-1)}$. An improved bound on the asymptotic rate function is obtained as a corollary. Upper bounds are also derived on sizes of codes for a constrained source that does not necessarily comprise all strings of a particular length, and this idea is demonstrated by application to sets of run-length limited strings. The problem of finding the largest deletion correcting code is modeled as a matching problem on a hypergraph. This problem is formulated as an integer linear program. The upper bound is obtained by the construction of a feasible point for the dual of the linear programming relaxation of this integer linear program. The non-asymptotic bounds derived imply the known asymptotic bounds of Levenshtein and Tenengolts and improve on known non-asymptotic bounds. Numerical results support the conjecture that in the binary case, the Varshamov-Tenengolts codes are the largest single-deletion correcting codes.

Comment: 18 pages, 4 figure
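The single-deletion bound quoted above can be evaluated directly and compared with the size of a binary VT code. This is a small sketch, not the paper's numerical experiment: the bound is the formula from the abstract, the VT code VT_0(n) is assumed to be the set of binary words with sum of i·x_i divisible by n+1, and both function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def single_deletion_upper_bound(q, n):
    """Bound from the abstract: (q^n - q) / ((q - 1)(n - 1)), as an exact fraction."""
    return Fraction(q**n - q, (q - 1) * (n - 1))


def vt0_size(n):
    """Number of binary words of length n with sum(i * x_i) divisible by n + 1."""
    return sum(
        1
        for w in product((0, 1), repeat=n)
        if sum(i * b for i, b in enumerate(w, start=1)) % (n + 1) == 0
    )
```

For $q = 2$ and $n = 6$ the bound evaluates to $62/5 = 12.4$, while the brute-force count gives 10 codewords for VT_0(6), so the two are close but do not coincide at this length.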

arXiv.org e-Print Archive

CiteSeerX