Optimal k-Deletion Correcting Codes
Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy
of those codes is O(k log N), and proposed an optimal redundancy single-deletion correcting code (using the so-called VT
construction). However, the problem of constructing optimal redundancy k-deletion correcting codes remained open. Our key
contribution is a solution to this longstanding open problem. We present a k-deletion correcting code that has redundancy 8k log n +
o(log n) and encoding/decoding algorithms of complexity O(n^(2k+1)) for constant k.
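As a rough illustration of the single-deletion VT construction mentioned in the abstract above, the following Python sketch builds the binary code VT_a(n) (all words whose position-weighted sum is congruent to a modulo n + 1) and recovers a codeword after one deletion using the classical checksum-deficiency argument. The function names are hypothetical and chosen for this example only; this is not the k-deletion code of the paper.

```python
from itertools import product

def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), positions counted from 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)

def vt_code(n, a=0):
    """All binary words of length n whose checksum equals a (brute force)."""
    return [list(x) for x in product((0, 1), repeat=n) if vt_syndrome(x) == a]

def vt_decode(received, n, a=0):
    """Recover the codeword of VT_a(n) from a word with exactly one deletion."""
    y = list(received)
    assert len(y) == n - 1, "expected exactly one deleted symbol"
    w = sum(y)                                         # number of ones received
    s = sum(i * b for i, b in enumerate(y, start=1))   # checksum of received word
    d = (a - s) % (n + 1)                              # checksum deficiency
    if d <= w:
        # A 0 was deleted: reinsert it so that exactly d ones lie to its right.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]
    # A 1 was deleted: reinsert it so that exactly d - w - 1 zeros lie to its left.
    pos, zeros = 0, 0
    while zeros < d - w - 1:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]

# Usage: [1, 0, 0, 1] lies in VT_0(4); delete its second symbol and recover it.
print(vt_decode([1, 0, 1], n=4))  # [1, 0, 0, 1]
```

The brute-force enumeration in `vt_code` is only practical for small n; it is included so that the decoder can be checked exhaustively.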
Efficient Linear and Affine Codes for Correcting Insertions/Deletions
This paper studies \emph{linear} and \emph{affine} error-correcting codes for
correcting synchronization errors such as insertions and deletions. We call
such codes linear/affine insdel codes.
Linear codes that can correct even a single deletion are limited to have
information rate at most $1/2$ (achieved by the trivial 2-fold repetition
code). Previously, it was (erroneously) reported that more generally no
non-trivial linear codes correcting $k$ deletions exist, i.e., that the
$(k+1)$-fold repetition codes and their rate of $1/(k+1)$ are basically optimal
for any $k$. We disprove this and show the existence of binary linear codes of
length $n$ and rate just below $1/2$ capable of correcting $\Omega(n)$
insertions and deletions. This identifies rate $1/2$ as a sharp threshold for
recovery from deletions for linear codes, and reopens the quest for a better
understanding of the capabilities of linear codes for correcting
insertions/deletions.
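To make the repetition-code baseline concrete, here is a minimal Python sketch of a (k+1)-fold repetition code that recovers from up to k deletions; the 2-fold code for a single deletion is the case k = 1. The function names and the run-length decoding rule are assumptions made for this illustration, and the sketch handles deletions only, not insertions.

```python
from itertools import groupby

def rep_encode(bits, k):
    """Repeat every message bit k + 1 times."""
    return [b for b in bits for _ in range(k + 1)]

def rep_decode(received, k):
    """Decode after at most k deletions by rounding each run length up.

    A run of r equal message bits is sent as a run of r * (k + 1) symbols.
    After at most k deletions its length stays in [r * (k + 1) - k, r * (k + 1)],
    so ceil(length / (k + 1)) equals r, and no run disappears entirely.
    """
    out = []
    for bit, run in groupby(received):
        length = len(list(run))
        out.extend([bit] * -(-length // (k + 1)))  # ceiling division
    return out

# Usage: 2-fold repetition, one deleted symbol.
sent = rep_encode([1, 0, 0, 1], k=1)   # [1, 1, 0, 0, 0, 0, 1, 1]
damaged = sent[:3] + sent[4:]          # delete one of the zeros
print(rep_decode(damaged, k=1))        # [1, 0, 0, 1]
```

The rate of this code is 1/(k+1), which is the baseline the abstract compares against.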
We prove novel outer bounds and existential inner bounds for the rate vs.
(edit) distance trade-off of linear insdel codes. We complement our existential
results with an efficient synchronization-string-based transformation that
converts any asymptotically-good linear code for Hamming errors into an
asymptotically-good linear code for insdel errors. Lastly, we show that the
$1/2$-rate limitation does not hold for affine codes by giving an
explicit affine code of rate $1-\epsilon$ which can efficiently correct a
constant fraction of insdel errors.
Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound
We introduce synchronization strings as a novel way of efficiently dealing
with synchronization errors, i.e., insertions and deletions. Synchronization
errors are strictly more general and much harder to deal with than commonly
considered half-errors, i.e., symbol corruptions and erasures. For every
$\epsilon > 0$, synchronization strings allow one to index a sequence with an
$\epsilon^{-O(1)}$-size alphabet such that one can efficiently transform
$k$ synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new
technique has many applications. In this paper, we focus on designing insdel
codes, i.e., error correcting block codes (ECCs) for insertion deletion
channels.
While ECCs for both half-errors and synchronization errors have been
intensely studied, the latter has largely resisted progress. Indeed, it took
until 1999 for the first insdel codes with constant rate, constant distance,
and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel
codes for asymptotically large or small noise rates were given in 2016 by
Guruswami et al. but these codes are still polynomially far from the optimal
rate-distance tradeoff. This makes the understanding of insdel codes up to this
work equivalent to what was known for regular ECCs after Forney introduced
concatenated codes in his doctoral thesis 50 years ago.
A direct application of our synchronization strings based indexing method
gives a simple black-box construction which transforms any ECC into an equally
efficient insdel code with a slightly larger alphabet size. This instantly
transfers much of the highly developed understanding for regular ECCs over
large constant alphabets into the realm of insdel codes. Most notably, we
obtain efficient insdel codes which get arbitrarily close to the optimal
rate-distance tradeoff given by the Singleton bound for the complete noise
spectrum.
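The indexing idea can be illustrated with a deliberately simplified Python sketch. It attaches the plain position number to every symbol, which needs an index alphabet as large as the block length; a synchronization string, as described in the abstract, achieves a similar effect with a much smaller index alphabet and is not implemented here. The function names are hypothetical. After the channel, positions whose index arrives exactly once are trusted, and all other positions are reported as erasures (`None`), so insertions and deletions show up as half-errors that an ordinary ECC decoder could then handle.

```python
def index_symbols(codeword):
    """Pair each symbol with its position (trivial indexing, an assumption here)."""
    return list(enumerate(codeword))

def deindex(received, n):
    """Place received symbols by index; ambiguous or missing slots become None."""
    slots = [[] for _ in range(n)]
    for idx, sym in received:
        if 0 <= idx < n:            # ignore indices outside the block
            slots[idx].append(sym)
    return [s[0] if len(s) == 1 else None for s in slots]

# Usage: one deletion and one insertion turn into two erasures.
sent = index_symbols("HELLO")
damaged = [(3, "X")] + sent[:1] + sent[2:]   # insert (3, "X"), delete (1, "E")
print(deindex(damaged, n=5))                 # ['H', None, 'L', None, 'O']
```

An inserted symbol whose index matches a deleted position would be accepted as a wrong symbol rather than an erasure, which is why the guarantee is stated in terms of half-errors rather than erasures alone.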
Non-asymptotic Upper Bounds for Deletion Correcting Codes
Explicit non-asymptotic upper bounds on the sizes of multiple-deletion
correcting codes are presented. In particular, the largest single-deletion
correcting code for a $q$-ary alphabet and string length $n$ is shown to be of
size at most $\frac{q^n-q}{(q-1)(n-1)}$. An improved bound on the asymptotic
rate function is obtained as a corollary. Upper bounds are also derived on
sizes of codes for a constrained source that does not necessarily consist of
all strings of a particular length, and this idea is demonstrated by
application to sets of run-length limited strings.
The problem of finding the largest deletion correcting code is modeled as a
matching problem on a hypergraph. This problem is formulated as an integer
linear program. The upper bound is obtained by the construction of a feasible
point for the dual of the linear programming relaxation of this integer linear
program.
The non-asymptotic bounds derived imply the known asymptotic bounds of
Levenshtein and Tenengolts and improve on known non-asymptotic bounds.
Numerical results support the conjecture that in the binary case, the
Varshamov-Tenengolts codes are the largest single-deletion correcting codes.
Comment: 18 pages, 4 figures
Spectrum of Sizes for Perfect Deletion-Correcting Codes
One peculiarity with deletion-correcting codes is that perfect
$t$-deletion-correcting codes of the same length over the same alphabet can
have different numbers of codewords, because the balls of radius $t$ with
respect to the Levenshtein distance may be of different sizes. There is
interest, therefore, in determining all possible sizes of a perfect
$t$-deletion-correcting code, given the length $n$ and the alphabet size $q$.
In this paper, we determine completely the spectrum of possible sizes for
perfect $q$-ary 1-deletion-correcting codes of length three for all $q$, and
perfect $q$-ary 2-deletion-correcting codes of length four for almost all $q$,
leaving only a small finite number of cases in doubt.
Comment: 23 pages
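The remark about balls of different sizes can be checked directly on small words. The Python sketch below enumerates the deletion ball of a word (all distinct words obtained by deleting t symbols) and tests whether a code is perfect, assuming the usual definition that the deletion balls of the codewords partition the set of all words of length n - t. The function names are hypothetical and the brute-force approach is only meant for tiny parameters.

```python
from itertools import combinations, product

def deletion_ball(word, t):
    """All distinct words obtained from `word` by deleting exactly t symbols."""
    n = len(word)
    return {"".join(word[i] for i in keep)
            for keep in combinations(range(n), n - t)}

def is_perfect(code, t, alphabet):
    """True if the deletion balls of the codewords partition all shorter words."""
    n = len(next(iter(code)))
    covered = []
    for word in code:
        covered.extend(deletion_ball(word, t))
    target = {"".join(p) for p in product(alphabet, repeat=n - t)}
    disjoint = len(covered) == len(set(covered))
    return disjoint and set(covered) == target

# Ball sizes differ even among words of the same length.
print(len(deletion_ball("000", 1)))  # 1
print(len(deletion_ball("001", 1)))  # 2
print(len(deletion_ball("010", 1)))  # 3

# A perfect binary 1-deletion-correcting code of length three.
print(is_perfect({"000", "101"}, 1, "01"))  # True
```

For a single deletion, the ball size equals the number of runs in the word, which is why "000", "001" and "010" give 1, 2 and 3.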