    Optimal k-Deletion Correcting Codes

    Levenshtein introduced the problem of constructing k-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is O(k log N), and proposed an optimal redundancy single-deletion correcting code (using the so-called VT construction). However, the problem of constructing optimal redundancy k-deletion correcting codes remained open. Our key contribution is a solution to this longstanding open problem. We present a k-deletion correcting code that has redundancy 8k log n + o(log n) and encoding/decoding algorithms of complexity O(n^(2k+1)) for constant k.
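
    The VT construction mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard binary Varshamov-Tenengolts definition (words whose position-weighted sum is congruent to a modulo n+1) together with Levenshtein's single-deletion decoding rule. The function names are hypothetical, the codebook is built by brute force for small n only, and none of this is the paper's k-deletion construction.

```python
from itertools import product


def vt_syndrome(bits):
    """Position-weighted sum of the bits, with positions counted from 1."""
    return sum(i * b for i, b in enumerate(bits, start=1))


def vt_code(n, a=0):
    """All binary words of length n with syndrome congruent to a mod (n + 1)."""
    return [x for x in product((0, 1), repeat=n)
            if vt_syndrome(x) % (n + 1) == a]


def vt_decode(y, n, a=0):
    """Recover the codeword of VT_a(n) from y, which has lost exactly one bit."""
    y = list(y)
    weight = sum(y)
    deficiency = (a - vt_syndrome(y)) % (n + 1)
    if deficiency <= weight:
        # A 0 was deleted: reinsert it with exactly `deficiency` ones to its right.
        pos, ones = len(y), 0
        while ones < deficiency:
            pos -= 1
            ones += y[pos]
        return tuple(y[:pos] + [0] + y[pos:])
    # A 1 was deleted: reinsert it with exactly deficiency - weight - 1 zeros to its left.
    zeros_needed = deficiency - weight - 1
    pos, zeros = 0, 0
    while zeros < zeros_needed:
        zeros += 1 - y[pos]
        pos += 1
    return tuple(y[:pos] + [1] + y[pos:])
```

    For example, `vt_decode((1, 0, 1), 4)` reconstructs the codeword `(1, 0, 0, 1)` of VT_0(4) after its second bit has been deleted.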

    Efficient Linear and Affine Codes for Correcting Insertions/Deletions

    This paper studies linear and affine error-correcting codes for correcting synchronization errors such as insertions and deletions. We call such codes linear/affine insdel codes. Linear codes that can correct even a single deletion are limited to an information rate of at most $1/2$ (achieved by the trivial 2-fold repetition code). Previously, it was (erroneously) reported that more generally no non-trivial linear codes correcting $k$ deletions exist, i.e., that the $(k+1)$-fold repetition codes and their rate of $1/(k+1)$ are basically optimal for any $k$. We disprove this and show the existence of binary linear codes of length $n$ and rate just below $1/2$ capable of correcting $\Omega(n)$ insertions and deletions. This identifies rate $1/2$ as a sharp threshold for recovery from deletions for linear codes, and reopens the quest for a better understanding of the capabilities of linear codes for correcting insertions/deletions. We prove novel outer bounds and existential inner bounds for the rate vs. (edit) distance trade-off of linear insdel codes. We complement our existential results with an efficient synchronization-string-based transformation that converts any asymptotically-good linear code for Hamming errors into an asymptotically-good linear code for insdel errors. Lastly, we show that the $\frac{1}{2}$-rate limitation does not hold for affine codes by giving an explicit affine code of rate $1-\epsilon$ which can efficiently correct a constant fraction of insdel errors.
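
    As a small illustration of the rate-1/2 baseline named in this abstract, the sketch below takes the 2-fold repetition code to mean "repeat every bit twice in place" (an assumption about the intended code) and decodes a single deletion by rounding every run length up to the next even number. The function names are hypothetical.

```python
from itertools import groupby


def rep2_encode(bits):
    """Repeat every message bit twice, giving a linear code of rate 1/2."""
    return [b for b in bits for _ in range(2)]


def rep2_decode(received):
    """Decode a codeword that has lost at most one symbol.

    Every run in a codeword has even length, so after one deletion exactly
    one run is odd; rounding each run up to even length and halving it
    restores the message.
    """
    message = []
    for bit, run in groupby(received):
        run_length = len(list(run))
        message.extend([bit] * ((run_length + 1) // 2))
    return message
```

    Deleting any one symbol of `rep2_encode([1, 0, 0, 1])` still decodes to `[1, 0, 0, 1]`, because a run of length two can shrink to length one but cannot disappear.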

    Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

    We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than commonly considered half-errors, i.e., symbol corruptions and erasures. For every $\epsilon > 0$, synchronization strings allow one to index a sequence with an $\epsilon^{-O(1)}$ size alphabet such that one can efficiently transform $k$ synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new technique has many applications. In this paper, we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the latter has largely resisted progress. Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel codes for asymptotically large or small noise rates were given in 2016 by Guruswami et al. but these codes are still polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A direct application of our synchronization-string-based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with a slightly larger alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.

    Non-asymptotic Upper Bounds for Deletion Correcting Codes

    Explicit non-asymptotic upper bounds on the sizes of multiple-deletion correcting codes are presented. In particular, the largest single-deletion correcting code for $q$-ary alphabet and string length $n$ is shown to be of size at most $\frac{q^n-q}{(q-1)(n-1)}$. An improved bound on the asymptotic rate function is obtained as a corollary. Upper bounds are also derived on sizes of codes for a constrained source that does not necessarily consist of all strings of a particular length, and this idea is demonstrated by application to sets of run-length limited strings. The problem of finding the largest deletion correcting code is modeled as a matching problem on a hypergraph. This problem is formulated as an integer linear program. The upper bound is obtained by the construction of a feasible point for the dual of the linear programming relaxation of this integer linear program. The non-asymptotic bounds derived imply the known asymptotic bounds of Levenshtein and Tenengolts and improve on known non-asymptotic bounds. Numerical results support the conjecture that in the binary case, the Varshamov-Tenengolts codes are the largest single-deletion correcting codes. Comment: 18 pages, 4 figures
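
    The single-deletion bound quoted in this abstract is easy to evaluate numerically. The sketch below compares it, in the binary case, with the size of the Varshamov-Tenengolts code VT_0(n) counted by brute force. The function names are hypothetical, the standard VT definition is assumed, and the formula is used only for n of at least 2, since it divides by n - 1.

```python
from fractions import Fraction
from itertools import product


def single_deletion_upper_bound(q, n):
    """Evaluate (q^n - q) / ((q - 1)(n - 1)) exactly; requires n >= 2."""
    return Fraction(q**n - q, (q - 1) * (n - 1))


def vt0_size(n):
    """Count binary words of length n whose weighted sum is 0 mod (n + 1)."""
    return sum(
        1
        for x in product((0, 1), repeat=n)
        if sum(i * b for i, b in enumerate(x, start=1)) % (n + 1) == 0
    )


if __name__ == "__main__":
    for n in range(2, 11):
        bound = single_deletion_upper_bound(2, n)
        print(n, vt0_size(n), float(bound))
```

    For n = 6 the bound evaluates to 62/5 = 12.4, while VT_0(6) has 10 codewords, so the two are close but do not coincide at this length.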

    Spectrum of Sizes for Perfect Deletion-Correcting Codes

    One peculiarity with deletion-correcting codes is that perfect $t$-deletion-correcting codes of the same length over the same alphabet can have different numbers of codewords, because the balls of radius $t$ with respect to the Levenshtein distance may be of different sizes. There is interest, therefore, in determining all possible sizes of a perfect $t$-deletion-correcting code, given the length $n$ and the alphabet size $q$. In this paper, we determine completely the spectrum of possible sizes for perfect $q$-ary 1-deletion-correcting codes of length three for all $q$, and perfect $q$-ary 2-deletion-correcting codes of length four for almost all $q$, leaving only a small finite number of cases in doubt. Comment: 23 pages
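
    The remark about balls of different sizes can be checked on tiny examples. The sketch below assumes that the ball of radius t around a word is the set of words obtained by deleting exactly t symbols, and that a code is perfect when these balls partition all words of length n - t. Both readings are assumptions about the abstract's definitions, and the function names are hypothetical.

```python
from itertools import combinations, product


def deletion_ball(word, t):
    """All distinct words obtained from `word` by deleting exactly t symbols."""
    n = len(word)
    return {
        "".join(word[i] for i in keep)
        for keep in combinations(range(n), n - t)
    }


def is_perfect(code, t, alphabet):
    """True if the deletion balls of the codewords partition all words of length n - t."""
    n = len(code[0])
    targets = {"".join(w) for w in product(alphabet, repeat=n - t)}
    covered = []
    for codeword in code:
        covered.extend(deletion_ball(codeword, t))
    # Pairwise disjoint balls (no repeats) that together cover every target word.
    return len(covered) == len(set(covered)) and set(covered) == targets
```

    With these definitions, `deletion_ball("000", 1)` contains one word while `deletion_ball("010", 1)` contains three, and `is_perfect(["001", "110"], 1, "01")` holds for the binary length-three case.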