
    Near-Linear Time Insertion-Deletion Codes and $(1+\varepsilon)$-Approximating Edit Distance via Indexing

    We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $I$. In particular, for every length $n$ and every $\varepsilon > 0$, one can in near-linear time construct a string $I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$, such that indexing any string $S \in \Sigma^n$, symbol-by-symbol, with $I$ results in a string $S' \in \Sigma''^n$, where $\Sigma'' = \Sigma \times \Sigma'$, for which edit distance computations are easy, i.e., one can compute a $(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other string in $O(n\,\mathrm{poly}(\log n))$ time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding algorithms for the list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.
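
    For intuition, the "symbol-by-symbol indexing" step is just a coordinate-wise pairing of the input string with the indexing string. A minimal Python sketch of that pairing follows; the index string used here is a placeholder, since constructing a suitable $I$ (and the fast approximation algorithm that exploits it) is the paper's actual contribution.

        def index_string(S, I):
            # Pair the i-th symbol of S with the i-th symbol of I,
            # producing a string over the product alphabet Sigma x Sigma'.
            assert len(S) == len(I)
            return [(s, i) for s, i in zip(S, I)]

        # Toy example with a placeholder index string (not an actual indexing string
        # from the paper's construction).
        S = "banana"
        I = "012012"
        S_prime = index_string(S, I)
        print(S_prime)  # [('b', '0'), ('a', '1'), ('n', '2'), ('a', '0'), ('n', '1'), ('a', '2')]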

    Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions

    We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions, a combination of errors that is usually observed in DNA-based data storage. One of the most basic versions of this problem was settled more than 50 years ago by Levenshtein, who proved that binary Varshamov-Tenengolts codes correct one arbitrary edit error, i.e., one deletion or one substitution, with nearly optimal redundancy. However, this approach fails to extend to many simple and natural variations of the binary single-edit error setting. In this work, we make progress on the code design problem above in three such variations:
    - We construct linear-time encodable and decodable length-$n$ non-binary codes correcting a single edit error with nearly optimal redundancy $\log n + O(\log\log n)$, providing an alternative, simpler proof of a result by Cai, Chee, Gabrys, Kiah, and Nguyen (IEEE Trans. Inf. Theory 2021). This is achieved by employing what we call weighted VT sketches, a new notion that may be of independent interest.
    - We show the existence of a binary code correcting one deletion or one adjacent transposition with nearly optimal redundancy $\log n + O(\log\log n)$.
    - We construct linear-time encodable and list-decodable binary codes with list size 2 for one deletion and one substitution with redundancy $4\log n + O(\log\log n)$. This matches the existential bound up to an $O(\log\log n)$ additive term.
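
    For reference, the classical binary Varshamov-Tenengolts code mentioned above is defined by a single weighted checksum (this is the standard textbook definition, not this paper's weighted VT sketches):

        \[
          \mathrm{VT}_a(n) \;=\; \Bigl\{\, x \in \{0,1\}^n \;:\; \sum_{i=1}^{n} i\,x_i \equiv a \pmod{n+1} \,\Bigr\},
        \]

    which corrects a single deletion (or insertion) with redundancy roughly $\log(n+1)$ bits; Levenshtein's single-edit result uses a variant of this checksum with a larger modulus.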

    Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

    This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to efficiently deal with insertions and deletions in various communication settings:
    - We give a deterministic, linear-time synchronization string construction, improving over an $O(n^5)$-time randomized construction. Independently of this work, a deterministic $O(n \log^2 \log n)$-time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear-time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.
    - This paper also introduces a generalized notion we call long-distance synchronization strings, which allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols is required to decode any index.
    We give several applications for these results:
    - For any $\delta > 0$ we provide an insdel correcting code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n\log^3 n)$ time. This near-linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can not only efficiently recover from a $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf '99], also from any $O(\delta/\log n)$ fraction of block transpositions and replications.
    - We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near-linear time interactive coding scheme for insdel errors.
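
    For context, the defining property from the original synchronization strings work of Haeupler and Shahrasbi is the following: a string $S$ of length $n$ is an $\varepsilon$-synchronization string if for every $1 \le i < j < k \le n+1$,

        \[
          \mathrm{ED}\bigl(S[i,j),\, S[j,k)\bigr) \;>\; (1-\varepsilon)(k-i),
        \]

    where $\mathrm{ED}$ denotes edit distance, i.e., every pair of adjacent intervals is close to maximally far apart in edit distance; the long-distance variant introduced here strengthens this to sufficiently long non-adjacent intervals as well.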

    An Iteratively Decodable Tensor Product Code with Application to Data Storage

    The error pattern correcting code (EPCC) can be constructed to provide a syndrome decoding table targeting the dominant error events of an inter-symbol interference channel at the output of the Viterbi detector. For the size of the syndrome table to be manageable and the list of possible error events to be reasonable in size, the codeword length of EPCC needs to be short enough. However, the rate of such a short code will be too low for hard drive applications. To accommodate the required large redundancy, it is possible to record only a highly compressed function of the parity bits of EPCC's tensor product with a symbol correcting code. In this paper, we show that the proposed tensor error-pattern correcting code (T-EPCC) is linear-time encodable, and we devise a low-complexity soft iterative decoding algorithm for EPCC's tensor product with q-ary LDPC (T-EPCC-qLDPC). Simulation results show that T-EPCC-qLDPC achieves nearly the same performance as single-level qLDPC with a 1/2 KB sector at a 50% reduction in decoding complexity. Moreover, 1 KB T-EPCC-qLDPC surpasses the performance of 1/2 KB single-level qLDPC at the same decoder complexity.
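
    As a generic, toy-sized illustration of the linear-algebra operation underlying tensor product code constructions (Wolf-style codes are built from the tensor product of the component parity-check matrices, with one component taken over an extension field), the sketch below only shows the dimension bookkeeping of a binary Kronecker product; it is not the paper's T-EPCC-qLDPC construction.

        import numpy as np

        # Toy component parity-check matrices: H1 for a [3, 2] single-parity-check code,
        # H2 for a [7, 4] Hamming code.
        H1 = np.array([[1, 1, 1]], dtype=int)
        H2 = np.array([[1, 0, 1, 0, 1, 0, 1],
                       [0, 1, 1, 0, 0, 1, 1],
                       [0, 0, 0, 1, 1, 1, 1]], dtype=int)

        # The Kronecker product of a 1x3 and a 3x7 matrix is 3x21: a code of length 21
        # with only 3 parity constraints, far fewer than the full product code would impose.
        H_tp = np.kron(H1, H2) % 2
        print(H_tp.shape)  # (3, 21)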

    It'll probably work out: improved list-decoding through random operations

    In this work, we introduce a framework to study the effect of random operations on the combinatorial list-decodability of a code. The operations we consider correspond to row and column operations on the matrix obtained from the code by stacking the codewords together as columns. This captures many natural transformations on codes, such as puncturing, folding, and taking subcodes; we show that many such operations can improve the list-decoding properties of a code. There are two main points to this. First, our goal is to advance our (combinatorial) understanding of list-decodability, by understanding what structure (or lack thereof) is necessary to obtain it. Second, we use our more general results to obtain a few interesting corollaries for list decoding:
    (1) We show the existence of binary codes that are combinatorially list-decodable from a $1/2-\epsilon$ fraction of errors with optimal rate $\Omega(\epsilon^2)$ that can be encoded in linear time.
    (2) We show that any code with $\Omega(1)$ relative distance, when randomly folded, is combinatorially list-decodable from a $1-\epsilon$ fraction of errors with high probability. This formalizes the intuition for why the folding operation has been successful in obtaining codes with optimal list-decoding parameters; previously, all arguments used algebraic methods and worked only with specific codes.
    (3) We show that any code which is list-decodable with suboptimal list sizes has many subcodes which have near-optimal list sizes, while retaining the error-correcting capabilities of the original code. This generalizes recent results where subspace evasive sets have been used to reduce list sizes of codes that achieve list-decoding capacity.
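
    To make two of the transformations concrete, here is a minimal Python sketch of puncturing and folding applied to the matrix whose columns are the codewords (toy example code only; the paper's claims concern random choices of such operations):

        import numpy as np

        def puncture(codeword_matrix, positions):
            # Delete the given coordinate positions (rows) from every codeword (column).
            drop = set(positions)
            keep = [i for i in range(codeword_matrix.shape[0]) if i not in drop]
            return codeword_matrix[keep, :]

        def fold(codeword_matrix, s):
            # Group every s consecutive coordinates of each codeword (column) into one
            # symbol of the larger alphabet Sigma^s, represented here as a tuple.
            n, num_codewords = codeword_matrix.shape
            assert n % s == 0
            return [[tuple(int(v) for v in codeword_matrix[r * s:(r + 1) * s, c])
                     for c in range(num_codewords)]
                    for r in range(n // s)]

        # Toy example: the four codewords of the [3, 2] binary single-parity-check code, as columns.
        C = np.array([[0, 0, 1, 1],
                      [0, 1, 0, 1],
                      [0, 1, 1, 0]])
        print(puncture(C, [2]))  # drop the parity coordinate from every codeword
        print(fold(C, 3))        # view each codeword as a single symbol over {0, 1}^3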

    Subquadratic time encodable codes beating the Gilbert-Varshamov bound

    We construct explicit algebraic geometry codes built from the Garcia-Stichtenoth function field tower beating the Gilbert-Varshamov bound for alphabet sizes at least 192. Messages are identified with functions in certain Riemann-Roch spaces associated with divisors supported on multiple places. Encoding amounts to evaluating these functions at degree-one places. By exploiting algebraic structures particular to the Garcia-Stichtenoth tower, we devise an intricate deterministic encoding algorithm with runtime exponent $\omega/2 < 1.19$ and randomized (unique and list) decoding algorithms with expected runtime exponent $1 + \omega/2 < 2.19$. Here $\omega < 2.373$ is the matrix multiplication exponent. If $\omega = 2$, as widely believed, the encoding and decoding runtimes are respectively nearly linear and nearly quadratic. Prior to this work, the encoding (resp. decoding) times of code families beating the Gilbert-Varshamov bound were quadratic (resp. cubic) or worse.
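
    For context, the relevant trade-offs between rate $R$ and relative distance $\delta$ over an alphabet of size $q$ are the following standard statements, with $H_q$ the $q$-ary entropy function:

        \[
          \text{Gilbert-Varshamov:}\;\; R \ge 1 - H_q(\delta), \qquad
          \text{Tsfasman-Vladut-Zink:}\;\; R \ge 1 - \delta - \frac{1}{\sqrt{q}-1},
        \]

    where the second bound is achieved by algebraic geometry codes over square alphabets and exceeds the first once $q$ is large enough; the Garcia-Stichtenoth tower attains the Drinfeld-Vladut limit of $\sqrt{q}-1$ that the second bound requires.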

    Locally Encodable and Decodable Codes for Distributed Storage Systems

    We consider the locality of encoding and decoding operations in distributed storage systems (DSS), and propose a new class of codes, called locally encodable and decodable codes (LEDC), that provides a higher degree of operational locality compared to currently known codes. For a given locality structure, we derive an upper bound on the global distance and demonstrate the existence of an optimal LEDC for sufficiently large field size. In addition, we construct two families of optimal LEDC for fields with size linear in the code length.

    Improved Nearly-MDS Expander Codes

    A construction of expander codes is presented with the following three properties: (i) the codes lie close to the Singleton bound, (ii) they can be encoded in time complexity that is linear in their code length, and (iii) they have a linear-time bounded-distance decoder. By using a version of the decoder that also corrects erasures, the codes can replace MDS outer codes in concatenated constructions, thus resulting in linear-time encodable and decodable codes that approach the Zyablov bound or the capacity of memoryless channels. The presented construction improves on an earlier result by Guruswami and Indyk in that any rate and relative minimum distance that lies below the Singleton bound is attainable for a significantly smaller alphabet size.
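
    For reference, the two benchmarks mentioned in this abstract are, up to the paper's exact parametrization: the Singleton bound $R \le 1 - \delta + 1/n$, so lying "close to the Singleton bound" means rate approximately $1-\delta$ at relative distance $\delta$; and the Zyablov trade-off for binary concatenated codes,

        \[
          R_Z(\delta) \;=\; \max_{0 < r \,\le\, 1 - H(\delta)} \; r\left(1 - \frac{\delta}{H^{-1}(1-r)}\right),
        \]

    where $H$ is the binary entropy function and $r$ plays the role of the inner code rate.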