239 research outputs found

Near-Linear Time Insertion-Deletion Codes and $(1+\varepsilon)$-Approximating Edit Distance via Indexing

Author: Goldwasser Shafi
Haeupler Bernhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/04/2019

We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $I$. In particular, for every length $n$ and every $\varepsilon > 0$, one can in near-linear time construct a string $I \in \Sigma'^n$ with $|\Sigma'| = O_{\varepsilon}(1)$, such that indexing any string $S \in \Sigma^n$, symbol-by-symbol, with $I$ results in a string $S' \in \Sigma''^n$, where $\Sigma'' = \Sigma \times \Sigma'$, for which edit distance computations are easy, i.e., one can compute a $(1+\varepsilon)$-approximation of the edit distance between $S'$ and any other string in $O(n \, \text{poly}(\log n))$ time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertions and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and faster decoding algorithms for list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP '18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.
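The symbol-by-symbol indexing described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's construction: `cyclic_index` is a hypothetical placeholder for the index string $I$, and `edit_distance` is the textbook quadratic-time dynamic program with insertions, deletions and substitutions (an assumption, since the abstract does not state which edit operations are counted), not the near-linear approximation algorithm.

```python
from typing import Hashable, List, Sequence, Tuple


def index_string(s: Sequence[Hashable], index: Sequence[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """Pair each symbol of s with the index symbol at the same position.

    The result is a string over the product alphabet Sigma x Sigma'.
    """
    if len(s) != len(index):
        raise ValueError("s and index must have the same length")
    return list(zip(s, index))


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Exact edit distance via the classic O(len(a) * len(b)) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def cyclic_index(n: int, period: int) -> List[int]:
    """Hypothetical placeholder index string: 0, 1, ..., period - 1, repeated."""
    return [i % period for i in range(n)]


s = "synchronize"
s_indexed = index_string(s, cyclic_index(len(s), 4))
other = index_string("synchronise", cyclic_index(len(s), 4))
distance = edit_distance(s_indexed, other)
```

The quadratic baseline is the cost that the indexing scheme is meant to avoid for the indexed string $S'$; the sketch only shows what "indexing symbol-by-symbol" produces.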

arXiv.org e-Print Archive

Crossref

Optimal Error Rates for Interactive Coding II: Efficiency and List Decoding

Author: Ghaffari Mohsen
Haeupler Bernhard
Publication date: 15/04/2014

We study coding schemes for error correction in interactive communications. Such interactive coding schemes simulate any $n$-round interactive protocol using $N$ rounds over an adversarial channel that corrupts up to $\rho N$ transmissions. Important performance measures for a coding scheme are its maximum tolerable error rate $\rho$, communication complexity $N$, and computational complexity. We give the first coding scheme for the standard setting which performs optimally in all three measures: our randomized non-adaptive coding scheme has near-linear computational complexity and tolerates any error rate $\rho < 1/4$ with a linear $N = \Theta(n)$ communication complexity. This improves over prior results, each of which performed well in only two of these measures. We also give results for other settings of interest, namely, the first computationally and communication efficient schemes that tolerate $\rho < \frac{2}{7}$ adaptively, $\rho < \frac{1}{3}$ if only one party is required to decode, and $\rho < \frac{1}{2}$ if list decoding is allowed. These are the optimal tolerable error rates for the respective settings. These coding schemes also have near-linear computational and communication complexity. These results are obtained via two techniques: we give a general black-box reduction which reduces unique decoding, in various settings, to list decoding, and we show how to boost the computational and communication efficiency of any list decoder to become near-linear.

Comment: preliminary version

arXiv.org e-Print Archive

Crossref

High rate locally-correctable and locally-testable codes with sub-polynomial query complexity

Author: Alon N.
Dinur I.
Friedl K.
Goldreich O.
Hemenway B.
Kopparty S.
Lipton R. J.
Trevisan L.
Publication date: 22/04/2015

In this work, we construct the first locally-correctable codes (LCCs) and locally-testable codes (LTCs) with constant rate, constant relative distance, and sub-polynomial query complexity. Specifically, we show that there exist binary LCCs and LTCs with block length $n$, constant rate (which can even be taken arbitrarily close to 1), constant relative distance, and query complexity $\exp(\tilde{O}(\sqrt{\log n}))$. Previously such codes were known to exist only with $\Omega(n^{\beta})$ query complexity (for constant $\beta > 0$), and there were several, quite different, constructions known. Our codes are based on a general distance-amplification method of Alon and Luby. We show that this method interacts well with local correctors and testers, and obtain our main results by applying it to suitably constructed LCCs and LTCs in the non-standard regime of sub-constant relative distance. Along the way, we also construct LCCs and LTCs over large alphabets, with the same query complexity $\exp(\tilde{O}(\sqrt{\log n}))$, which additionally have the property of approaching the Singleton bound: they have almost the best-possible relationship between their rate and distance. This has the surprising consequence that asking for a large alphabet error-correcting code to further be an LCC or LTC with $\exp(\tilde{O}(\sqrt{\log n}))$ query complexity does not require any sacrifice in terms of rate and distance! Such a result was previously not known for any $o(n)$ query complexity. Our results on LCCs also immediately give locally-decodable codes (LDCs) with the same parameters.

arXiv.org e-Print Archive

Crossref

Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

Author: Braverman Mark
Haeupler Bernhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/04/2017

We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than commonly considered half-errors, i.e., symbol corruptions and erasures. For every $\epsilon > 0$, synchronization strings make it possible to index a sequence with an alphabet of size $\epsilon^{-O(1)}$ such that one can efficiently transform $k$ synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new technique has many applications. In this paper, we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion-deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the latter has largely resisted progress. Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel codes for asymptotically large or small noise rates were given in 2016 by Guruswami et al., but these codes are still polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A direct application of our synchronization-string-based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with a slightly larger alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.
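To illustrate why indexing helps against synchronization errors, the sketch below uses the trivial index that attaches each symbol's position number. This is an assumption made for illustration only: it needs an alphabet that grows with the message length, whereas the abstract's synchronization strings use an alphabet of size $\epsilon^{-O(1)}$, and that construction is not implemented here. The sketch also handles deletions only; all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def attach_positions(message: str) -> List[Tuple[int, str]]:
    """Trivial index: pair every symbol with its position in the message."""
    return list(enumerate(message))


def delete_symbols(sent: Sequence[Tuple[int, str]], positions: Sequence[int]) -> List[Tuple[int, str]]:
    """Simulate a channel that deletes the symbols at the given positions."""
    drop = set(positions)
    return [pair for i, pair in enumerate(sent) if i not in drop]


def realign(received: Sequence[Tuple[int, str]], n: int) -> List[Optional[str]]:
    """Place each received symbol at the position its index names.

    Positions with no received symbol stay None, i.e. they become erasures.
    """
    out: List[Optional[str]] = [None] * n
    for pos, symbol in received:
        if 0 <= pos < n:
            out[pos] = symbol
    return out


def positional_mismatches(a: Sequence[str], b: Sequence[str]) -> int:
    """Count differing symbols when comparing position by position."""
    return sum(x != y for x, y in zip(a, b))


message = "abcdefgh"
received = delete_symbols(attach_positions(message), [0])

# Without the index, one deletion shifts every later symbol out of place.
unindexed = [symbol for _, symbol in received]
shifted_errors = positional_mismatches(message, unindexed)

# With the index, the same deletion shows up as a single erasure.
realigned = realign(received, len(message))
erasures = realigned.count(None)
```

In this toy run a single deletion at the front corrupts every compared position of the unindexed string, while the indexed string loses exactly one position.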

arXiv.org e-Print Archive

Crossref

Locally Decodable Codes with Randomized Encoding

Author: Cheng Kuan
Li Xin
Zheng Yu
Publication date: 10/01/2020

We initiate a study of locally decodable codes with randomized encoding. Standard locally decodable codes are error correcting codes with a deterministic encoding function and a randomized decoding function, such that any desired message bit can be recovered with good probability by querying only a small number of positions in the corrupted codeword. This allows one to recover any message bit very efficiently in sub-linear or even logarithmic time. Besides this straightforward application, locally decodable codes have also found many other applications such as private information retrieval, secure multiparty computation, and average-case complexity. However, despite extensive research, the tradeoff between the rate of the code and the number of queries is somewhat disappointing. For example, the best known constructions still need super-polynomially long codeword length even with a logarithmic number of queries, and need a polynomial number of queries to achieve a constant rate. In this paper, we show that by using a randomized encoding, in several models we can achieve a significantly better rate-query tradeoff. In addition, our codes work for both the standard Hamming errors and the more general and harder edit errors.

Comment: 23 pages
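The standard notion of local decoding described in this abstract can be illustrated with the classic two-query decoder for the Hadamard code. This is a textbook example chosen for illustration, not a construction from the paper: it has a deterministic encoder and exponentially long codewords, and the function names are hypothetical.

```python
import random
from typing import List, Optional


def hadamard_encode(msg_bits: List[int]) -> List[int]:
    """Position a of the codeword holds the inner product <msg, a> mod 2."""
    k = len(msg_bits)
    m = sum(bit << i for i, bit in enumerate(msg_bits))
    return [bin(m & a).count("1") % 2 for a in range(1 << k)]


def local_decode_bit(codeword: List[int], k: int, i: int, a: Optional[int] = None) -> int:
    """Recover message bit i from two codeword positions.

    Queries positions a and a XOR e_i; their XOR equals <msg, e_i> = msg[i]
    whenever neither queried position is corrupted. If a is not supplied,
    it is drawn uniformly at random.
    """
    if a is None:
        a = random.randrange(1 << k)
    return codeword[a] ^ codeword[a ^ (1 << i)]


msg = [1, 0, 1]
codeword = hadamard_encode(msg)

# Corrupt one of the 8 positions and count, over every choice of a,
# how often the two-query decoder still recovers bit 0.
corrupted = list(codeword)
corrupted[5] ^= 1
correct = sum(local_decode_bit(corrupted, 3, 0, a=a) == msg[0] for a in range(8))
```

Each query on its own is uniformly distributed, so a corrupted fraction $\delta$ of positions makes the decoder err with probability at most $2\delta$; the price is a codeword of length $2^k$, which is the kind of rate-query tradeoff the abstract calls disappointing.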

arXiv.org e-Print Archive

Cryptology ePrint Archive

Combinatorial limitations of average-radius list-decoding

Author: Guruswami Venkatesan
Narayanan Srivatsan
Publication date: 01/01/2013

We study certain combinatorial aspects of list-decoding, motivated by the exponential gap between the known upper bound (of $O(1/\gamma)$) and lower bound (of $\Omega_p(\log(1/\gamma))$) for the list-size needed to decode up to radius $p$ with rate $\gamma$ away from capacity, i.e., $1-h(p)-\gamma$ (here $p \in (0,1/2)$ and $\gamma > 0$). Our main result is the following: we prove that in any binary code $C \subseteq \{0,1\}^n$ of rate $1-h(p)-\gamma$, there must exist a set $\mathcal{L} \subset C$ of $\Omega_p(1/\sqrt{\gamma})$ codewords such that the average distance of the points in $\mathcal{L}$ from their centroid is at most $pn$. In other words, there must exist $\Omega_p(1/\sqrt{\gamma})$ codewords with low "average radius." The standard notion of list-decoding corresponds to working with the maximum distance of a collection of codewords from a center instead of average distance. The average-radius form is in itself quite natural and is implied by the classical Johnson bound. The remaining results concern the standard notion of list-decoding, and help clarify the combinatorial landscape of list-decoding: 1. We give a short simple proof, over all fixed alphabets, of the above-mentioned $\Omega_p(\log(1/\gamma))$ lower bound. Earlier, this bound followed from a complicated, more general result of Blinovsky. 2. We show that one cannot improve the $\Omega_p(\log(1/\gamma))$ lower bound via techniques based on identifying the zero-rate regime for list decoding of constant-weight codes. 3. We show a "reverse connection" showing that constant-weight codes for list decoding imply general codes for list decoding with higher rate. 4. We give simple second moment based proofs of tight (up to constant factors) lower bounds on the list-size needed for list decoding random codes and random linear codes from errors as well as erasures.

Comment: 28 pages. Extended abstract in RANDOM 201
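The two notions of radius contrasted in this abstract can be written side by side. The notation is an assumption for illustration: $\Delta$ denotes Hamming distance and $x$ a center point, and the paper's exact definitions (in particular how the centroid is defined) may differ.

```latex
% Standard list-decoding: every codeword in the list is within distance pn of the center.
\[
  \max_{c \in \mathcal{L}} \Delta(c, x) \le pn
\]
% Average-radius form: the codewords are within distance pn of the center on average.
\[
  \frac{1}{|\mathcal{L}|} \sum_{c \in \mathcal{L}} \Delta(c, x) \le pn
\]
% An average never exceeds the maximum, so the first condition implies the second.
\[
  \frac{1}{|\mathcal{L}|} \sum_{c \in \mathcal{L}} \Delta(c, x)
  \le \max_{c \in \mathcal{L}} \Delta(c, x)
\]
```

Because the average-radius condition is the weaker one, a large list satisfying it does not by itself give a large list in the standard sense, which is consistent with the abstract treating the two settings separately.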

arXiv.org e-Print Archive

Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data

Author: Adam Smith
Karatsuba A. A.
Leonid Reyzin
Rafail Ostrovsky
Shaltiel R.
Yevgeniy Dodis
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2004

We provide formal definitions and efficient secure techniques for turning noisy information into keys usable for any cryptographic application and, in particular, for reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying material that, unlike traditional cryptographic keys, is (1) not reproducible precisely and (2) not distributed uniformly. We propose two primitives: a "fuzzy extractor" reliably extracts nearly uniform randomness R from its input; the extraction is error-tolerant in the sense that R will be the same even if the input changes, as long as it remains reasonably close to the original. Thus, R can be used as a key in a cryptographic application. A "secure sketch" produces public information about its input w that does not reveal w, and yet allows exact recovery of w given another value that is close to w. Thus, it can be used to reliably reproduce error-prone biometric inputs without incurring the security risk inherent in storing them. We define the primitives to be both formally secure and versatile, generalizing much prior work. In addition, we provide nearly optimal constructions of both primitives for various measures of "closeness" of input data, such as Hamming distance, edit distance, and set difference.

Comment: 47 pp., 3 figures. Prelim. version in Eurocrypt 2004, Springer LNCS 3027, pp. 523-540. Differences from version 3: minor edits for grammar, clarity, and typos
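The recovery property of a secure sketch for Hamming distance can be sketched with the well-known code-offset idea: publish the input XORed with a random codeword, then decode to undo small changes. The version below is a toy under stated assumptions: it uses a 3-fold repetition code (which corrects one flipped bit per block), it makes no security claim, and the names `sketch` and `recover` are hypothetical rather than taken from the paper.

```python
import secrets
from typing import List

REP = 3  # repetition factor of the toy error-correcting code


def encode(bits: List[int]) -> List[int]:
    """Repetition code: repeat every bit REP times."""
    return [b for b in bits for _ in range(REP)]


def decode(code: List[int]) -> List[int]:
    """Majority vote inside each block of REP bits."""
    return [int(sum(code[i:i + REP]) > REP // 2) for i in range(0, len(code), REP)]


def xor(a: List[int], b: List[int]) -> List[int]:
    return [x ^ y for x, y in zip(a, b)]


def sketch(w: List[int]) -> List[int]:
    """Public value: w XOR a uniformly random codeword."""
    if len(w) % REP != 0:
        raise ValueError("input length must be a multiple of REP")
    r = [secrets.randbelow(2) for _ in range(len(w) // REP)]
    return xor(w, encode(r))


def recover(w_noisy: List[int], s: List[int]) -> List[int]:
    """Recover the original w from a nearby reading and the public sketch."""
    noisy_codeword = xor(w_noisy, s)             # codeword plus the error pattern
    codeword = encode(decode(noisy_codeword))    # error correction removes the errors
    return xor(s, codeword)


w = [1, 0, 1, 1, 0, 0]
s = sketch(w)
w_noisy = [1, 1, 1, 1, 1, 0]  # one flipped bit in each block of three
recovered = recover(w_noisy, s)
```

A fuzzy extractor would additionally derive a nearly uniform key from the recovered input, typically with a randomness extractor; that step is omitted here.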

arXiv.org e-Print Archive

CiteSeerX

Crossref

Cryptology ePrint Archive