4,968 research outputs found
Edit Distance: Sketching, Streaming and Document Exchange
We show that in the document exchange problem, where Alice holds and Bob holds , Alice can send Bob a message of
size bits such that Bob can recover using the
message and his input if the edit distance between and is no more
than , and output "error" otherwise. Both the encoding and decoding can be
done in time . This result significantly
improves the previous communication bounds under polynomial encoding/decoding
time. We also show that in the referee model, where Alice and Bob hold and
respectively, they can compute sketches of and of sizes
bits (the encoding), and send to the referee, who can
then compute the edit distance between and together with all the edit
operations if the edit distance is no more than , and output "error"
otherwise (the decoding). To the best of our knowledge, this is the first
result for sketching edit distance using bits.
Moreover, the encoding phase of our sketching algorithm can be performed by
scanning the input string in one pass. Thus our sketching algorithm also
implies the first streaming algorithm for computing edit distance and all the
edits exactly using bits of space.Comment: Full version of an article to be presented at the 57th Annual IEEE
Symposium on Foundations of Computer Science (FOCS 2016
Block Edit Errors with Transpositions: Deterministic Document Exchange Protocols and Almost Optimal Binary Codes
Document exchange and error correcting codes are two fundamental problems regarding communications. In the first problem, Alice and Bob each holds a string, and the goal is for Alice to send a short sketch to Bob, so that Bob can recover Alice\u27s string. In the second problem, Alice sends a message with some redundant information to Bob through a channel that can add adversarial errors, and the goal is for Bob to correctly recover the message despite the errors. In both problems, an upper bound is placed on the number of errors between the two strings or that the channel can add, and a major goal is to minimize the size of the sketch or the redundant information. In this paper we focus on deterministic document exchange protocols and binary error correcting codes.
Both problems have been studied extensively. In the case of Hamming errors (i.e., bit substitutions) and bit erasures, we have explicit constructions with asymptotically optimal parameters. However, other error types are still rather poorly understood. In a recent work [Kuan Cheng et al., 2018], the authors constructed explicit deterministic document exchange protocols and binary error correcting codes for edit errors with almost optimal parameters. Unfortunately, the constructions in [Kuan Cheng et al., 2018] do not work for other common errors such as block transpositions.
In this paper, we generalize the constructions in [Kuan Cheng et al., 2018] to handle a much larger class of errors. These include bursts of insertions and deletions, as well as block transpositions. Specifically, we consider document exchange and error correcting codes where the total number of block insertions, block deletions, and block transpositions is at most k <= alpha n/log n for some constant 0<alpha<1. In addition, the total number of bits inserted and deleted by the first two kinds of operations is at most t <= beta n for some constant 0<beta<1, where n is the length of Alice\u27s string or message. We construct explicit, deterministic document exchange protocols with sketch size O((k log n +t) log^2 n/{k log n + t}) and explicit binary error correcting code with O(k log n log log log n+t) redundant bits. As a comparison, the information-theoretic optimum for both problems is Theta(k log n+t). As far as we know, previously there are no known explicit deterministic document exchange protocols in this case, and the best known binary code needs Omega(n) redundant bits even to correct just one block transposition [L. J. Schulman and D. Zuckerman, 1999]
Communication and Streaming Complexity of Approximate Pattern Matching
We consider the approximate pattern matching problem. Given a text T of length 2n and a pattern P of length n, the task is to decide for each prefix T[1, j] of T if it ends with a string that is at the edit distance at most k from P. If this is the case, we must output the edit distance and the corresponding edit operations. We first show the communication complexity of the problem for the case when Alice and Bob both share the pattern and Alice holds the first half of the text and Bob the second half, and for the case when Alice holds the first half of the text, Bob the second half of the text, and Charlie the pattern. We then develop the first sublinear-space streaming algorithm for the problem. The algorithm is randomised with error probability at most 1/poly(n)
The One-Way Communication Complexity of Dynamic Time Warping Distance
We resolve the randomized one-way communication complexity of Dynamic Time Warping (DTW) distance. We show that there is an efficient one-way communication protocol using O~(n/alpha) bits for the problem of computing an alpha-approximation for DTW between strings x and y of length n, and we prove a lower bound of Omega(n / alpha) bits for the same problem. Our communication protocol works for strings over an arbitrary metric of polynomial size and aspect ratio, and we optimize the logarithmic factors depending on properties of the underlying metric, such as when the points are low-dimensional integer vectors equipped with various metrics or have bounded doubling dimension. We also consider linear sketches of DTW, showing that such sketches must have size Omega(n)
- …