Regular Expression Search on Compressed Text
We present an algorithm for searching regular expression matches in
compressed text. The algorithm reports the number of matching lines in the
uncompressed text in time linear in the size of its compressed version. We
define efficient data structures that yield nearly optimal complexity bounds
and provide a sequential implementation, zearch, that requires up to 25% less
time than the state of the art.
Comment: 10 pages, published in Data Compression Conference (DCC'19).
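For scale, the following minimal Python sketch computes the same statistic, the number of matching lines, by fully decompressing the input and scanning it; the gzip container and the file path are illustrative assumptions, and the point of zearch is precisely to avoid this decompression pass by operating on the compressed representation directly.

import gzip
import re

def count_matching_lines(path: str, pattern: str) -> int:
    # Naive baseline: decompress everything, then scan line by line.
    # zearch instead reports this count in time linear in the size of
    # the compressed file, without materializing the full text.
    regex = re.compile(pattern)
    count = 0
    with gzip.open(path, "rt", errors="replace") as f:
        for line in f:
            if regex.search(line):
                count += 1
    return count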
Average Sensitivity of Graph Algorithms
In modern applications of graph algorithms, where the graphs of interest are
large and dynamic, it is unrealistic to assume that an input representation
contains the full information of a graph being studied. Hence, it is desirable
to use algorithms that, even when only a (large) subgraph is available, output
solutions that are close to the solutions output when the whole graph is
available. We formalize this idea by introducing the notion of average
sensitivity of graph algorithms, which is the average earth mover's distance
between the output distributions of an algorithm on a graph and its subgraph
obtained by removing an edge, where the average is over the edges removed and
the distance between two outputs is the Hamming distance.
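Spelled out in symbols (our notation, not necessarily the paper's), for a possibly randomized algorithm $A$ on a graph $G = (V, E)$ the definition reads:

\[
  \beta_A(G) \;=\; \frac{1}{|E|} \sum_{e \in E} \mathrm{EM}\bigl( A(G),\, A(G - e) \bigr),
\]

where $\mathrm{EM}$ denotes the earth mover's distance between the two output distributions, computed with the Hamming distance between solutions as the underlying ground metric.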
In this work, we initiate a systematic study of average sensitivity. After
deriving basic properties of average sensitivity such as composition, we
provide efficient approximation algorithms with low average sensitivities for
concrete graph problems, including the minimum spanning forest problem, the
global minimum cut problem, the minimum $s$-$t$ cut problem, and the maximum
matching problem. In addition, we prove that the average sensitivity of our
global minimum cut algorithm is almost optimal, by showing a nearly matching
lower bound. We also show that every algorithm for the 2-coloring problem has
average sensitivity linear in the number of vertices. One of the main ideas
involved in designing our algorithms with low average sensitivity is the
following fact: if the presence of a vertex or an edge in the solution output
by an algorithm can be decided locally, then the algorithm has a low average
sensitivity, allowing us to reuse the analyses of known sublinear-time
algorithms and local computation algorithms (LCAs). Using this connection, we
show that every LCA for 2-coloring has linear query complexity, thereby
answering an open question.
Comment: 39 pages, 1 figure.
Edit Distance: Sketching, Streaming and Document Exchange
We show that in the document exchange problem, where Alice holds $x \in \{0,1\}^n$ and
Bob holds $y \in \{0,1\}^n$, Alice can send Bob a message of size
$O(K(\log^2 K + \log n))$ bits such that Bob can recover $x$ using the
message and his input $y$ if the edit distance between $x$ and $y$ is no more
than $K$, and output "error" otherwise. Both the encoding and decoding can be
done in time $\tilde{O}(n + \mathrm{poly}(K))$. This result significantly
improves the previous communication bounds under polynomial encoding/decoding
time. We also show that in the referee model, where Alice and Bob hold $x$ and
$y$ respectively, they can compute sketches of $x$ and $y$ of sizes
$\mathrm{poly}(K \log n)$ bits (the encoding), and send them to the referee, who can
then compute the edit distance between $x$ and $y$ together with all the edit
operations if the edit distance is no more than $K$, and output "error"
otherwise (the decoding). To the best of our knowledge, this is the first
result for sketching edit distance using $\mathrm{poly}(K \log n)$ bits.
Moreover, the encoding phase of our sketching algorithm can be performed by
scanning the input string in one pass. Thus our sketching algorithm also
implies the first streaming algorithm for computing edit distance and all the
edits exactly using $\mathrm{poly}(K \log n)$ bits of space.
Comment: Full version of an article to be presented at the 57th Annual IEEE
Symposium on Foundations of Computer Science (FOCS 2016).
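The referee's output contract, the exact distance together with the edit operations when it is at most $K$ and "error" otherwise, can be illustrated on a small scale with the classical banded dynamic program, which runs in $O((|x|+|y|)K)$ time. This is a textbook thresholded edit-distance computation, not the paper's sketching scheme:

def edit_distance_at_most_k(x: str, y: str, k: int):
    # Classical banded DP (Ukkonen-style thresholding): only cells
    # within k of the main diagonal can hold a value <= k, so each row
    # keeps a band of 2k + 1 entries. Returns the edit distance if it
    # is <= k, and the string "error" otherwise.
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return "error"
    INF = k + 1                   # any value > k is as good as infinity
    # band index d represents column j = i + d - k in row i
    prev = [INF] * (2 * k + 1)    # row i = 0: D[0][j] = j
    for d in range(2 * k + 1):
        if 0 <= d - k <= m:
            prev[d] = d - k
    for i in range(1, n + 1):
        cur = [INF] * (2 * k + 1)
        for d in range(2 * k + 1):
            j = i + d - k
            if j < 0 or j > m:
                continue
            best = INF
            if j > 0 and d > 0:   # insert y[j-1]
                best = min(best, cur[d - 1] + 1)
            if d < 2 * k:         # delete x[i-1]
                best = min(best, prev[d + 1] + 1)
            if j > 0:             # match or substitute
                best = min(best, prev[d] + (x[i - 1] != y[j - 1]))
            cur[d] = min(best, INF)
        prev = cur
    d = m - n + k
    return prev[d] if prev[d] <= k else "error"

For example, edit_distance_at_most_k("kitten", "sitting", 3) returns 3, while the same call with k = 2 returns "error". Recovering the operations themselves would additionally require keeping back-pointers through the band.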
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight line program (SLP), which is
a context free grammar in Chomsky normal form that generates a single
string. Given an SLP of size $n$ representing a text $S$ of length $N$, our
algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N} + m \log N)$ time
and $O(n\sqrt{N} + m)$ space, where $m$ is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the
time and space complexities becomes either $nL$, where $L$ is the length of the
longest LZ78 factor, or $(N - \alpha)$, where $\alpha \geq 0$ is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of $S$ of a certain length. Since $m = O(N / \log_\sigma N)$, where
$\sigma$ is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when $\sigma$ is
constant, and can be more efficient when the text is compressible, i.e., when
$m$ and $n$ are small.
Comment: SPIRE 2012.
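For orientation, the factorization itself follows the textbook LZ78 rule: each factor is the longest previously seen factor that prefixes the remaining input, extended by one character. The sketch below computes it on the plain, uncompressed string; the paper's contribution is computing the same factorization directly from the SLP without expanding the text.

def lz78_factorize(s: str):
    # Textbook LZ78 over the plain string, using a trie of factors.
    # Each factor is encoded as (id of the longest previous factor that
    # prefixes the rest of the input, extension character); id 0 is the
    # empty factor. The number of returned factors is the m of the paper.
    trie = {}             # (node_id, char) -> node_id; node 0 is the root
    factors = []
    next_id = 1
    i = 0
    while i < len(s):
        node = 0
        while i < len(s) and (node, s[i]) in trie:
            node = trie[(node, s[i])]
            i += 1
        if i < len(s):
            trie[(node, s[i])] = next_id
            factors.append((node, s[i]))
            next_id += 1
            i += 1
        else:
            factors.append((node, ""))  # text ended inside a known factor
    return factors

For instance, lz78_factorize("ababab") yields [(0, 'a'), (0, 'b'), (1, 'b'), (3, '')], i.e. the factors a, b, ab, ab.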