Random Access to Grammar Compressed Strings
Grammar-based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. In this paper, we
present a novel grammar representation that allows efficient random access to
any character or substring without decompressing the string.
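To make "random access without decompressing" concrete, here is a minimal sketch of the textbook baseline, not the paper's data structure. It assumes the grammar is a straight-line program (SLP), in which every nonterminal has exactly one rule: either a single character or a pair of nonterminals. Storing the expansion length of each nonterminal lets a query descend from the start symbol, going left or right by comparing the target index with the left child's length. This costs time proportional to the grammar's height, which can be as large as the grammar itself; the representations described here avoid that dependence on height. The grammar `RULES` and the helpers `expansion_lengths` and `access` are hypothetical names chosen for illustration.

```python
# Hypothetical straight-line program (SLP): each nonterminal has exactly one
# rule, either a single terminal character or a pair (left, right).
RULES = {
    "A": "a",
    "B": "b",
    "X": ("A", "B"),  # X -> ab
    "Y": ("X", "X"),  # Y -> abab
    "Z": ("Y", "X"),  # Z -> ababab
    "S": ("Z", "Y"),  # S -> ababababab
}


def expansion_lengths(rules):
    """Return the length of the string each nonterminal derives (memoized)."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rule = rules[sym]
            if isinstance(rule, str):
                lengths[sym] = 1  # a terminal rule derives one character
            else:
                lengths[sym] = length(rule[0]) + length(rule[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules, lengths, start, i):
    """Return character i (0-based) of the string derived from `start`."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lengths[left]:
            sym = left  # target lies inside the left child's expansion
        else:
            i -= lengths[left]  # skip over the left child's expansion
            sym = right
    return rules[sym]


LENGTHS = expansion_lengths(RULES)
print("".join(access(RULES, LENGTHS, "S", i) for i in range(LENGTHS["S"])))
```

The lengths are computed once in time linear in the grammar size; each query then touches one rule per level of the descent and never materializes the full string.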
Let $S$ be a string of length $N$ compressed into a context-free grammar
$\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$
achieving $O(\log N)$ random access time, and either $O(n \cdot \alpha_k(n))$
construction time and space on the pointer machine model, or $O(n)$
construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of
the $k^{th}$ row of Ackermann's function. Our representations also efficiently
support decompression of any substring in $S$: we can decompress any substring
of length $m$ in the same complexity as a single random access query and
$O(m)$ additional time. Combining these results with fast algorithms for
uncompressed approximate string matching leads to several efficient algorithms
for approximate string matching on grammar-compressed strings without
decompression. For instance, we can find all approximate occurrences of a
pattern $P$ with at most $k$ errors in time
$O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of
occurrences of $P$ in $S$. Finally, we
generalize our results to navigation and other operations on grammar-compressed
ordered trees.
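Substring decompression can be sketched in the same naive setting: to extract a range of positions, recurse only into children whose expansions overlap that range. The visited rules are the two boundary paths from the start symbol plus the fully covered subtrees, so the work is proportional to the grammar's height plus the substring length. This is an illustration under the same straight-line-program assumption, not the paper's method; `RULES`, `extract`, and `substring` are hypothetical names.

```python
# Hypothetical straight-line program (SLP): each nonterminal has exactly one
# rule, either a single terminal character or a pair (left, right).
RULES = {
    "A": "a",
    "B": "b",
    "X": ("A", "B"),  # X -> ab
    "Y": ("X", "X"),  # Y -> abab
    "Z": ("Y", "X"),  # Z -> ababab
    "S": ("Z", "Y"),  # S -> ababababab
}


def expansion_lengths(rules):
    """Return the length of the string each nonterminal derives (memoized)."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rule = rules[sym]
            lengths[sym] = 1 if isinstance(rule, str) else length(rule[0]) + length(rule[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def extract(rules, lengths, sym, i, m, out):
    """Append characters [i, i + m) of sym's expansion to `out`."""
    if m <= 0:
        return  # nothing requested from this child
    rule = rules[sym]
    if isinstance(rule, str):
        out.append(rule)  # a terminal covers exactly one position
        return
    left, right = rule
    left_len = lengths[left]
    if i < left_len:
        take = min(m, left_len - i)  # portion of the range inside the left child
        extract(rules, lengths, left, i, take, out)
        extract(rules, lengths, right, 0, m - take, out)
    else:
        extract(rules, lengths, right, i - left_len, m, out)


def substring(rules, lengths, start, i, m):
    """Return the length-m substring starting at position i of start's expansion."""
    if i < 0 or m < 0 or i + m > lengths[start]:
        raise IndexError((i, m))
    out = []
    extract(rules, lengths, start, i, m, out)
    return "".join(out)


LENGTHS = expansion_lengths(RULES)
print(substring(RULES, LENGTHS, "S", 3, 4))
```

Unlike calling a single-character query $m$ times, this walks down to the start of the range once and then streams the covered leaves.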
All of the above bounds significantly improve the currently best known
results. To achieve these bounds, we introduce several new techniques and data
structures of independent interest, including a predecessor data structure, two
"biased" weighted ancestor data structures, and a compact representation of
heavy paths in grammars.
Comment: Preliminary version in SODA 201
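The reason heavy paths are useful in grammars can be shown with a small counting sketch. For each rule $X \to Y Z$, call the child with the longer expansion heavy and the other one light. A light child derives at most half as many characters as its parent, so any descent from the start symbol to a single position takes at most $\log_2 N$ light steps. The sketch below only counts those light steps during a naive descent on a hypothetical grammar for a Fibonacci word; it does not build the compact heavy-path representation described in the abstract, and `RULES` and `light_steps` are illustrative names.

```python
import math

# Hypothetical SLP for a Fibonacci word: F_k -> F_{k-1} F_{k-2}.
RULES = {
    "F1": "b",
    "F2": "a",
    "F3": ("F2", "F1"),  # ab
    "F4": ("F3", "F2"),  # aba
    "F5": ("F4", "F3"),  # abaab
    "F6": ("F5", "F4"),  # abaababa
    "F7": ("F6", "F5"),  # abaababaabaab
}


def expansion_lengths(rules):
    """Return the length of the string each nonterminal derives (memoized)."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rule = rules[sym]
            lengths[sym] = 1 if isinstance(rule, str) else length(rule[0]) + length(rule[1])
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def light_steps(rules, lengths, start, i):
    """Descend from `start` to position i and count steps into light children.

    At each rule X -> L R the child with the longer expansion is heavy (ties
    go to L). Positions, not symbol names, decide heavy vs. light, so a rule
    such as X -> Y Y is handled correctly.
    """
    sym, light = start, 0
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        heavy_is_left = lengths[left] >= lengths[right]
        go_left = i < lengths[left]
        if not go_left:
            i -= lengths[left]
        if go_left != heavy_is_left:
            light += 1  # the remaining expansion length at least halves here
        sym = left if go_left else right
    return light


LENGTHS = expansion_lengths(RULES)
N = LENGTHS["F7"]
print(max(light_steps(RULES, LENGTHS, "F7", i) for i in range(N)), math.log2(N))
```

Following heavy children is where a long descent can hide, which is why compressing those paths, rather than the light steps, is what pays off.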
A Static Optimality Transformation with Applications to Planar Point Location
Over the last decade, several data structures have been proposed that, given a
planar subdivision and a probability distribution over the plane, answer point
location queries in a way that is fine-tuned to the distribution.
All these methods suffer from the requirement that the query distribution must
be known in advance.
We present a new data structure for point location queries in planar
triangulations. Our structure is asymptotically as fast as the optimal
structures, but it requires no prior information about the queries. This is a
2D analogue of the jump from Knuth's optimum binary search trees (discovered in
1971) to the splay trees of Sleator and Tarjan in 1985. While the former need
to know the query distribution, the latter are statically optimal. This means
that we can adapt to the query sequence and achieve the same asymptotic
performance as an optimum static structure, without needing any additional
information.
Comment: 13 pages, 1 figure, a preliminary version appeared at SoCG 201
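"Statically optimal" has a precise meaning in the 1D setting that the splay-tree analogy refers to. By Sleator and Tarjan's static optimality theorem, a splay tree serves $m$ accesses, where item $i$ is accessed $f_i$ times, in $O(m + \sum_i f_i \log(m/f_i))$ total time, which matches the best static binary search tree for that sequence up to a constant factor. The sketch below only computes the entropy term $\sum_i f_i \log_2(m/f_i)$ from an observed query sequence; the function name `entropy_bound` is hypothetical, and the code implements neither a splay tree nor the point location structure.

```python
import math
from collections import Counter


def entropy_bound(queries):
    """Return sum over distinct items of f * log2(m / f).

    m is the number of queries and f is how often an item is queried; the
    result equals m times the empirical entropy of the query sequence.
    """
    m = len(queries)
    counts = Counter(queries)
    return sum(f * math.log2(m / f) for f in counts.values())


uniform = list("abcd") * 2          # four items, each queried twice
skewed = ["a"] * 6 + ["b", "c"]     # one item dominates the queries
print(entropy_bound(uniform), entropy_bound(skewed))
```

A skewed query sequence yields a smaller bound than a uniform one of the same length, which is exactly the advantage a distribution-sensitive structure exploits, here obtained without knowing the distribution in advance.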