Random Access to Grammar Compressed Strings
Grammar based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. In this paper, we
present a novel grammar representation that allows efficient random access to
any character or substring without decompressing the string.
Let S be a string of length N compressed into a context-free grammar
of size n. We present two representations of the grammar
achieving O(log N) random access time, and either
O(n alpha_k(n)) construction time and space on the pointer machine model, or
O(n) construction time and space on the RAM. Here, alpha_k(n) is the inverse of
the k-th row of Ackermann's function. Our representations also efficiently
support decompression of any substring of S: we can decompress any substring
of length m in the same complexity as a single random access query and
additional O(m) time. Combining these results with fast algorithms for
uncompressed approximate string matching leads to several efficient algorithms
for approximate string matching on grammar-compressed strings without
decompression. For instance, we can find all approximate occurrences of a
pattern P of length m with at most k errors in time O(n(min{mk, k^4 + m} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we
generalize our results to navigation and other operations on grammar-compressed
ordered trees.
All of the above bounds significantly improve the currently best known
results. To achieve these bounds, we introduce several new techniques and data
structures of independent interest, including a predecessor data structure, two
"biased" weighted ancestor data structures, and a compact representation of
heavy paths in grammars.

Comment: Preliminary version in SODA 2011
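The core mechanism behind random access in a grammar-compressed string can be illustrated with a short sketch (not the paper's O(log N)-time data structure): given a straight-line program in which every rule is either a terminal or a pair of earlier rules, precomputed expansion lengths let a query descend the parse tree directly to any position without decompressing the string. All names below are illustrative.

```python
def expansion_lengths(rules):
    """rules[i] is either a 1-char string (terminal) or a pair (j, k)
    of earlier rule indices. Returns the expansion length of each rule."""
    lengths = []
    for r in rules:
        if isinstance(r, str):
            lengths.append(1)
        else:
            j, k = r
            lengths.append(lengths[j] + lengths[k])
    return lengths

def access(rules, lengths, root, i):
    """Return the i-th character (0-indexed) of the string generated by
    `root`, in time proportional to the derivation depth, not to N."""
    node = root
    while not isinstance(rules[node], str):
        j, k = rules[node]
        if i < lengths[j]:
            node = j          # position lies in the left expansion
        else:
            i -= lengths[j]   # skip the left expansion entirely
            node = k
    return rules[node]

# Example: a doubling grammar generating "abab".
rules = ["a", "b", (0, 1), (2, 2)]   # rule 3 -> rule 2 rule 2 -> "abab"
lengths = expansion_lengths(rules)
print("".join(access(rules, lengths, 3, i) for i in range(lengths[3])))
```

A balanced grammar gives O(log N) depth; the paper's contribution is achieving O(log N) access for arbitrary (possibly very unbalanced) grammars in O(n) space.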
Trees and Markov convexity
We show that an infinite weighted tree admits a bi-Lipschitz embedding into
Hilbert space if and only if it does not contain arbitrarily large complete
binary trees with uniformly bounded distortion. We also introduce a new metric
invariant called Markov convexity, and show how it can be used to compute the
Euclidean distortion of any metric tree up to universal factors.
On weighted depths in random binary search trees
Following the model introduced by Aguech, Lasmar and Mahmoud [Probab. Engrg.
Inform. Sci. 21 (2007) 133-141], the weighted depth of a node in a labelled
rooted tree is the sum of all labels on the path connecting the node to the
root. We analyze weighted depths of nodes with given labels, the last inserted
node, nodes ordered as visited by the depth first search process, the weighted
path length and the weighted Wiener index in a random binary search tree. We
establish three regimes of nodes depending on whether the second order
behaviour of their weighted depths follows from fluctuations of the keys on the
path, the depth of the nodes, or both. Finally, we investigate a random
distribution function on the unit interval arising as scaling limit for
weighted depths of nodes with at most one child.
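As a hedged illustration of the model above, the following simulation builds a random binary search tree from uniform keys and computes a node's weighted depth as the sum of the labels (here, the keys themselves) on its root path. It sketches the quantity being analyzed, not the paper's analysis.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Standard binary search tree insertion; returns the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key); break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key); break
            cur = cur.right
    return root

def weighted_depth(root, key):
    """Sum of labels on the path from the root to the node holding `key`
    (each node is labelled with its own key in this simulation)."""
    total, cur = 0.0, root
    while cur is not None:
        total += cur.key
        if key == cur.key:
            return total
        cur = cur.left if key < cur.key else cur.right
    raise KeyError(key)

random.seed(0)
keys = [random.random() for _ in range(1000)]
root = None
for k in keys:
    root = insert(root, k)
# Weighted depth of the last inserted node.
print(weighted_depth(root, keys[-1]))
```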
Deterministic and Probabilistic Binary Search in Graphs
We consider the following natural generalization of Binary Search: in a given
undirected, positively weighted graph, one vertex is a target. The algorithm's
task is to identify the target by adaptively querying vertices. In response to
querying a node v, the algorithm learns either that v is the target, or is
given an edge out of v that lies on a shortest path from v to the target.
We study this problem in a general noisy model in which each query
independently receives a correct answer with probability p > 1/2 (a
known constant), and an (adversarial) incorrect one with probability 1 - p.
Our main positive result is that when p = 1 (i.e., all answers are
correct), log_2 n queries are always sufficient. For general p, we give an
(almost information-theoretically optimal) algorithm that uses, in expectation,
no more than (1 - delta) log n / (1 - H(p)) + o(log n) + O(log^2 (1/delta)) queries, and identifies the target correctly with probability at
least 1 - delta. Here, H(p) = -(p log p + (1 - p) log(1 - p)) denotes the
entropy. The first bound is achieved by the algorithm that iteratively queries
a 1-median of the nodes not ruled out yet; the second bound by careful repeated
invocations of a multiplicative weights algorithm.
Even for p = 1, we show several hardness results for the problem of
determining whether a target can be found using K queries. Our upper bound of
log_2 n implies a quasipolynomial-time algorithm for undirected connected
graphs; we show that this is best-possible under the Strong Exponential Time
Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs
with non-uniform node querying costs, the problem is PSPACE-complete. For a
semi-adaptive version, in which one may query r nodes each in k rounds, we
show membership in Sigma_{2k-1} in the polynomial hierarchy, and hardness
for Sigma_{2k-5}.
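The noiseless (p = 1) strategy sketched above, repeatedly querying a 1-median of the not-yet-ruled-out vertices, can be illustrated on a small unweighted graph. The oracle and helper names below are illustrative, the graph is assumed connected, and the all-pairs BFS is only suitable for tiny instances.

```python
from collections import deque

def bfs_dist(adj, s):
    """Unweighted single-source shortest-path distances from s."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def find_target(adj, target):
    """Locate `target` by querying 1-medians; the simulated oracle
    answers truthfully with an edge on a shortest path to the target."""
    dist = {v: bfs_dist(adj, v) for v in adj}   # all-pairs (small graphs)
    candidates = set(adj)
    queries = 0
    while True:
        # Query a 1-median: a vertex minimizing total distance to candidates.
        q = min(adj, key=lambda v: sum(dist[v][c] for c in candidates))
        queries += 1
        if q == target:
            return q, queries
        # Oracle answer: a neighbor of q on a shortest q-to-target path.
        nxt = min(adj[q], key=lambda u: dist[u][target])
        # Rule out every candidate not reached more quickly through nxt.
        candidates = {c for c in candidates if dist[nxt][c] < dist[q][c]}

# Example: a path graph 0-1-2-...-7 recovers classical binary search.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
v, queries = find_target(adj, target=6)
print(v, queries)
```

On a path graph the 1-median is the median of the surviving interval, so the candidate set roughly halves per query, matching the log_2 n bound.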
Local Optimality Certificates for LP Decoding of Tanner Codes
We present a new combinatorial characterization for local optimality of a
codeword in an irregular Tanner code. The main novelty in this characterization
is that it is based on a linear combination of subtrees in the computation
trees. These subtrees may have any degree in the local code nodes and may have
any height (even greater than the girth). We expect this new characterization
to lead to improvements in bounds for successful decoding.
We prove that local optimality in this new characterization implies
ML-optimality and LP-optimality, as one would expect. Finally, we show that it
is possible to efficiently compute a certificate for the local optimality of a
codeword given an LLR vector.
Finger Search in Grammar-Compressed Strings
Grammar-based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. Given a grammar, the
random access problem is to compactly represent the grammar while supporting
random access, that is, given a position in the original uncompressed string
report the character at that position. In this paper we study the random access
problem with the finger search property, that is, the time for a random access
query should depend on the distance between a specified index , called the
\emph{finger}, and the query index . We consider both a static variant,
where we first place a finger and subsequently access indices near the finger
efficiently, and a dynamic variant that also supports moving the finger, with
the time depending on the distance moved.
Let n be the size of the grammar, and let N be the size of the string. For
the static variant we give a linear space representation that supports placing
the finger in O(log N) time and subsequently accessing in O(log D) time,
where D is the distance between the finger and the accessed index. For the
dynamic variant we give a linear space representation that supports placing the
finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random
access, we improve a query bound of O(log N) to O(log D) for the static
variant and to O(log D + log log N) for the dynamic variant, while
maintaining linear space. As an application of our results we obtain an
improved solution to the longest common extension problem in grammar compressed
strings. To obtain our results, we introduce several new techniques of
independent interest, including a novel van Emde Boas style decomposition of
grammars.
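A naive way to see why finger search can beat repeated random access is to cache the root-to-leaf parse path for the finger and, for a nearby index, climb only to the lowest cached ancestor whose expansion covers it before descending again. The sketch below does exactly that on a straight-line program; it illustrates the idea only, it is not the paper's van Emde Boas style decomposition and carries no worst-case guarantee.

```python
def set_finger(rules, lengths, root, f):
    """Return the path of (rule, start_of_expansion) pairs from the root
    down to the terminal generating position f."""
    path, node, start = [], root, 0
    while True:
        path.append((node, start))
        r = rules[node]
        if isinstance(r, str):
            return path
        j, k = r
        if f < start + lengths[j]:
            node = j
        else:
            start += lengths[j]
            node = k

def finger_access(rules, lengths, path, i):
    """Access position i, reusing (and updating) the cached finger path."""
    # Climb until the cached subtree's expansion covers i.
    while len(path) > 1:
        node, start = path[-1]
        if start <= i < start + lengths[node]:
            break
        path.pop()
    # Descend as in a plain random access, extending the path.
    node, start = path[-1]
    while not isinstance(rules[node], str):
        j, k = rules[node]
        if i < start + lengths[j]:
            node = j
        else:
            start += lengths[j]
            node = k
        path.append((node, start))
    return rules[node]

# Example: a doubling grammar; rule 4 generates "abababab".
rules = ["a", "b", (0, 1), (2, 2), (3, 3)]
lengths = [1, 1, 2, 4, 8]
path = set_finger(rules, lengths, 4, 3)       # finger at index 3
print(finger_access(rules, lengths, path, 4)) # a nearby access
```

Accesses close to the finger share a long prefix of the parse path, so little climbing and descending is needed; the paper's structure makes this intuition worst-case efficient in terms of the distance D.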