Search CORE

18,740 research outputs found

Random Access to Grammar Compressed Strings

Author: Bille Philip
Landau Gad M.
Raman Rajeev
Sadakane Kunihiko
Satti Srinivasa Rao
Weimann Oren
Publication venue
Publication date: 01/01/2011
Field of study

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let

S

be a string of length

N

compressed into a context-free grammar

\mathcal{S}

of size

n

. We present two representations of

\mathcal{S}

achieving

O(\log N)

random access time, and either

O(n\cdot \alpha_k(n))

construction time and space on the pointer machine model, or

O(n)

construction time and space on the RAM. Here,

\alpha_k(n)

is the inverse of the

k^{th}

row of Ackermann's function. Our representations also efficiently support decompression of any substring in

S

: we can decompress any substring of length

m

in the same complexity as a single random access query and additional

O(m)

time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern

P

with at most

k

errors in time

O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)

, where

occ

is the number of occurrences of

P

S

. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.Comment: Preliminary version in SODA 201

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Leicester Research Archive

Trees and Markov convexity

Author: Lee James R.
Naor Assaf
Peres Yuval
Publication venue
Publication date: 01/01/2006
Field of study

We show that an infinite weighted tree admits a bi-Lipschitz embedding into Hilbert space if and only if it does not contain arbitrarily large complete binary trees with uniformly bounded distortion. We also introduce a new metric invariant called Markov convexity, and show how it can be used to compute the Euclidean distortion of any metric tree up to universal factors

arXiv.org e-Print Archive

CiteSeerX

Crossref

On weighted depths in random binary search trees

Author: Aguech Rafik
Amri Anis
Sulzbach Henning
Publication venue
Publication date: 01/07/2017
Field of study

Following the model introduced by Aguech, Lasmar and Mahmoud [Probab. Engrg. Inform. Sci. 21 (2007) 133-141], the weighted depth of a node in a labelled rooted tree is the sum of all labels on the path connecting the node to the root. We analyze weighted depths of nodes with given labels, the last inserted node, nodes ordered as visited by the depth first search process, the weighted path length and the weighted Wiener index in a random binary search tree. We establish three regimes of nodes depending on whether the second order behaviour of their weighted depths follows from fluctuations of the keys on the path, the depth of the nodes, or both. Finally, we investigate a random distribution function on the unit interval arising as scaling limit for weighted depths of nodes with at most one child

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Deterministic and Probabilistic Binary Search in Graphs

Author: Aslam J. A.
Burnashev M. V.
Dhagat A.
Karp R. M.
Laber E. S.
Mozes S.
Nowak R.
Pedrotti A.
Ulam S. M.
Publication venue
Publication date: 28/07/2017
Field of study

We consider the following natural generalization of Binary Search: in a given undirected, positively weighted graph, one vertex is a target. The algorithm's task is to identify the target by adaptively querying vertices. In response to querying a node

q

, the algorithm learns either that

q

is the target, or is given an edge out of

q

that lies on a shortest path from

q

to the target. We study this problem in a general noisy model in which each query independently receives a correct answer with probability

p > \frac{1}{2}

(a known constant), and an (adversarial) incorrect one with probability

1-p

. Our main positive result is that when

p = 1

(i.e., all answers are correct),

\log_2 n

queries are always sufficient. For general

p

, we give an (almost information-theoretically optimal) algorithm that uses, in expectation, no more than

(1 - \delta)\frac{\log_2 n}{1 - H(p)} + o(\log n) + O(\log^2 (1/\delta))

queries, and identifies the target correctly with probability at leas

1-\delta

. Here,

H(p) = -(p \log p + (1-p) \log(1-p))

denotes the entropy. The first bound is achieved by the algorithm that iteratively queries a 1-median of the nodes not ruled out yet; the second bound by careful repeated invocations of a multiplicative weights algorithm. Even for

p = 1

, we show several hardness results for the problem of determining whether a target can be found using

K

queries. Our upper bound of

\log_2 n

implies a quasipolynomial-time algorithm for undirected connected graphs; we show that this is best-possible under the Strong Exponential Time Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs with non-uniform node querying costs, the problem is PSPACE-complete. For a semi-adaptive version, in which one may query

r

nodes each in

k

rounds, we show membership in

\Sigma_{2k-1}

in the polynomial hierarchy, and hardness for

\Sigma_{2k-5}

arXiv.org e-Print Archive

Crossref

Local Optimality Certificates for LP Decoding of Tanner Codes

Author: Even Guy
Halabi Nissim
Publication venue
Publication date: 24/04/2011
Field of study

We present a new combinatorial characterization for local optimality of a codeword in an irregular Tanner code. The main novelty in this characterization is that it is based on a linear combination of subtrees in the computation trees. These subtrees may have any degree in the local code nodes and may have any height (even greater than the girth). We expect this new characterization to lead to improvements in bounds for successful decoding. We prove that local optimality in this new characterization implies ML-optimality and LP-optimality, as one would expect. Finally, we show that is possible to compute efficiently a certificate for the local optimality of a codeword given an LLR vector

arXiv.org e-Print Archive

CiteSeerX

Finger Search in Grammar-Compressed Strings

Author: Bille Philip
Christiansen Anders Roy
Cording Patrick Hagge
Gørtz Inge Li
Publication venue
Publication date: 01/01/2016
Field of study

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index

f

, called the \emph{finger}, and the query index

i

. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant where also moving the finger such that the time depends on the distance moved is supported. Let

n

be the size the grammar, and let

N

be the size of the string. For the static variant we give a linear space representation that supports placing the finger in

O(\log N)

time and subsequently accessing in

O(\log D)

time, where

D

is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in

O(\log N)

time and accessing and moving the finger in

O(\log D + \log \log N)

time. Compared to the best linear space solution to random access, we improve a

O(\log N)

query bound to

O(\log D)

for the static variant and to

O(\log D + \log \log N)

for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Online Research Database In Technology