
    Random Access to Grammar Compressed Strings

    Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n\cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k^{th}$ row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query and additional $O(m)$ time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars. Comment: Preliminary version in SODA 201
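To make the random access problem concrete, here is a minimal sketch in Python. It is not the representation from the paper: it is the simple baseline that stores the expansion length of every rule and walks down from the start symbol, so a query costs time proportional to the grammar's height rather than $O(\log N)$. The grammar `RULES`, its rule names, and the helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy grammar (a straight-line program): every rule is either
# a single terminal character or a pair (left, right) of rule names.
RULES = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # generates "ab"
    "X2": ("X1", "A"),   # generates "aba"
    "X3": ("X2", "X1"),  # generates "abaab"
    "X4": ("X3", "X2"),  # generates "abaababa"
}


def expansion_lengths(rules):
    """Return {rule name: length of the string that rule generates}."""
    lengths = {}

    def length(name):
        if name not in lengths:
            rhs = rules[name]
            if isinstance(rhs, str):
                lengths[name] = 1
            else:
                lengths[name] = length(rhs[0]) + length(rhs[1])
        return lengths[name]

    for name in rules:
        length(name)
    return lengths


def access(rules, lengths, start, i):
    """Return character i (0-based) of the string generated by `start`.

    Baseline only: the loop follows one root-to-leaf path of the parse
    tree, so it takes time proportional to the height of the grammar.
    """
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    name = start
    while not isinstance(rules[name], str):
        left, right = rules[name]
        if i < lengths[left]:
            name = left            # position lies in the left part
        else:
            i -= lengths[left]     # skip the left part, continue in the right
            name = right
    return rules[name]


LENGTHS = expansion_lengths(RULES)
print("".join(access(RULES, LENGTHS, "X4", i) for i in range(LENGTHS["X4"])))
```

The string is never materialised: only the per-rule lengths are stored, which is what allows the query to run directly on the compressed form.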

    Trees and Markov convexity

    We show that an infinite weighted tree admits a bi-Lipschitz embedding into Hilbert space if and only if it does not contain arbitrarily large complete binary trees with uniformly bounded distortion. We also introduce a new metric invariant called Markov convexity, and show how it can be used to compute the Euclidean distortion of any metric tree up to universal factors.

    On weighted depths in random binary search trees

    Following the model introduced by Aguech, Lasmar and Mahmoud [Probab. Engrg. Inform. Sci. 21 (2007) 133-141], the weighted depth of a node in a labelled rooted tree is the sum of all labels on the path connecting the node to the root. We analyze weighted depths of nodes with given labels, the last inserted node, nodes ordered as visited by the depth first search process, the weighted path length and the weighted Wiener index in a random binary search tree. We establish three regimes of nodes depending on whether the second order behaviour of their weighted depths follows from fluctuations of the keys on the path, the depth of the nodes, or both. Finally, we investigate a random distribution function on the unit interval arising as scaling limit for weighted depths of nodes with at most one child.
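The definition of weighted depth can be illustrated with a short Python sketch. It assumes that the labels are the keys themselves, that keys are distinct, and that the sum includes both the node's own key and the root's key; the function name `weighted_depths` is hypothetical. A random binary search tree is obtained by inserting a uniformly random permutation.

```python
import random


def weighted_depths(keys):
    """Insert distinct keys into an unbalanced BST in the given order.

    Returns {key: weighted depth}, where the weighted depth of a node is
    the sum of all keys on the path from that node up to the root
    (both endpoints included, which is an assumption of this sketch).
    """
    left, right, wdepth = {}, {}, {}
    root = None
    for k in keys:
        if root is None:
            root = k
            wdepth[k] = k
            continue
        node = root
        while True:
            children = left if k < node else right
            if node in children:
                node = children[node]              # keep descending
            else:
                children[node] = k                 # attach k as a new leaf
                wdepth[k] = wdepth[node] + k       # parent's sum plus own key
                break
    return wdepth


# A random binary search tree on the keys 1..10.
permutation = random.sample(range(1, 11), 10)
depths = weighted_depths(permutation)
print("weighted depth of last inserted node:", depths[permutation[-1]])
print("weighted path length:", sum(depths.values()))
```

For the fixed insertion order `[3, 1, 4, 2, 5]`, the node with key 2 sits below 1, which sits below the root 3, so its weighted depth is 3 + 1 + 2 = 6.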

    Deterministic and Probabilistic Binary Search in Graphs

    We consider the following natural generalization of Binary Search: in a given undirected, positively weighted graph, one vertex is a target. The algorithm's task is to identify the target by adaptively querying vertices. In response to querying a node $q$, the algorithm learns either that $q$ is the target, or is given an edge out of $q$ that lies on a shortest path from $q$ to the target. We study this problem in a general noisy model in which each query independently receives a correct answer with probability $p > \frac{1}{2}$ (a known constant), and an (adversarial) incorrect one with probability $1-p$. Our main positive result is that when $p = 1$ (i.e., all answers are correct), $\log_2 n$ queries are always sufficient. For general $p$, we give an (almost information-theoretically optimal) algorithm that uses, in expectation, no more than $(1 - \delta)\frac{\log_2 n}{1 - H(p)} + o(\log n) + O(\log^2 (1/\delta))$ queries, and identifies the target correctly with probability at least $1-\delta$. Here, $H(p) = -(p \log p + (1-p) \log(1-p))$ denotes the entropy. The first bound is achieved by the algorithm that iteratively queries a 1-median of the nodes not ruled out yet; the second bound by careful repeated invocations of a multiplicative weights algorithm. Even for $p = 1$, we show several hardness results for the problem of determining whether a target can be found using $K$ queries. Our upper bound of $\log_2 n$ implies a quasipolynomial-time algorithm for undirected connected graphs; we show that this is best-possible under the Strong Exponential Time Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs with non-uniform node querying costs, the problem is PSPACE-complete. For a semi-adaptive version, in which one may query $r$ nodes each in $k$ rounds, we show membership in $\Sigma_{2k-1}$ in the polynomial hierarchy, and hardness for $\Sigma_{2k-5}$.
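The noiseless strategy mentioned in the abstract (repeatedly query a 1-median of the nodes not yet ruled out) can be sketched in Python as follows. This is an illustrative reading of the abstract, not the authors' code: the graph encoding, the function names, the simulated oracle, and the choice to take the median over all vertices are assumptions, and integer edge weights are used so that distance comparisons are exact.

```python
import math


def all_pairs(graph):
    """Floyd-Warshall distances for a graph given as {u: {v: weight}}."""
    nodes = list(graph)
    d = {u: {v: math.inf for v in nodes} for u in nodes}
    for u in nodes:
        d[u][u] = 0
        for v, w in graph[u].items():
            d[u][v] = min(d[u][v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def find_target(graph, oracle):
    """Return (target, number of queries) for a connected, undirected graph.

    `oracle(q)` returns None if q is the target, otherwise a neighbour u
    such that the edge (q, u) lies on a shortest path from q to the target.
    """
    d = all_pairs(graph)
    candidates = set(graph)
    queries = 0
    while len(candidates) > 1:
        # 1-median: vertex minimising the total distance to all candidates.
        q = min(graph, key=lambda x: sum(d[x][v] for v in candidates))
        u = oracle(q)
        queries += 1
        if u is None:
            return q, queries
        w = graph[q][u]
        # Keep only candidates for which (q, u) starts a shortest path.
        candidates = {v for v in candidates if w + d[u][v] == d[q][v]}
    return next(iter(candidates)), queries


def make_oracle(graph, target):
    """Simulated noiseless oracle (hypothetical helper for the demo)."""
    d = all_pairs(graph)

    def oracle(q):
        if q == target:
            return None
        for u, w in graph[q].items():
            if w + d[u][target] == d[q][target]:
                return u

    return oracle


def example_graph():
    """Path 0-1-...-7 with unit weights plus a chord (0, 7) of weight 3."""
    g = {i: {} for i in range(8)}
    for a, b, w in [(i, i + 1, 1) for i in range(7)] + [(0, 7, 3)]:
        g[a][b] = w
        g[b][a] = w
    return g


graph = example_graph()
for t in graph:
    print(t, find_target(graph, make_oracle(graph, t)))
```

Each answer that is not "target" discards at least half of the remaining candidates, because otherwise the neighbour returned by the oracle would have a smaller total distance than the queried median. That is why the query count stays within $\log_2 n$ in this noiseless setting.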

    Local Optimality Certificates for LP Decoding of Tanner Codes

    We present a new combinatorial characterization for local optimality of a codeword in an irregular Tanner code. The main novelty in this characterization is that it is based on a linear combination of subtrees in the computation trees. These subtrees may have any degree in the local code nodes and may have any height (even greater than the girth). We expect this new characterization to lead to improvements in bounds for successful decoding. We prove that local optimality in this new characterization implies ML-optimality and LP-optimality, as one would expect. Finally, we show that it is possible to efficiently compute a certificate for the local optimality of a codeword given an LLR vector.

    Finger Search in Grammar-Compressed Strings

    Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index $f$, called the finger, and the query index $i$. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant, which also supports moving the finger in time that depends on the distance moved. Let $n$ be the size of the grammar, and let $N$ be the size of the string. For the static variant we give a linear space representation that supports placing the finger in $O(\log N)$ time and subsequently accessing in $O(\log D)$ time, where $D$ is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in $O(\log N)$ time and accessing and moving the finger in $O(\log D + \log \log N)$ time. Compared to the best linear space solution to random access, we improve an $O(\log N)$ query bound to $O(\log D)$ for the static variant and to $O(\log D + \log \log N)$ for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.