3,714 research outputs found

    Compressed Subsequence Matching and Packed Tree Coloring

    Get PDF
    We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size nn compressing a string of size NN and a pattern string of size mm over an alphabet of size Ļƒ\sigma, our algorithm uses O(n+nĻƒw)O(n+\frac{n\sigma}{w}) space and O(n+nĻƒw+mlogā”Nlogā”wā‹…occ)O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ) or O(n+nĻƒwlogā”w+mlogā”Nā‹…occ)O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ) time. Here ww is the word size and occocc is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for occ=o(nlogā”N)occ=o(\frac{n}{\log N}) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.Comment: To appear at CPM '1

    Rank, select and access in grammar-compressed strings

    Full text link
    Given a string SS of length NN on a fixed alphabet of Ļƒ\sigma symbols, a grammar compressor produces a context-free grammar GG of size nn that generates SS and only SS. In this paper we describe data structures to support the following operations on a grammar-compressed string: \mbox{rank}_c(S,i) (return the number of occurrences of symbol cc before position ii in SS); \mbox{select}_c(S,i) (return the position of the iith occurrence of cc in SS); and \mbox{access}(S,i,j) (return substring S[i,j]S[i,j]). For rank and select we describe data structures of size O(nĻƒlogā”N)O(n\sigma\log N) bits that support the two operations in O(logā”N)O(\log N) time. We propose another structure that uses O(nĻƒlogā”(N/n)(logā”N)1+Ļµ)O(n\sigma\log (N/n)(\log N)^{1+\epsilon}) bits and that supports the two queries in O(logā”N/logā”logā”N)O(\log N/\log\log N), where Ļµ>0\epsilon>0 is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires O(nlogā”N)O(n\log N) bits of space and O(logā”N+m/logā”ĻƒN)O(\log N+m/\log_\sigma N) time to extract m=jāˆ’i+1m=j-i+1 consecutive symbols from SS. Alternatively, we can achieve O(logā”N/logā”logā”N+m/logā”ĻƒN)O(\log N/\log\log N+m/\log_\sigma N) query time using O(nlogā”(N/n)(logā”N)1+Ļµ)O(n\log (N/n)(\log N)^{1+\epsilon}) bits of space. This matches a lower bound stated by Verbin and Yu for strings where NN is polynomially related to nn.Comment: 16 page

    Linear Compressed Pattern Matching for Polynomial Rewriting (Extended Abstract)

    Full text link
    This paper is an extended abstract of an analysis of term rewriting where the terms in the rewrite rules as well as the term to be rewritten are compressed by a singleton tree grammar (STG). This form of compression is more general than node sharing or representing terms as dags since also partial trees (contexts) can be shared in the compression. In the first part efficient but complex algorithms for detecting applicability of a rewrite rule under STG-compression are constructed and analyzed. The second part applies these results to term rewriting sequences. The main result for submatching is that finding a redex of a left-linear rule can be performed in polynomial time under STG-compression. The main implications for rewriting and (single-position or parallel) rewriting steps are: (i) under STG-compression, n rewriting steps can be performed in nondeterministic polynomial time. (ii) under STG-compression and for left-linear rewrite rules a sequence of n rewriting steps can be performed in polynomial time, and (iii) for compressed rewrite rules where the left hand sides are either DAG-compressed or ground and STG-compressed, and an STG-compressed target term, n rewriting steps can be performed in polynomial time.Comment: In Proceedings TERMGRAPH 2013, arXiv:1302.599

    Pattern matching of compressed terms and contexts and polynomial rewriting

    Get PDF
    A generalization of the compressed string pattern match that applies to terms with variables is investigated: Given terms s and t compressed by singleton tree grammars, the task is to find an instance of s that occurs as a subterm in t. We show that this problem is in NP and that the task can be performed in time O(ncjVar(s)j), including the construction of the compressed substitution, and a representation of all occurrences. We show that the special case where s is uncompressed can be performed in polynomial time. As a nice application we show that for an equational deduction of t to t0 by an equality axiom l = r (a rewrite) a single step can be performed in polynomial time in the size of compression of t and l; r if the number of variables is fixed in l. We also show that n rewriting steps can be performed in polynomial time, if the equational axioms are compressed and assumed to be constant for the rewriting sequence. Another potential application are querying mechanisms on compressed XML-data bases

    Compressed Membership for NFA (DFA) with Compressed Labels is in NP (P)

    Get PDF
    In this paper, a compressed membership problem for finite automata, both deterministic and non-deterministic, with compressed transition labels is studied. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars generating exactly one string. A novel technique of dealing with SLPs is introduced: the SLPs are recompressed, so that substrings of the input text are encoded in SLPs labelling the transitions of the NFA (DFA) in the same way, as in the SLP representing the input text. To this end, the SLPs are locally decompressed and then recompressed in a uniform way. Furthermore, such recompression induces only small changes in the automaton, in particular, the size of the automaton remains polynomial. Using this technique it is shown that the compressed membership for NFA with compressed labels is in NP, thus confirming the conjecture of Plandowski and Rytter and extending the partial result of Lohrey and Mathissen; as it is already known, that this problem is NP-hard, we settle its exact computational complexity. Moreover, the same technique applied to the compressed membership for DFA with compressed labels yields that this problem is in P; for this problem, only trivial upper-bound PSPACE was known

    Efficient LZ78 factorization of grammar compressed text

    Full text link
    We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size nn representing a text SS of length NN, our algorithm computes the LZ78 factorization of TT in O(nN+mlogā”N)O(n\sqrt{N}+m\log N) time and O(nN+m)O(n\sqrt{N}+m) space, where mm is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the nNn\sqrt{N} term in the time and space complexities becomes either nLnL, where LL is the length of the longest LZ78 factor, or (Nāˆ’Ī±)(N - \alpha) where Ī±ā‰„0\alpha \geq 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of SS of a certain length. Since m=O(N/logā”ĻƒN)m = O(N/\log_\sigma N) where Ļƒ\sigma is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when Ļƒ\sigma is constant, and can be more efficient when the text is compressible, i.e. when mm and nn are small.Comment: SPIRE 201
    • ā€¦
    corecore