2,818 research outputs found

    Succinct Representations of Dynamic Strings

    Full text link
    The rank and select operations over a string of length n from an alphabet of size σ\sigma have been used widely in the design of succinct data structures. In many applications, the string itself need be maintained dynamically, allowing characters of the string to be inserted and deleted. Under the word RAM model with word size w=Ω(lgn)w=\Omega(\lg n), we design a succinct representation of dynamic strings using nH0+o(n)lgσ+O(w)nH_0 + o(n)\lg\sigma + O(w) bits to support rank, select, insert and delete in O(lgnlglgn(lgσlglgn+1))O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n}+1)) time. When the alphabet size is small, i.e. when \sigma = O(\polylog (n)), including the case in which the string is a bit vector, these operations are supported in O(lgnlglgn)O(\frac{\lg n}{\lg\lg n}) time. Our data structures are more efficient than previous results on the same problem, and we have applied them to improve results on the design and construction of space-efficient text indexes

    Cross-Document Pattern Matching

    Get PDF
    We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the weighted level ancestor problem

    A Space-Optimal Grammar Compression

    Get PDF
    A grammar compression is a context-free grammar (CFG) deriving a single string deterministically. For an input string of length N over an alphabet of size sigma, the smallest CFG is O(log N)-approximable in the offline setting and O(log N log^* N)-approximable in the online setting. In addition, an information-theoretic lower bound for representing a CFG in Chomsky normal form of n variables is log (n!/n^sigma) + n + o(n) bits. Although there is an online grammar compression algorithm that directly computes the succinct encoding of its output CFG with O(log N log^* N) approximation guarantee, the problem of optimizing its working space has remained open. We propose a fully-online algorithm that requires the fewest bits of working space asymptotically equal to the lower bound in O(N log log n) compression time. In addition we propose several techniques to boost grammar compression and show their efficiency by computational experiments
    corecore