2,818 research outputs found
Succinct Representations of Dynamic Strings
The rank and select operations over a string of length n from an alphabet of
size have been used widely in the design of succinct data structures.
In many applications, the string itself need be maintained dynamically,
allowing characters of the string to be inserted and deleted. Under the word
RAM model with word size , we design a succinct representation
of dynamic strings using bits to support rank,
select, insert and delete in time. When the alphabet size is small, i.e. when \sigma = O(\polylog
(n)), including the case in which the string is a bit vector, these operations
are supported in time. Our data structures are more
efficient than previous results on the same problem, and we have applied them
to improve results on the design and construction of space-efficient text
indexes
Cross-Document Pattern Matching
We study a new variant of the string matching problem called cross-document
string matching, which is the problem of indexing a collection of documents to
support an efficient search for a pattern in a selected document, where the
pattern itself is a substring of another document. Several variants of this
problem are considered, and efficient linear-space solutions are proposed with
query time bounds that either do not depend at all on the pattern size or
depend on it in a very limited way (doubly logarithmic). As a side result, we
propose an improved solution to the weighted level ancestor problem
A Space-Optimal Grammar Compression
A grammar compression is a context-free grammar (CFG) deriving a single string deterministically. For an input string of length N over an alphabet of size sigma, the smallest CFG is O(log N)-approximable in the offline setting and O(log N log^* N)-approximable in the online setting. In addition, an information-theoretic lower bound for representing a CFG in Chomsky normal form of n variables is log (n!/n^sigma) + n + o(n) bits. Although there is an online grammar compression algorithm that directly computes the succinct encoding of its output CFG with O(log N log^* N) approximation guarantee, the problem of optimizing its working space has remained open. We propose a fully-online algorithm that requires the fewest bits of working space asymptotically equal to the lower bound in O(N log log n) compression time. In addition we propose several techniques to boost grammar compression and show their efficiency by computational experiments
- …