Fully-online Construction of Suffix Trees for Multiple Texts
We consider fully-online construction of indexing data structures for multiple texts. Let T = {T_1, ..., T_K} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts, in which, once a new character has been appended to the k-th text T_k, the previous texts T_1, ..., T_{k-1} remain static. Our fully-online scenario arises when we maintain dynamic indexes for multi-sensor data. Let N and sigma denote the total length of the texts in T and the alphabet size, respectively. We first show that the algorithm by Blumer et al. [Theoretical Computer Science, 40:31-55, 1985] to construct the directed acyclic word graph (DAWG) for T can readily be extended to our fully-online setting, retaining O(N log sigma)-time and O(N)-space complexities. Then, we give a sophisticated fully-online algorithm which constructs the suffix tree for T in O(N log sigma) time and O(N) space. A key idea of this algorithm is synchronized maintenance of the DAWG and the suffix tree.
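The fully-online DAWG idea can be pictured with a minimal Python sketch, assuming the standard generalized suffix-automaton construction in the style of Blumer et al.'s online algorithm, with one "last" state per text and the usual extra case for an edge that already exists. The class name `MultiTextDAWG`, its methods and the 0-based text index are hypothetical; edges are stored in Python dicts (hashing) rather than the ordered structures behind the O(N log sigma) bound, and the suffix-tree half of the algorithm is not shown.

```python
class MultiTextDAWG:
    """Fully-online DAWG (generalized suffix automaton) for K texts.

    Sketch only: a character may be appended to any text at any time.
    Edges live in Python dicts (hashing), not in the ordered search
    structures that give an O(N log sigma) bound.
    """

    def __init__(self, k):
        self.length = [0]     # length of the longest string in each state
        self.link = [-1]      # suffix link of each state (root has none)
        self.trans = [{}]     # outgoing edges of each state: char -> state
        self.last = [0] * k   # last[i] = state whose longest string is the whole text i

    def _new_state(self, length, link, trans):
        self.length.append(length)
        self.link.append(link)
        self.trans.append(trans)
        return len(self.length) - 1

    def _solid_target(self, p, c):
        """Return the state reached from p by c whose longest string has
        length length[p] + 1, cloning (splitting) the target if necessary."""
        q = self.trans[p][c]
        if self.length[q] == self.length[p] + 1:
            return q
        clone = self._new_state(self.length[p] + 1, self.link[q], dict(self.trans[q]))
        while p != -1 and self.trans[p].get(c) == q:
            self.trans[p][c] = clone   # redirect edges that now belong to the clone
            p = self.link[p]
        self.link[q] = clone
        return clone

    def append(self, i, c):
        """Append character c to text i (0-based)."""
        p = self.last[i]
        if c in self.trans[p]:
            # text_i + c already occurs in some text: at most a clone is created.
            self.last[i] = self._solid_target(p, c)
            return
        cur = self._new_state(self.length[p] + 1, 0, {})
        while p != -1 and c not in self.trans[p]:
            self.trans[p][c] = cur
            p = self.link[p]
        if p != -1:
            self.link[cur] = self._solid_target(p, c)
        self.last[i] = cur

    def contains(self, s):
        """True iff s is a substring of some text."""
        v = 0
        for ch in s:
            v = self.trans[v].get(ch)
            if v is None:
                return False
        return True

    def count_distinct_substrings(self):
        """Number of distinct non-empty substrings over all texts."""
        return sum(self.length[v] - self.length[self.link[v]]
                   for v in range(1, len(self.length)))


if __name__ == "__main__":
    dawg = MultiTextDAWG(2)
    for i, c in [(0, "a"), (1, "a"), (1, "b"), (0, "b"), (0, "b")]:
        dawg.append(i, c)                              # texts are now "abb" and "ab"
    print(dawg.count_distinct_substrings())            # 5: a, b, ab, bb, abb
    print(dawg.contains("bb"), dawg.contains("ba"))    # True False
```

Keeping one `last` pointer per text is what makes interleaved appends work here: the whole current text is always the longest string of its state, so a later split elsewhere never moves it.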
Locally Consistent Parsing for Text Indexing in Small Space
We consider two closely related problems of text indexing in sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes using only words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter , the goal is to construct a data structure that uses words of space and can compute the longest common prefix length of any pair of suffixes. We show how to use ideas based on the Locally Consistent Parsing technique, which was introduced by Sahinalp and Vishkin [STOC '94], in some non-trivial ways in order to improve the known results for the above problems. We introduce new Las-Vegas and deterministic algorithms for both problems.
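To make the LCE query concrete, here is a minimal sketch of a classic Monte-Carlo LCE structure built from Karp-Rabin fingerprints and binary search. This is a swapped-in textbook technique, not the Locally Consistent Parsing approach, and it uses O(n) words rather than sub-linear space; the class name `KarpRabinLCE`, the modulus and the base are illustrative assumptions.

```python
class KarpRabinLCE:
    """Monte-Carlo LCE oracle: prefix fingerprints plus binary search.

    Uses O(n) words of space; an answer can only be wrong (too long) if two
    different substrings of equal length collide modulo MOD.
    """
    MOD = (1 << 61) - 1   # a Mersenne prime (assumed choice)
    BASE = 1_000_003      # fixed for reproducibility; draw it at random in practice

    def __init__(self, text):
        self.text = text
        n = len(text)
        self.prefix = [0] * (n + 1)   # prefix[i] = fingerprint of text[:i]
        self.power = [1] * (n + 1)    # power[i] = BASE**i mod MOD
        for i, ch in enumerate(text):
            self.prefix[i + 1] = (self.prefix[i] * self.BASE + ord(ch)) % self.MOD
            self.power[i + 1] = (self.power[i] * self.BASE) % self.MOD

    def _fingerprint(self, start, length):
        """Fingerprint of text[start:start + length]."""
        return (self.prefix[start + length]
                - self.prefix[start] * self.power[length]) % self.MOD

    def lce(self, i, j):
        """Length of the longest common prefix of text[i:] and text[j:]."""
        lo, hi = 0, len(self.text) - max(i, j)   # the answer lies in [lo, hi]
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self._fingerprint(i, mid) == self._fingerprint(j, mid):
                lo = mid          # prefixes of length mid match
            else:
                hi = mid - 1      # mismatch somewhere within the first mid characters
        return lo
```

For example, `KarpRabinLCE("banana").lce(1, 3)` returns 3, since the suffixes "anana" and "ana" share "ana". Each query costs O(log n) fingerprint comparisons.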
We introduce the first Las-Vegas SST construction algorithm that takes time. This is an improvement over the most recent result of Gawrychowski and Kociumaka [SODA '17], who obtained time for a Monte-Carlo algorithm, and time for a Las-Vegas algorithm. In addition, we introduce a randomized Las-Vegas construction for an LCE data structure that can be constructed in linear time and answers queries in time.
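The connection between the two problems can be illustrated with a minimal sketch, assuming a naive character-by-character LCE: sorting only the suffixes that start at a chosen position set with an LCE-based comparator gives the sparse suffix array, and the LCE of lexicographically adjacent suffixes gives the sparse LCP array; together these two arrays determine the sparse suffix tree. This textbook reduction has quadratic worst-case cost and is not any of the algorithms above; the function names are hypothetical.

```python
from functools import cmp_to_key


def lce(text, i, j):
    """Naive longest common extension of text[i:] and text[j:]."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def sparse_suffix_array(text, positions):
    """Sort the suffixes starting at `positions`; return (SA, LCP).

    LCP[r] is the LCE of the suffixes ranked r - 1 and r (LCP[0] = 0).
    """
    n = len(text)

    def compare(i, j):
        if i == j:
            return 0
        k = lce(text, i, j)
        # A suffix that runs out first is a proper prefix, hence smaller.
        if i + k == n:
            return -1
        if j + k == n:
            return 1
        return -1 if text[i + k] < text[j + k] else 1

    sa = sorted(set(positions), key=cmp_to_key(compare))
    lcp = [0 if r == 0 else lce(text, sa[r - 1], sa[r]) for r in range(len(sa))]
    return sa, lcp
```

For example, `sparse_suffix_array("banana", [0, 2, 4])` returns `([0, 4, 2], [0, 0, 2])`: the suffixes "banana", "na" and "nana" in sorted order, where "na" and "nana" share a prefix of length 2. Swapping in a faster LCE oracle speeds up every comparison without changing the reduction.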
For the deterministic algorithms, we introduce an SST construction algorithm that takes time (for ). This is the first almost linear time deterministic SST construction algorithm, whereas all previous algorithms take at least time. For the LCE problem, we introduce a data structure that answers LCE queries in time, with construction time (for ). This data structure improves both query time and construction time upon the results of Tanimura et al. [CPM '16].
Comment: Extended abstract to appear in SODA 202
Improved Dynamic Text Indexing
In the dynamic text indexing problem, a text string has to be maintained under string insertions and deletions in order to answer on-line queries about arbitrary pattern occurrences. By means of some new techniques and data structures, we achieve improved worst-case bounds. We show that finding all pocc occurrences of a pattern of length p in the current text of length n takes O(p + pocc + upd log p + log n) time, where upd is the number of text updates performed so far; inserting or deleting a string of length s from the current text takes O(s log(s + n)) time.
1 Introduction
String matching involves the detection of all the occurrences of a pattern string P[1; p] in the form of substrings of a longer text string T[1; n], where the string characters are taken from a given alphabet. (In this paper, we assume that the alphabet is ordered and bounded.) If the pattern and the text are given together, a number of optimal solutions are known to require Θ(p + n) time to perform this t..
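For contrast with the bounds above, a minimal sketch of the naive dynamic baseline: keep the text as a plain Python string, apply insertions and deletions by slicing (O(n + s) per update), and answer every query from scratch with Knuth-Morris-Pratt, one classic Θ(p + n) solution for a pattern and text given together. The names `kmp_occurrences` and `NaiveDynamicText` and the 0-based positions are assumptions; this is not the paper's data structure.

```python
def kmp_occurrences(text, pattern):
    """Return the 0-based starting positions of all occurrences of pattern in text."""
    p = len(pattern)
    if p == 0:
        return list(range(len(text) + 1))
    # failure[q] = length of the longest proper border of pattern[:q + 1]
    failure = [0] * p
    k = 0
    for q in range(1, p):
        while k > 0 and pattern[q] != pattern[k]:
            k = failure[k - 1]
        if pattern[q] == pattern[k]:
            k += 1
        failure[q] = k
    occ = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == p:
            occ.append(i - p + 1)
            k = failure[k - 1]  # keep going to catch overlapping occurrences
    return occ


class NaiveDynamicText:
    """Baseline: updates rebuild the string, queries rescan it in O(p + n) time."""

    def __init__(self, text=""):
        self.text = text

    def insert(self, pos, s):
        """Insert string s so that it starts at position pos."""
        self.text = self.text[:pos] + s + self.text[pos:]

    def delete(self, pos, length):
        """Delete `length` characters starting at position pos."""
        self.text = self.text[:pos] + self.text[pos + length:]

    def find(self, pattern):
        return kmp_occurrences(self.text, pattern)
```

For example, `NaiveDynamicText("abab").find("ab")` returns `[0, 2]`. Every query pays for the whole current text, which is the Θ(n) term that a dynamic index avoids.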