
    Fully-Online Suffix Tree and Directed Acyclic Word Graph Construction for Multiple Texts

    We consider construction of the suffix tree and the directed acyclic word graph (DAWG) indexing data structures for a collection $\mathcal{T}$ of texts, where a new symbol may be appended to any text in $\mathcal{T} = \{T_1, \ldots, T_K\}$ at any time. This fully-online scenario, which arises in dynamically indexing multi-sensor data, is a natural generalization of the long-solved semi-online text indexing problem, where texts $T_1, \ldots, T_k$ are permanently fixed before the next text $T_{k+1}$ is processed, for each $1 \leq k < K$. We present fully-online algorithms that construct the suffix tree and the DAWG for $\mathcal{T}$ in $O(N \log \sigma)$ time and $O(N)$ space, where $N$ is the total length of the strings in $\mathcal{T}$ and $\sigma$ is their alphabet size. The standard explicit representation of the suffix tree leaf edges and some DAWG edges must be relaxed in our fully-online scenario, since too many updates on these edges are required in the worst case. Instead, we provide access to the updated suffix tree leaf edge labels and the DAWG edges to be redirected via auxiliary data structures, in $O(\log \sigma)$ time per added character. Comment: 28 pages, 6 figures, LaTe
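
For orientation, the following is a minimal sketch of the classic online DAWG (suffix automaton) construction for a single text, extended one symbol at a time. It is not the fully-online multi-text algorithm of the abstract above: it handles one text only, and it uses hash-map transitions (expected constant time per lookup) rather than the ordered structures behind an $O(\log \sigma)$ bound. The class and method names are illustrative assumptions.

```python
class SuffixAutomaton:
    """Textbook online DAWG for ONE text, built left to right (illustrative names)."""

    def __init__(self):
        self.next = [{}]    # outgoing edges of each state: symbol -> state
        self.link = [-1]    # suffix link of each state (-1 for the initial state)
        self.length = [0]   # length of the longest string reaching each state
        self.last = 0       # state that represents the whole current text

    def _new_state(self, length, edges, link):
        self.next.append(edges)
        self.length.append(length)
        self.link.append(link)
        return len(self.length) - 1

    def append(self, ch):
        """Extend the indexed text by one symbol."""
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add the new edge along the suffix-link chain while it is missing.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[q] == self.length[p] + 1:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = self._new_state(self.length[p] + 1,
                                        dict(self.next[q]), self.link[q])
                # Redirect edges that pointed to q from the suffix-link chain.
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        """Return True if pattern is a substring of the text appended so far."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True

    def count_distinct_substrings(self):
        """Each non-initial state v contributes length[v] - length[link[v]] substrings."""
        return sum(self.length[v] - self.length[self.link[v]]
                   for v in range(1, len(self.length)))
```

The edge redirection in the clone step is the kind of update that, according to the abstract, becomes too frequent to perform explicitly once symbols may be appended to any of several texts.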

    Pointer-Machine Algorithms for Fully-Online Construction of Suffix Trees and DAWGs on Multiple Strings

    We deal with the problem of maintaining the suffix tree indexing structure for a fully-online collection of multiple strings, where a new character can be prepended to any string in the collection at any time. The only previously known algorithm for the problem, recently proposed by Takagi et al. [Algorithmica 82(5): 1346-1377 (2020)], runs in $O(N \log \sigma)$ time and $O(N)$ space on the word RAM model, where $N$ denotes the total length of the strings and $\sigma$ denotes the alphabet size. Their algorithm makes heavy use of the nearest marked ancestor (NMA) data structure on semi-dynamic trees, which answers queries and supports insertion of nodes in $O(1)$ amortized time on the word RAM model. In this paper, we present a simpler fully-online right-to-left algorithm that builds the suffix tree for a given string collection in $O(N (\log \sigma + \log d))$ time and $O(N)$ space, where $d$ is the maximum number of incoming Weiner links to a node of the suffix tree. We note that $d$ is bounded by the height of the suffix tree, which is further bounded by the length of the longest string in the collection. The advantage of this new algorithm is that it works on the pointer machine model; namely, it does not use the complicated NMA data structures that involve table look-ups. As a byproduct, we also obtain a pointer-machine algorithm for building the directed acyclic word graph (DAWG) for a fully-online left-to-right collection of multiple strings, which runs in $O(N (\log \sigma + \log d))$ time and $O(N)$ space, again without the aid of the NMA data structures.
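
To make the nearest marked ancestor (NMA) query concrete, here is a deliberately naive pointer-based sketch: it supports leaf insertion and marking, and answers a query by walking parent pointers. It only illustrates what the query returns; it takes time proportional to the node's depth and is not the $O(1)$ amortized word-RAM structure mentioned in the abstract. All names are illustrative assumptions, and the convention that a marked node is its own nearest marked ancestor is a choice made here.

```python
class Node:
    """Tree node holding only a parent pointer and a mark bit."""

    def __init__(self, parent=None, marked=False):
        self.parent = parent
        self.marked = marked


def insert_leaf(parent, marked=False):
    """Attach a new leaf under parent and return it."""
    return Node(parent, marked)


def nearest_marked_ancestor(node):
    """Walk upward from node (inclusive) until a marked node is found.

    Returns None if no node on the path to the root is marked.
    """
    while node is not None and not node.marked:
        node = node.parent
    return node
```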