2 research outputs found
Fully-Online Suffix Tree and Directed Acyclic Word Graph Construction for Multiple Texts
We consider construction of the suffix tree and the directed acyclic word
graph (DAWG) indexing data structures for a collection $\mathcal{T}$ of texts,
where a new symbol may be appended to any text in
$\mathcal{T} = \{T_1, \ldots, T_K\}$ at any time. This fully-online scenario,
which arises in dynamically indexing multi-sensor data, is a natural
generalization of the long-solved semi-online text indexing problem, where
texts $T_1, \ldots, T_k$ are permanently fixed before the next text $T_{k+1}$
is processed for each $1 \leq k < K$. We present fully-online algorithms that
construct the suffix tree and the DAWG for $\mathcal{T}$ in $O(N \log \sigma)$
time and $O(N)$ space, where $N$ is the total length of the strings in
$\mathcal{T}$ and $\sigma$ is their alphabet size. The standard explicit
representation of the suffix tree leaf edges and some DAWG edges must be
relaxed in our fully-online scenario, since too many updates on these edges
are required in the worst case. Instead, we provide access to the updated
suffix tree leaf edge labels and the DAWG edges to be redirected via auxiliary
data structures, in $O(\log \sigma)$ time per added character.
Comment: 28 pages, 6 figures, LaTe
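To make the left-to-right online setting concrete, here is a minimal sketch of the classic online DAWG (suffix automaton) construction for a single text. It is not the multi-text algorithm of the paper: the class name `SuffixAutomaton`, its methods, and the use of Python dictionaries for transitions (rather than a structure with access time logarithmic in the alphabet size) are assumptions made for illustration. The loop that repoints transitions to the cloned state is the single-text analogue of the edge redirection mentioned in the abstract.

```python
class SuffixAutomaton:
    """Online DAWG (suffix automaton) for ONE string, built left to right.

    Illustrative sketch only; the multi-text, fully-online case needs the
    auxiliary data structures described in the paper.
    """

    def __init__(self):
        # State 0 is the source. For each state we keep its outgoing
        # transitions, its suffix link, and the length of its longest string.
        self.next = [{}]
        self.link = [-1]
        self.length = [0]
        self.last = 0  # state representing the whole text read so far

    def _new_state(self, length, transitions, link):
        self.next.append(transitions)
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def append(self, ch):
        """Extend the indexed text by one character."""
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = self._new_state(
                    self.length[p] + 1, dict(self.next[q]), self.link[q]
                )
                # Redirect edges that pointed to q so they point to the clone.
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        """Return True if pattern is a substring of the text read so far."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True
```

A text is indexed by calling `append` once per character; `contains` can be called between any two appends, which is what makes the construction online.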
Pointer-Machine Algorithms for Fully-Online Construction of Suffix Trees and DAWGs on Multiple Strings
We deal with the problem of maintaining the suffix tree indexing structure
for a fully-online collection of multiple strings, where a new character can be
prepended to any string in the collection at any time. The only previously
known algorithm for the problem, recently proposed by Takagi et al.
[Algorithmica 82(5): 1346-1377 (2020)], runs in $O(N \log \sigma)$ time and
$O(N)$ space on the word RAM model, where $N$ denotes the total length of the
strings and $\sigma$ denotes the alphabet size. Their algorithm makes heavy use
of the nearest marked ancestor (NMA) data structure on semi-dynamic trees, which
can answer queries and supports insertion of nodes in $O(1)$ amortized time on
the word RAM model. In this paper, we present a simpler fully-online
right-to-left algorithm that builds the suffix tree for a given string
collection in $O(N (\log \sigma + \log d))$ time and $O(N)$ space, where $d$ is
the maximum number of in-coming Weiner links to a node of the suffix tree. We
note that $d$ is bounded by the height of the suffix tree, which is further
bounded by the length of the longest string in the collection. The advantage of
this new algorithm is that it works on the pointer machine model, namely, it
does not use the complicated NMA data structures that involve table look-ups.
As a byproduct, we also obtain a pointer-machine algorithm for building the
directed acyclic word graph (DAWG) for a fully-online left-to-right collection
of multiple strings, which runs in $O(N (\log \sigma + \log d))$ time and
$O(N)$ space, again without the aid of the NMA data structures.
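The bound on the maximum number of in-coming Weiner links (written $d$ here) can be chained with the two bounds stated in the abstract. This is only a restatement under assumed notation: $h$ for the height of the suffix tree and $\ell$ for the length of the longest string are symbols introduced here for illustration, with $N$ the total length and $\sigma$ the alphabet size.

```latex
d \le h \le \ell \le N
\quad\Longrightarrow\quad
O\bigl(N(\log\sigma + \log d)\bigr)
\subseteq
O\bigl(N(\log\sigma + \log \ell)\bigr)
```

So the extra $\log d$ factor is never worse than the logarithm of the longest string's length.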