6 research outputs found
Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs
Given a string T on an alphabet of size sigma, we describe a bidirectional Burrows-Wheeler index that takes O(|T| log sigma) bits of space, and that supports the addition and removal of one character, on the left or right side of any substring of T, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of T, but they could support removal only from specific substrings of T. We also describe an index that supports bidirectional addition and removal in O(log log |T|) time, and that takes a number of words proportional to the number of left and right extensions of the maximal repeats of T. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal.
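The four operations named in the abstract (adding or removing one character on either side of a substring of T) can be illustrated with a deliberately naive sketch. The class and method names below are hypothetical and do not come from the paper, and every query rescans T, so this shows only the interface, not the constant-time, O(|T| log sigma)-bit index the abstract describes.

```python
from typing import Optional


class NaiveBidirectionalIndex:
    """Toy stand-in for a bidirectional index over a text T.

    Hypothetical illustration only: each query rescans the text,
    so none of the paper's time or space bounds apply here.
    """

    def __init__(self, text: str) -> None:
        self.text = text

    def frequency(self, w: str) -> int:
        """Number of (possibly overlapping) occurrences of w in T."""
        return sum(
            1
            for i in range(len(self.text) - len(w) + 1)
            if self.text.startswith(w, i)
        )

    def extend_right(self, w: str, c: str) -> Optional[str]:
        """Return w + c if it occurs in T, otherwise None."""
        candidate = w + c
        return candidate if self.frequency(candidate) > 0 else None

    def extend_left(self, w: str, c: str) -> Optional[str]:
        """Return c + w if it occurs in T, otherwise None."""
        candidate = c + w
        return candidate if self.frequency(candidate) > 0 else None

    @staticmethod
    def contract_right(w: str) -> str:
        """Remove the last character (always yields a substring of T)."""
        return w[:-1]

    @staticmethod
    def contract_left(w: str) -> str:
        """Remove the first character (always yields a substring of T)."""
        return w[1:]
```

In a variable-order de Bruijn graph built on such an interface, extending a substring would correspond to following an edge or increasing the order, and contracting it to decreasing the order, with `frequency` supplying the frequency-awareness.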
Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets
The directed acyclic word graph (DAWG) of a string y of length n is the
smallest (partial) DFA which recognizes all suffixes of y with only O(n)
nodes and edges. In this paper, we show how to construct the DAWG for the input
string y from the suffix tree for y, in O(n) time for integer alphabets
of polynomial size in n. In so doing, we first describe a folklore algorithm
which, given the suffix tree for y, constructs the DAWG for the reversed
string of y in O(n) time. Then, we present our algorithm that builds the
DAWG for y in O(n) time for integer alphabets, from the suffix tree for
y. We also show that a straightforward modification to our DAWG construction
algorithm leads to the first O(n)-time algorithm for constructing the affix
tree of a given string over an integer alphabet. Affix trees are a text
indexing structure supporting bidirectional pattern searches. We then discuss
how our constructions can lead to linear-time algorithms for building other
text indexing structures, such as linear-size suffix tries and symmetric CDAWGs,
in the case of integer alphabets. As a further application of
our O(n)-time DAWG construction algorithm, we show that the set MAW(y)
of all minimal absent words (MAWs) of y can be computed in
optimal, input- and output-sensitive O(n + |MAW(y)|) time and O(n)
working space for integer alphabets.
Comment: This is an extended version of the paper "Computing DAWGs and Minimal
Absent Words in Linear Time for Integer Alphabets" from MFCS 201
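To make the notion of a minimal absent word concrete, here is a brute-force sketch. It assumes the standard definition: a word w is a MAW of a string if w does not occur in it while every proper substring of w does. The function name and the explicit `alphabet` parameter are illustrative choices, and the enumeration is far from the linear-time DAWG-based method of the abstract.

```python
def minimal_absent_words(y: str, alphabet: str) -> set[str]:
    """Brute-force MAWs of y over the given alphabet (illustrative only)."""
    # All substrings of y, including the empty string.
    substrings = {y[i:j] for i in range(len(y)) for j in range(i, len(y) + 1)}
    substrings.add("")

    maws = set()

    # Length-1 MAWs: letters of the alphabet that never occur in y.
    for a in alphabet:
        if a not in substrings:
            maws.add(a)

    # Longer MAWs have the form a + u + b where a + u and u + b occur
    # in y but a + u + b does not. Checking these two maximal proper
    # substrings suffices, since substrings of an occurring word occur.
    for u in substrings:
        for a in alphabet:
            if a + u not in substrings:
                continue
            for b in alphabet:
                if u + b in substrings and a + u + b not in substrings:
                    maws.add(a + u + b)
    return maws
```

For example, over the alphabet {a, b}, the string abaab has the minimal absent words bb, aaa, bab and aaba.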
Space efficient merging of de Bruijn graphs and Wheeler graphs
The merging of succinct data structures is a well-established technique for
the space-efficient construction of large succinct indexes. In the first part
of the paper we propose a new algorithm for merging succinct representations of
de Bruijn graphs. Our algorithm has the same asymptotic cost as the
state-of-the-art algorithm for the same problem, but it uses less than half of
its working space. An important novel feature of our algorithm, not found in any
of the existing tools, is that it can compute the Variable Order succinct
representation of the union graph within the same asymptotic time/space bounds.
In the second part of the paper we consider the more general problem of merging
succinct representations of Wheeler graphs, a recently introduced graph family
which includes as special cases de Bruijn graphs and many other known succinct
indexes based on the BWT or one of its variants. We show that merging Wheeler
graphs is in general a much more difficult problem, and we provide a
space-efficient algorithm for the slightly simplified problem of determining
whether the union graph has an ordering that satisfies the Wheeler conditions.
Comment: 24 pages, 10 figures. arXiv admin note: text overlap with
arXiv:1902.0288
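The abstract does not spell out the Wheeler conditions. Under the usual definition of Wheeler graphs, an ordering of the nodes must (i) place nodes with no incoming edges before all others, and for any two edges (u, v, a) and (u', v', a') satisfy (ii) a < a' implies v < v', and (iii) a = a' and u < u' implies v <= v'. The sketch below only checks these conditions for one given ordering (nodes are assumed to be numbered 0..n-1 in that order). It does not search for an ordering, which is the harder problem the paper addresses, and the function name is hypothetical.

```python
def satisfies_wheeler_conditions(
    num_nodes: int, edges: list[tuple[int, int, str]]
) -> bool:
    """Check the Wheeler conditions for nodes ordered as 0..num_nodes-1.

    Each edge is a triple (source, target, label). Quadratic in the
    number of edges; meant as an illustration only.
    """
    in_degree = [0] * num_nodes
    for _, target, _ in edges:
        in_degree[target] += 1

    # (i) Nodes without incoming edges must precede all other nodes.
    seen_node_with_incoming = False
    for node in range(num_nodes):
        if in_degree[node] > 0:
            seen_node_with_incoming = True
        elif seen_node_with_incoming:
            return False

    for u1, v1, a1 in edges:
        for u2, v2, a2 in edges:
            # (ii) A smaller label must lead to a strictly smaller target.
            if a1 < a2 and not v1 < v2:
                return False
            # (iii) Same label, smaller source: target must not be larger.
            if a1 == a2 and u1 < u2 and not v1 <= v2:
                return False
    return True
```

For instance, the path 0 -a-> 1 -b-> 2 passes the check, while a graph with edges (0, 2, "a") and (0, 1, "b") fails condition (ii) under this numbering.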