We present an improved wavelet tree construction algorithm and discuss its
applications to a number of rank/select problems for integer keys and strings.
Given a string of length n over an alphabet of size σ≤n, our
method builds the wavelet tree in O(nlogσ/logn) time,
improving upon the state-of-the-art algorithm by a factor of logn.
As a consequence, given an array of n integers we can construct in O(nlogn) time a data structure consisting of O(n) machine words and
capable of answering rank/select queries for the subranges of the array in
O(logn/loglogn) time. This is a loglogn-factor improvement in
query time compared to Chan and P\u{a}tra\c{s}cu and a logn-factor
improvement in construction time compared to Brodal et al.
Next, we switch to stringological context and propose a novel notion of
wavelet suffix trees. For a string w of length n, this data structure occupies
O(n) words, takes O(nlogn) time to construct, and simultaneously
captures the combinatorial structure of substrings of w while enabling
efficient top-down traversal and binary search. In particular, with a wavelet
suffix tree we are able to answer in O(log∣x∣) time the following two
natural analogues of rank/select queries for suffixes of substrings: for
substrings x and y of w count the number of suffixes of x that are
lexicographically smaller than y, and for a substring x of w and an integer k,
find the k-th lexicographically smallest suffix of x.
We further show that wavelet suffix trees allow to compute a
run-length-encoded Burrows-Wheeler transform of a substring x of w in O(slog∣x∣) time, where s denotes the length of the resulting run-length encoding.
This answers a question by Cormode and Muthukrishnan, who considered an
analogous problem for Lempel-Ziv compression.Comment: 33 pages, 5 figures; preliminary version published at SODA 201