Parallel Wavelet Tree Construction
We present parallel algorithms for wavelet tree construction with
polylogarithmic depth, improving upon the linear depth of the recent parallel
algorithms by Fuentes-Sepulveda et al. We experimentally show on a 40-core
machine with two-way hyper-threading that we outperform the existing parallel
algorithms by 1.3--5.6x and achieve up to 27x speedup over the sequential
algorithm on a variety of real-world and artificial inputs. Our algorithms show
good scalability with increasing thread count, input size and alphabet size. We
also discuss extensions to variants of the standard wavelet tree.
Comment: This is a longer version of the paper that appears in the Proceedings
of the IEEE Data Compression Conference, 201
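For readers unfamiliar with the structure these algorithms build, the following is a minimal sequential sketch of a pointer-based wavelet tree with a rank query. It is not the parallel algorithm of the abstract above; the class name `WaveletTree`, the integer alphabet, and the use of plain Python lists instead of succinct bitmaps are all assumptions made for illustration.

```python
class WaveletTree:
    """Illustrative pointer-based wavelet tree over integer symbols in [lo, hi]."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            # Assumes a non-empty input sequence at the root.
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = None
        if lo == hi or not seq:
            return  # leaf, or an empty internal node
        mid = (lo + hi) // 2
        # prefix[i] = how many of the first i symbols are routed to the left child
        self.prefix = [0]
        for c in seq:
            self.prefix.append(self.prefix[-1] + (c <= mid))
        # Each child receives its symbols in their original relative order.
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def rank(self, c, i):
        """Number of occurrences of symbol c among the first i symbols."""
        if c < self.lo or c > self.hi:
            return 0
        if self.lo == self.hi:
            return i
        if self.prefix is None:
            return 0  # empty internal node
        mid = (self.lo + self.hi) // 2
        to_left = self.prefix[i]
        if c <= mid:
            return self.left.rank(c, to_left)
        return self.right.rank(c, i - to_left)
```

With `wt = WaveletTree([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])`, the call `wt.rank(1, 4)` counts the occurrences of 1 among the first four symbols by following one root-to-leaf path.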
Detecting Activations over Graphs using Spanning Tree Wavelet Bases
We consider the detection of activations over graphs under Gaussian noise,
where signals are piece-wise constant over the graph. Despite the wide
applicability of such a detection algorithm, there has been little success in
the development of computationally feasible methods with provable theoretical
guarantees for general graph topologies. We cast this as a hypothesis testing
problem, and first provide a universal necessary condition for asymptotic
distinguishability of the null and alternative hypotheses. We then introduce
the spanning tree wavelet basis over graphs, a localized basis that reflects
the topology of the graph, and prove that for any spanning tree, this approach
can distinguish null from alternative in a low signal-to-noise regime. Lastly,
we improve on this result and show that using the uniform spanning tree in the
basis construction yields a randomized test with stronger theoretical
guarantees that in many cases matches our necessary conditions. Specifically,
we obtain near-optimal performance in edge transitive graphs, k-nearest
neighbor graphs, and ε-graphs
Wavelet Trees Meet Suffix Trees
We present an improved wavelet tree construction algorithm and discuss its
applications to a number of rank/select problems for integer keys and strings.
Given a string of length n over an alphabet of size , our
method builds the wavelet tree in time,
improving upon the state-of-the-art algorithm by a factor of .
As a consequence, given an array of n integers we can construct in time a data structure consisting of machine words and
capable of answering rank/select queries for the subranges of the array in
time. This is a -factor improvement in
query time compared to Chan and Pătrașcu and a -factor
improvement in construction time compared to Brodal et al.
Next, we switch to stringological context and propose a novel notion of
wavelet suffix trees. For a string w of length n, this data structure occupies
words, takes time to construct, and simultaneously
captures the combinatorial structure of substrings of w while enabling
efficient top-down traversal and binary search. In particular, with a wavelet
suffix tree we are able to answer in time the following two
natural analogues of rank/select queries for suffixes of substrings: for
substrings x and y of w count the number of suffixes of x that are
lexicographically smaller than y, and for a substring x of w and an integer k,
find the k-th lexicographically smallest suffix of x.
We further show that wavelet suffix trees make it possible to compute a
run-length-encoded Burrows-Wheeler transform of a substring x of w in time, where s denotes the length of the resulting run-length encoding.
This answers a question by Cormode and Muthukrishnan, who considered an
analogous problem for Lempel-Ziv compression.
Comment: 33 pages, 5 figures; preliminary version published at SODA 201
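The two suffix queries described in this abstract can be pinned down with a brute-force reference. The sketch below only fixes the semantics of the queries; it is not the wavelet suffix tree and makes no attempt at its running time. The function names `suffix_rank` and `suffix_select`, the half-open `(start, end)` ranges, and the restriction to non-empty suffixes are assumptions made for illustration.

```python
def suffix_rank(w, x_range, y_range):
    """Count the non-empty suffixes of x = w[a:b] that are lexicographically smaller than y = w[c:d]."""
    a, b = x_range
    c, d = y_range
    x, y = w[a:b], w[c:d]
    return sum(1 for i in range(len(x)) if x[i:] < y)


def suffix_select(w, x_range, k):
    """Return the k-th (1-based) lexicographically smallest non-empty suffix of x = w[a:b]."""
    a, b = x_range
    x = w[a:b]
    suffixes = sorted(x[i:] for i in range(len(x)))
    return suffixes[k - 1]
```

A reference like this is mainly useful as a test oracle when checking a faster implementation on small inputs.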
Parallel Construction of Wavelet Trees on Multicore Architectures
The wavelet tree has become a very useful data structure to efficiently
represent and query large volumes of data in many different domains, from
bioinformatics to geographic information systems. One problem with wavelet
trees is their construction time. In this paper, we introduce two algorithms
that reduce the time complexity of a wavelet tree's construction by taking
advantage of today's ubiquitous multicore machines.
Our first algorithm constructs all the levels of the wavelet tree in parallel in
time and bits of working space, where
is the size of the input sequence and is the size of the alphabet. Our
second algorithm constructs the wavelet tree in a domain-decomposition fashion,
using our first algorithm in each segment, reaching time and
bits of extra space, where is the
number of available cores. Both algorithms are practical and report good
speedup for large real datasets.
Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Skłodowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
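The domain-decomposition idea in this abstract can be sketched as follows: split the input into segments, build a partial wavelet tree for each, then concatenate the bitmaps of matching nodes in segment order. This is a rough illustration under stated assumptions, not the authors' implementation. The names `build_segment` and `build_by_domain_decomposition`, the `(level, prefix)` node keys, and the fixed `num_bits` symbol width are hypothetical, and a thread pool in CPython will not give real speedup for this pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor


def build_segment(seq, num_bits):
    """Node bitmaps of a wavelet tree for one segment, keyed by (level, prefix)."""
    nodes = {}
    for c in seq:
        # Each level depends only on the symbol's own bits, so levels are independent.
        for level in range(num_bits):
            prefix = c >> (num_bits - level)         # top `level` bits select the node
            bit = (c >> (num_bits - level - 1)) & 1  # next bit goes into its bitmap
            nodes.setdefault((level, prefix), []).append(bit)
    return nodes


def build_by_domain_decomposition(seq, num_bits, p):
    """Build per-segment trees with p workers, then merge them node by node."""
    size = max(1, -(-len(seq) // p))  # ceiling division, at least 1
    segments = [seq[i:i + size] for i in range(0, len(seq), size)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        partial = list(pool.map(lambda s: build_segment(s, num_bits), segments))
    merged = {}
    for nodes in partial:  # segment order preserves the original symbol order
        for key, bits in nodes.items():
            merged.setdefault(key, []).extend(bits)
    return merged
```

The merge is correct because each node's bitmap lists the symbols sharing a bit prefix in their original order, so concatenating the per-segment bitmaps in segment order reproduces the bitmap of the full sequence.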
The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
An indexed sequence of strings is a data structure for storing a string
sequence that supports random access, searching, range counting and analytics
operations, both for exact matches and prefix search. String sequences lie at
the core of column-oriented databases, log processing, and other storage and
query tasks. In these applications each string can appear several times and the
order of the strings in the sequence is relevant. The prefix structure of the
strings is relevant as well: common prefixes are sought in strings to extract
interesting features from the sequence. Moreover, space-efficiency is highly
desirable as it translates directly into higher performance, since more data
can fit in fast memory.
We introduce and study the problem of compressed indexed sequence of strings,
representing indexed sequences of strings in nearly-optimal compressed space,
both in the static and dynamic settings, while preserving provably good
performance for the supported operations.
We present a new data structure for this problem, the Wavelet Trie, which
combines the classical Patricia Trie with the Wavelet Tree, a succinct data
structure for storing a compressed sequence. The resulting Wavelet Trie
smoothly adapts to a sequence of strings that changes over time. It improves on
the state-of-the-art compressed data structures by supporting a dynamic
alphabet (i.e. the set of distinct strings) and prefix queries, both crucial
requirements in the aforementioned applications, and on traditional indexes by
reducing space occupancy to close to the entropy of the sequence
Simulation of Gegenbauer Processes using Wavelet Packets
In this paper, we study the synthesis of Gegenbauer processes using the
wavelet packet transform. In order to simulate a 1-factor Gegenbauer process,
we introduce an original algorithm, inspired by the one proposed by Coifman and
Wickerhauser [1], to adaptively search for the best-ortho-basis in the wavelet
packet library where the covariance matrix of the transformed process is nearly
diagonal. Our method clearly outperforms the one recently proposed by [2], is
very fast, does not depend on the wavelet choice, and is not very sensitive to
the length of the time series. From these first results we propose an algorithm
to build bases to simulate k-factor Gegenbauer processes. Given its practical
simplicity, we feel the general practitioner will be attracted to our
simulator. Finally we evaluate the approximation due to the fact that we
consider the wavelet packet coefficients as uncorrelated. An empirical study is
carried out which supports our results
Tensor network and (p-adic) AdS/CFT
We use the tensor network living on the Bruhat-Tits tree to give a concrete
realization of the recently proposed p-adic AdS/CFT correspondence (a
holographic duality based on the p-adic number field Q_p). Instead
of assuming the p-adic AdS/CFT correspondence, we show how important features
of AdS/CFT such as the bulk operator reconstruction and the holographic
computation of boundary correlators are automatically implemented in this
tensor network.
Comment: 59 pages, 18 figures; v3: improved presentation, added figures and
reference