Parallel Wavelet Tree Construction
We present parallel algorithms for wavelet tree construction with
polylogarithmic depth, improving upon the linear depth of the recent parallel
algorithms by Fuentes-Sepulveda et al. We experimentally show on a 40-core
machine with two-way hyper-threading that we outperform the existing parallel
algorithms by 1.3--5.6x and achieve up to 27x speedup over the sequential
algorithm on a variety of real-world and artificial inputs. Our algorithms show
good scalability with increasing thread count, input size and alphabet size. We
also discuss extensions to variants of the standard wavelet tree.
Comment: This is a longer version of the paper that appears in the Proceedings
of the IEEE Data Compression Conference, 201
Parallel Construction of Wavelet Trees on Multicore Architectures
The wavelet tree has become a very useful data structure to efficiently
represent and query large volumes of data in many different domains, from
bioinformatics to geographic information systems. One problem with wavelet
trees is their construction time. In this paper, we introduce two algorithms
that reduce the time complexity of a wavelet tree's construction by taking
advantage of today's ubiquitous multicore machines.
Our first algorithm constructs all the levels of the wavelet tree in parallel;
its running time and its working space in bits are bounded in terms of the
size of the input sequence and the size of the alphabet. Our second algorithm
constructs the wavelet tree in a domain-decomposition fashion, using our first
algorithm in each segment; its running time and extra space in bits also
depend on the number of available cores. Both algorithms are practical and report good
speedup for large real datasets.
Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Skłodowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
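The idea of building all levels independently can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes symbols are integers in range(sigma), uses a stable sort per level instead of a linear-time method, and the names build_level and build_wavelet_levels are hypothetical. In CPython the thread pool only demonstrates that the levels do not depend on each other; it does not give a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_level(seq, height, level):
    """Bitmap of one level of a levelwise (pointerless) wavelet tree."""
    shift = height - level
    # A stable sort by the top `level` bits lays out the nodes of this level
    # left to right while keeping the original order inside each node.
    ordered = sorted(seq, key=lambda c: c >> shift)
    # The bit stored at this level is the next most significant bit.
    return [(c >> (shift - 1)) & 1 for c in ordered]


def build_wavelet_levels(seq, sigma):
    """Return one bitmap per level; each level is computed independently."""
    height = max(1, (sigma - 1).bit_length())
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(build_level, seq, height, level)
                   for level in range(height)]
        return [f.result() for f in futures]
```

For the sequence [3, 0, 2, 1, 3, 0] over an alphabet of size 4, the first bitmap holds the high bit of every symbol in input order, and the second holds the low bits after grouping the symbols by their high bit.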
Faster Block Tree Construction
The block tree [Belazzougui et al. J. Comput. Syst. Sci. '21] is a compressed text index that can answer access (extract a character at a position), rank (number of occurrences of a specified character in a prefix of the text), and select (size of smallest prefix such that a specified character has a specified rank) queries. It requires O(z log(n/z)) words of space, where z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed fast in practice, compressed versions of the wavelet tree have so far leveraged only statistical compression, meaning that they are blind to spaced repetitions.
To make block trees usable in practice, a first step is to find ways to construct them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. Additionally, we parallelize our implementation, making it the first block tree construction implementation that works in parallel in shared memory.
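The three queries named above can be pinned down with a naive reference sketch over a plain Python string. This only fixes the semantics as stated in the abstract; it is not a block tree or wavelet tree, every query here takes linear time, and the zero-based indexing of access is an assumption.

```python
def access(text, i):
    """Character at position i (zero-based, an assumption of this sketch)."""
    return text[i]


def rank(text, c, i):
    """Number of occurrences of c in the prefix text[:i]."""
    return text[:i].count(c)


def select(text, c, k):
    """Size of the smallest prefix containing k occurrences of c, else None."""
    if k == 0:
        return 0
    seen = 0
    for length, ch in enumerate(text, start=1):
        if ch == c:
            seen += 1
            if seen == k:
                return length
    return None
```

With these definitions, rank(text, c, select(text, c, k)) equals k whenever the k-th occurrence of c exists, which is the usual consistency check between the two queries.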
Construction of Hilbert Transform Pairs of Wavelet Bases and Gabor-like Transforms
We propose a novel method for constructing Hilbert transform (HT) pairs of
wavelet bases based on a fundamental approximation-theoretic characterization
of scaling functions--the B-spline factorization theorem. In particular,
starting from well-localized scaling functions, we construct HT pairs of
biorthogonal wavelet bases of L^2(R) by relating the corresponding wavelet
filters via a discrete form of the continuous HT filter. As a concrete
application of this methodology, we identify HT pairs of spline wavelets of a
specific flavor, which are then combined to realize a family of complex
wavelets that resemble the optimally-localized Gabor function for sufficiently
large orders.
Analytic wavelets, derived from the complexification of HT wavelet pairs,
exhibit a one-sided spectrum. Based on the tensor-product of such analytic
wavelets, and, in effect, by appropriately combining four separable
biorthogonal wavelet bases of L^2(R^2), we then discuss a methodology for
constructing 2D directional-selective complex wavelets. In particular,
analogous to the HT correspondence between the components of the 1D
counterpart, we relate the real and imaginary components of these complex
wavelets using a multi-dimensional extension of the HT--the directional HT.
Next, we construct a family of complex spline wavelets that resemble the
directional Gabor functions proposed by Daugman. Finally, we present an
efficient FFT-based filterbank algorithm for implementing the associated
complex wavelet transform.
Comment: 36 pages, 8 figures
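The one-sided spectrum of an analytic wavelet follows from a short standard calculation, sketched here. The symbols $\psi$, $\mathcal{H}$, $\psi_a$ and the Fourier sign convention (under which the HT has multiplier $-i\,\mathrm{sgn}(\omega)$) are assumptions of this sketch, not notation taken from the abstract.

```latex
\begin{align*}
\psi_a(x) &= \psi(x) + i\,\mathcal{H}\psi(x), \\
\widehat{\mathcal{H}\psi}(\omega) &= -i\,\mathrm{sgn}(\omega)\,\hat{\psi}(\omega), \\
\hat{\psi}_a(\omega) &= \bigl(1 + \mathrm{sgn}(\omega)\bigr)\,\hat{\psi}(\omega)
  = \begin{cases}
      2\,\hat{\psi}(\omega), & \omega > 0, \\
      0, & \omega < 0.
    \end{cases}
\end{align*}
```

The complexified wavelet therefore keeps only the positive frequencies of $\psi$, which is what makes the tensor-product construction directionally selective in 2D.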
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.