11,134 research outputs found

    Parallel Wavelet Tree Construction

    Full text link
    We present parallel algorithms for wavelet tree construction with polylogarithmic depth, improving upon the linear depth of the recent parallel algorithms by Fuentes-Sepulveda et al. We experimentally show on a 40-core machine with two-way hyper-threading that we outperform the existing parallel algorithms by 1.3--5.6x and achieve up to 27x speedup over the sequential algorithm on a variety of real-world and artificial inputs. Our algorithms show good scalability with increasing thread count, input size and alphabet size. We also discuss extensions to variants of the standard wavelet tree.Comment: This is a longer version of the paper that appears in the Proceedings of the IEEE Data Compression Conference, 201

    Parallel Construction of Wavelet Trees on Multicore Architectures

    Get PDF
    The wavelet tree has become a very useful data structure to efficiently represent and query large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of a wavelet tree's construction by taking advantage of nowadays ubiquitous multicore machines. Our first algorithm constructs all the levels of the wavelet in parallel in O(n)O(n) time and O(nlgσ+σlgn)O(n\lg\sigma + \sigma\lg n) bits of working space, where nn is the size of the input sequence and σ\sigma is the size of the alphabet. Our second algorithm constructs the wavelet tree in a domain-decomposition fashion, using our first algorithm in each segment, reaching O(lgn)O(\lg n) time and O(nlgσ+pσlgn/lgσ)O(n\lg\sigma + p\sigma\lg n/\lg\sigma) bits of extra space, where pp is the number of available cores. Both algorithms are practical and report good speedup for large real datasets.Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094

    Parallel External Memory Wavelet Tree and Wavelet Matrix Construction

    Get PDF

    Faster Block Tree Construction

    Get PDF
    The block tree [Belazzougui et al. J. Comput. Syst. Sci. \u2721] is a compressed text index that can answer access (extract a character at a position), rank (number of occurrences of a specified character in a prefix of the text), and select (size of smallest prefix such that a specified character has a specified rank) queries. It requires O(zlog(n/z)) words of space, where z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed fast in practice, up so far compressed versions of the wavelet tree only leverage statistical compression, meaning that they are blind to spaced repetitions. To make block trees usable in practice, a first step is to find ways in constructing them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. Additionally, we parallelize our implementation, making it the first block tree construction implementation that works in parallel in shared memory

    Construction of Hilbert Transform Pairs of Wavelet Bases and Gabor-like Transforms

    Get PDF
    We propose a novel method for constructing Hilbert transform (HT) pairs of wavelet bases based on a fundamental approximation-theoretic characterization of scaling functions--the B-spline factorization theorem. In particular, starting from well-localized scaling functions, we construct HT pairs of biorthogonal wavelet bases of L^2(R) by relating the corresponding wavelet filters via a discrete form of the continuous HT filter. As a concrete application of this methodology, we identify HT pairs of spline wavelets of a specific flavor, which are then combined to realize a family of complex wavelets that resemble the optimally-localized Gabor function for sufficiently large orders. Analytic wavelets, derived from the complexification of HT wavelet pairs, exhibit a one-sided spectrum. Based on the tensor-product of such analytic wavelets, and, in effect, by appropriately combining four separable biorthogonal wavelet bases of L^2(R^2), we then discuss a methodology for constructing 2D directional-selective complex wavelets. In particular, analogous to the HT correspondence between the components of the 1D counterpart, we relate the real and imaginary components of these complex wavelets using a multi-dimensional extension of the HT--the directional HT. Next, we construct a family of complex spline wavelets that resemble the directional Gabor functions proposed by Daugman. Finally, we present an efficient FFT-based filterbank algorithm for implementing the associated complex wavelet transform.Comment: 36 pages, 8 figure

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
    corecore