Cache-Oblivious Peeling of Random Hypergraphs

Author: Belazzougui Djamal
Boldi Paolo
Ottaviano Giuseppe
Venturini Rossano
Vigna Sebastiano
Publication date: 02/12/2013

The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds an MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

Archivio della Ricerca - Università di Pisa
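The "straightforward linear time algorithm" mentioned in the abstract above can be illustrated with a minimal in-memory sketch. This is not the cache-oblivious method of the paper; it is the classic degree-one peeling loop, written under the assumption that every edge lists distinct vertices. The function name `peel` and the XOR bookkeeping are illustrative choices, not taken from the source.

```python
def peel(num_vertices, edges):
    """Return a peeling order as a list of (edge_index, free_vertex) pairs,
    or None if the hypergraph cannot be fully peeled (non-empty 2-core).

    Assumption: each edge is a tuple of distinct vertex ids.
    """
    degree = [0] * num_vertices
    # XOR of the indices of all edges incident to each vertex; when a
    # vertex has degree 1, this value is exactly its single remaining edge.
    edge_xor = [0] * num_vertices
    for i, edge in enumerate(edges):
        for v in edge:
            degree[v] += 1
            edge_xor[v] ^= i

    stack = [v for v in range(num_vertices) if degree[v] == 1]
    order = []
    while stack:
        v = stack.pop()
        if degree[v] != 1:
            continue  # its only edge was already peeled through another vertex
        i = edge_xor[v]
        order.append((i, v))
        # Remove edge i from the hypergraph.
        for u in edges[i]:
            degree[u] -= 1
            edge_xor[u] ^= i
            if degree[u] == 1:
                stack.append(u)
    return order if len(order) == len(edges) else None


if __name__ == "__main__":
    print(peel(5, [(0, 1, 2), (1, 2, 3), (2, 3, 4)]))
```

The random accesses into `degree` and `edge_xor` are what make this loop I/O-unfriendly once the arrays no longer fit in memory, which is the problem the abstract describes.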

khmer: Working with Big Data in Bioinformatics

Author: Brown C. Titus
McDonald Eric
Publication date: 09/03/2013

We introduce design and optimization considerations for the 'khmer' package. Comment: Invited chapter for the forthcoming book on Performance of Open Source Applications.

arXiv.org e-Print Archive

CiteSeerX

Using Hashing to Solve the Dictionary Problem (In External Memory)

Author: Iacono John
Pǎtraşcu Mihai
Publication date: 01/01/2011

We consider the dictionary problem in external memory and improve the update time of the well-known buffer tree by roughly a logarithmic factor. For any $\lambda \ge \max\{\lg \lg n, \log_{M/B}(n/B)\}$, we can support updates in time $O(\lambda / B)$ and queries in sublogarithmic time, $O(\log_\lambda n)$. We also present a lower bound in the cell-probe model showing that our data structure is optimal. In the RAM, hash tables have been used to solve the dictionary problem faster than binary search for more than half a century. By contrast, our data structure is the first to beat the comparison barrier in external memory. Ours is also the first data structure to depart convincingly from the indivisibility paradigm.

arXiv.org e-Print Archive

CiteSeerX

DI-fusion

GPU LSM: A Dynamic Dictionary Data Structure for the GPU

Author: Amenta Nina
Ashkiani Saman
Farach-Colton Martin
Li Shengren
Owens John D.
Publication date: 01/01/2018

We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports lookup, count, and range query operations with average rates of 75 M, 32 M and 23 M queries/s respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU. Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'18).

arXiv.org e-Print Archive

eScholarship - University of California
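The Log Structured Merge idea named in the abstract above can be sketched on the CPU in a few lines. This is only an illustration of the general LSM scheme (sorted runs that double in size and are merged like carries in a binary counter, with deletions recorded as tombstones); it says nothing about how the GPU implementation is organized. The class name `TinyLSM`, the `TOMBSTONE` marker, and the batch size of one element are all assumptions made for this example.

```python
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


class TinyLSM:
    """Toy LSM dictionary: levels[i] is None or a sorted run of (key, value)
    pairs holding at most 2**i entries; lower levels hold newer data."""

    def __init__(self):
        self.levels = []

    def _update(self, key, value):
        carry = [(key, value)]
        for i in range(len(self.levels)):
            if self.levels[i] is None:
                self.levels[i] = carry
                return
            # Merge the newer carry into the older run; newer values win.
            merged = dict(self.levels[i])
            merged.update(carry)
            carry = sorted(merged.items(), key=lambda kv: kv[0])
            self.levels[i] = None
        self.levels.append(carry)

    def insert(self, key, value):
        self._update(key, value)

    def delete(self, key):
        self._update(key, TOMBSTONE)

    def lookup(self, key):
        # Search from the newest (smallest) run to the oldest.
        for run in self.levels:
            if run is None:
                continue
            # (key,) sorts just before any (key, value) pair.
            pos = bisect_left(run, (key,))
            if pos < len(run) and run[pos][0] == key:
                value = run[pos][1]
                return None if value is TOMBSTONE else value
        return None


if __name__ == "__main__":
    d = TinyLSM()
    for k in range(5):
        d.insert(k, k * k)
    d.delete(3)
    print(d.lookup(2), d.lookup(3))
```

Updates only ever touch whole sorted runs, which is what makes the scheme attractive for bulk-parallel hardware, while a lookup may have to probe every level; that matches the update/retrieval trade-off the abstract reports.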

Fast Scalable Construction of (Minimal Perfect Hash) Functions

Author: A Goerdt
AM Frieze
AM Odlyzko
BA LaMacchia
BS Majewski
D Belazzougui
FC Botelho
M Aumüller
M Dietzfelbinger
N Fountoulakis
Publication date: 22/03/2016

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstruction for any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although they can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.

arXiv.org e-Print Archive

Crossref
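To make the Gaussian elimination step in the abstract above concrete, here is a minimal sketch of plain (not lazy) Gauss-Jordan elimination over GF(2), one of the finite fields such systems can be defined on. Each equation is stored as an integer bitmask so that adding two equations is a single XOR, which is loosely in the spirit of the word-level manipulation the abstract mentions. The function name `solve_gf2` and the row encoding are assumptions for illustration and are not the paper's implementation.

```python
def solve_gf2(rows, rhs, num_vars):
    """Solve a linear system over GF(2).

    rows[i] is an int whose bit j is the coefficient of variable j in
    equation i; rhs[i] is 0 or 1. Returns one solution as a list of bits
    (free variables set to 0), or None if the system is inconsistent.
    """
    rows, rhs = list(rows), list(rhs)
    pivot_row_of = {}
    r = 0  # index of the next pivot row
    for col in range(num_vars):
        # Find a not-yet-used row with a 1 in this column.
        p = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if p is None:
            continue  # no pivot here: the variable stays free
        rows[r], rows[p] = rows[p], rows[r]
        rhs[r], rhs[p] = rhs[p], rhs[r]
        # Clear this column in every other row; one XOR adds whole equations.
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
                rhs[i] ^= rhs[r]
        pivot_row_of[col] = r
        r += 1
    # A row reduced to 0 = 1 means there is no solution.
    if any(row == 0 and b == 1 for row, b in zip(rows, rhs)):
        return None
    x = [0] * num_vars
    for col, row in pivot_row_of.items():
        x[col] = rhs[row]
    return x


if __name__ == "__main__":
    # x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (all mod 2)
    print(solve_gf2([0b011, 0b110, 0b101], [1, 0, 1], 3))
```

The nested loops give the cubic worst-case cost in the number of equations (for machine-word-sized rows) that the abstract identifies as the practical bottleneck.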