4,412 research outputs found

    ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

    Full text link
    A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n\log_2 e - O(\log n)$ bits of space needed to represent an MPHF. A matching upper bound is obtained using the brute-force algorithm that tries random hash functions until stumbling on an MPHF and stores that function's seed. In expectation, $e^n\textrm{poly}(n)$ seeds need to be tested. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash - Small, heavily overloaded cuckoo hash tables. ShockHash uses two hash functions $h_0$ and $h_1$, hoping for the existence of a function $f : S \rightarrow \{0,1\}$ such that $x \mapsto h_{f(x)}(x)$ is an MPHF on $S$. In graph terminology, ShockHash generates $n$-edge random graphs until stumbling on a pseudoforest - a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store $f$ using $n + o(n)$ bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only $(e/2)^n\textrm{poly}(n)$ hash function seeds in expectation, reducing the space for storing the seed by roughly $n$ bits. This makes ShockHash almost a factor $2^n$ faster than brute-force, while maintaining the asymptotically optimal space consumption. An implementation within the RecSplit framework yields the currently most space efficient MPHFs, i.e., competing approaches need about two orders of magnitude more work to achieve the same space.
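    To make the seed search concrete, here is a minimal Python sketch (the hash function h is a hypothetical stand-in for the seeded hash functions; the real implementation then derives the MPHF by cuckoo hashing and stores $f$ in a 1-bit retrieval structure rather than just returning the seed). A seed is accepted exactly when no component of the $n$-node, $n$-edge graph accumulates more edges than nodes, which here, since the totals are equal, means every component has exactly as many edges as nodes:

```python
import hashlib

def h(seed, which, key, n):
    # Hypothetical seeded hash mapping a key to a node in [0, n).
    data = f"{seed}:{which}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % n

def is_pseudoforest(n, edges):
    # Union-find over n nodes, tracking node and edge counts per
    # component; a pseudoforest has edges <= nodes in every component.
    parent = list(range(n))
    nodes = [1] * n
    nedges = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            nedges[ru] += 1              # edge closes a cycle
        else:
            parent[rv] = ru              # merge components
            nodes[ru] += nodes[rv]
            nedges[ru] += nedges[rv] + 1
        if nedges[ru] > nodes[ru]:       # second cycle in a component: reject
            return False
    return True

def find_seed(keys):
    # Brute-force over seeds; the paper shows (e/2)^n poly(n) expected trials.
    n = len(keys)
    seed = 0
    while True:
        edges = [(h(seed, 0, k, n), h(seed, 1, k, n)) for k in keys]
        if is_pseudoforest(n, edges):
            return seed
        seed += 1
```

    Each trial costs near-linear time, so the expected number of trials dominates the construction cost.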

    Cache-Oblivious Peeling of Random Hypergraphs

    Full text link
    The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear-time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds an MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.
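    For context, the straightforward in-memory algorithm the abstract refers to looks roughly like the Python sketch below (not the paper's cache-oblivious version): repeatedly pick a vertex of degree 1 and peel the unique edge incident to it. The random accesses into the incidence lists are what destroy I/O locality once the hypergraph no longer fits in memory:

```python
from collections import defaultdict, deque

def peel(edges):
    # edges: list of vertex tuples. Returns the peeling order as
    # (edge index, peeled vertex) pairs, or None if a 2-core remains.
    incident = defaultdict(set)        # vertex -> indices of live edges
    for i, e in enumerate(edges):
        for v in e:
            incident[v].add(i)
    queue = deque(v for v, s in incident.items() if len(s) == 1)
    order = []
    while queue:
        v = queue.popleft()
        if len(incident[v]) != 1:      # degree changed since enqueued
            continue
        (i,) = incident[v]             # the unique edge still touching v
        order.append((i, v))
        for u in edges[i]:             # delete edge i everywhere
            incident[u].discard(i)
            if len(incident[u]) == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None
```

    Constructions such as MWHC-style perfect hashing then process this order in reverse, fixing each edge's value at the vertex it was peeled with.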

    Fast Scalable Construction of (Minimal Perfect Hash) Functions

    Full text link
    Recent advances in random linear systems over finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstruction to any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although the systems can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
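    The following Python sketch shows the word-parallel flavor of this approach on a toy scale: each GF(2) equation is packed into one integer, so eliminating a variable XORs every coefficient of a row in a single operation. This illustrates plain Gaussian elimination with broadword row operations, not the paper's lazy elimination strategy or its actual data layout:

```python
def solve_gf2(rows, rhs):
    # rows[i] is an integer bitmask of the coefficients of equation i
    # over GF(2); rhs[i] is its right-hand side bit. XORing two rows
    # eliminates a variable across the whole equation at once.
    pivots = {}  # leading column -> (row mask, rhs bit)
    for row, b in zip(rows, rhs):
        while row:
            c = row.bit_length() - 1       # leading variable of this row
            if c not in pivots:
                pivots[c] = (row, b)
                break
            prow, pb = pivots[c]
            row ^= prow                    # one XOR, many coefficients
            b ^= pb
        else:
            if b:
                return None                # 0 = 1: inconsistent system
    # Back-substitute from the lowest pivot column upward;
    # free variables default to 0.
    x = 0
    for c in sorted(pivots):
        row, b = pivots[c]
        below = x & row & ((1 << c) - 1)   # already-fixed variables in this row
        if b ^ (bin(below).count("1") & 1):
            x |= 1 << c
    return x
```

    For example, solve_gf2([0b11, 0b10], [1, 1]) returns 0b10, i.e., $x_1 = 1$, $x_0 = 0$.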

    SicHash -- Small Irregular Cuckoo Tables for Perfect Hashing

    Get PDF
    A Perfect Hash Function (PHF) is a hash function that has no collisions on a given input set. PHFs can be used for space-efficient storage of data in an array, or for determining a compact representative of each object in the set. In this paper, we present the PHF construction algorithm SicHash - Small Irregular Cuckoo Tables for Perfect Hashing. At its core, SicHash uses a known technique: It places objects in a cuckoo hash table and then stores the final hash function choice of each object in a retrieval data structure. We combine the idea with irregular cuckoo hashing, where each object has a different number of hash functions. Additionally, we use many small tables that we overload beyond their asymptotic maximum load factor. The most space-efficient competitors often use brute-force methods to determine the PHFs. SicHash provides a more direct construction algorithm that only rarely needs to recompute parts. Our implementation improves the state of the art in terms of space usage versus construction time for a wide range of configurations. At the same time, it provides very fast queries.
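    A rough Python sketch of the core placement step follows, with a hypothetical hash function hc, a simplified eviction rule, and a plain dict standing in for the compact retrieval structure. Each key has its own number of hash-function choices (the irregular part), keys are placed by cuckoo-style eviction, and the index of the choice that finally succeeded is what a query later retrieves:

```python
import hashlib

def hc(key, j, m):
    # Hypothetical hash: the j-th candidate slot for `key` in a table of size m.
    d = hashlib.blake2b(f"{key}:{j}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % m

def cuckoo_place(keys, num_choices, m, max_kicks=100_000):
    # num_choices[key] is how many hash functions that key may use.
    # Returns {key: winning choice j} on success, None on failure.
    table = [None] * m  # slot -> (key, j) currently occupying it
    for key in keys:
        cur, j = key, 0
        for _ in range(max_kicks):
            slot = hc(cur, j, m)
            if table[slot] is None:
                table[slot] = (cur, j)
                break
            # Evict the current occupant and send it to its next choice.
            table[slot], (cur, j) = (cur, j), table[slot]
            j = (j + 1) % num_choices[cur]
        else:
            return None  # eviction chain too long; rebuild with new seeds
    return {key: j for (key, j) in (e for e in table if e is not None)}
```

    A query then evaluates hc(key, j, m) with j read back from the retrieval structure; only the choice indices are stored, not the keys. SicHash additionally splits the input across many small, deliberately overloaded tables, which this sketch omits.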

    Fast Algorithms for Parameterized Problems with Relaxed Disjointness Constraints

    Full text link
    In parameterized complexity, it is a natural idea to consider different generalizations of classic problems. Usually, such generalizations are obtained by introducing a "relaxation" variable, where the original problem corresponds to setting this variable to a constant value. For instance, the problem of packing sets of size at most $p$ into a given universe generalizes the Maximum Matching problem, which is recovered by taking $p = 2$. Most often, the complexity of the problem increases with the relaxation variable, but very recently Abasi et al. have given a surprising example of a problem, $r$-Simple $k$-Path, that can be solved by a randomized algorithm with running time $O^*(2^{O(k \frac{\log r}{r})})$. That is, the complexity of the problem decreases with $r$. In this paper we pursue further the direction sketched by Abasi et al. Our main contribution is a derandomization tool that provides a deterministic counterpart of the main technical result of Abasi et al.: the $O^*(2^{O(k \frac{\log r}{r})})$ algorithm for $(r,k)$-Monomial Detection, which is the problem of finding a monomial of total degree $k$ and individual degrees at most $r$ in a polynomial given as an arithmetic circuit. Our technique works for a large class of circuits, and in particular it can be used to derandomize the result of Abasi et al. for $r$-Simple $k$-Path. On our way to this result we introduce the notion of representative sets for multisets, which may be of independent interest. Finally, we give two more examples of problems, already studied in the literature, where the same relaxation phenomenon happens. The first is a natural relaxation of the Set Packing problem, where we allow the packed sets to overlap at each element at most $r$ times. The second is Degree Bounded Spanning Tree, where we seek a spanning tree of the graph with a small maximum degree.

    Fast and Scalable Minimal Perfect Hashing for Massive Key Sets

    Get PDF
    Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process, due to high construction time, RAM or external memory usage. We revisit a simple algorithm and show that it is highly competitive with the state of the art, especially in terms of construction time and memory usage. We provide a parallel C++ implementation called BBhash. It is capable of creating a minimal perfect hash function of $10^{10}$ elements in less than 7 minutes using 8 threads and 5 GB of memory, and the resulting function uses 3.7 bits/element. To the best of our knowledge, this is also the first implementation that has been successfully tested on an input of cardinality $10^{12}$. Source code: https://github.com/rizkg/BBHash
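    The simple algorithm BBhash revisits can be summarized as a cascade of collision bitmaps; the Python sketch below (with a hypothetical per-level hash hb, a naive rank in place of a constant-time rank structure, and details treated as approximate) shows the idea: keys that hash to a slot alone are finalized at that level, colliding keys fall through to a smaller next level, and the MPHF value of a key is the rank of its bit across all levels:

```python
import hashlib

def hb(key, level, m):
    # Hypothetical per-level hash into [0, m).
    d = hashlib.blake2b(f"{level}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big") % m

def build(keys, gamma=2.0, max_levels=64):
    # One bit array per level; a set bit means "exactly one key landed
    # here at this level". gamma controls per-level space vs. fallthrough.
    levels = []
    remaining = list(keys)
    for level in range(max_levels):
        m = max(1, int(gamma * len(remaining)))
        counts = [0] * m
        for k in remaining:
            counts[hb(k, level, m)] += 1
        levels.append([1 if c == 1 else 0 for c in counts])
        remaining = [k for k in remaining if counts[hb(k, level, m)] != 1]
        if not remaining:
            return levels
    raise RuntimeError("keys left after max_levels; retry with other hashes")

def query(levels, key):
    # MPHF value = rank of the key's set bit over the concatenated levels.
    offset = 0
    for level, bits in enumerate(levels):
        pos = hb(key, level, len(bits))
        if bits[pos]:
            return offset + sum(bits[:pos])  # real code uses O(1) rank
        offset += sum(bits)
    raise KeyError(key)
```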

    Parallel Searching for a First Solution

    Get PDF
    A parallel algorithm for conducting a search for a first solution to the problem of generating minimal perfect hash functions is presented. A message-based distributed-memory computer is assumed as a model for parallel computation. A data structure, called the reverse trie (r-trie), was devised to carry out the search. The algorithm was implemented on a transputer network. The experiments showed that the algorithm exhibits a consistent and almost linear speed-up. The r-trie structure proved to be highly memory-efficient.