Finding succinct ordered minimal perfect hashing functions
An ordered minimal perfect hash table is one in which no collisions occur among a predefined set of keys, no space is unused, and the data are placed in the table in order. A new method for creating ordered minimal perfect hashing functions is presented. It is based on a method developed by Fox, Heath, Daoud, and Chen, but it creates hash functions whose representation space requirements are closer to the theoretical lower bound. The new method requires approximately 10% less space to represent the generated hash functions and is easier to implement than that of Fox et al. However, its higher time complexity makes it practical only for small sets (fewer than 1000 keys).
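The "ordered" property can be illustrated with a different, well-known construction: the acyclic-graph method of Czech, Havas and Majewski (CHM), which is neither the Fox et al. scheme nor the new method above. In this minimal sketch, which assumes a seeded SHA-256 hash and the hypothetical names `build_ordered_mphf` and `lookup`, each key becomes an edge between two hashed vertices; once a seed yields an acyclic graph, vertex values `g` are chosen so that `(g[u] + g[v]) % n` equals the key's position in the input list.

```python
import hashlib

def _h(key: str, seed: int, m: int) -> int:
    # Seeded hash of a string into range(m) (assumed helper, not from the paper).
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % m

def build_ordered_mphf(keys, c=2.5, max_seeds=1000):
    """Return (seed, m, n, g) such that key i hashes to i (CHM-style sketch)."""
    n = len(keys)
    m = int(c * n) + 1  # vertex count; c > 2 makes an acyclic graph reasonably likely
    for seed in range(max_seeds):
        adj = [[] for _ in range(m)]
        for i, key in enumerate(keys):
            u, v = _h(key, 2 * seed, m), _h(key, 2 * seed + 1, m)
            adj[u].append((v, i))  # edge i joins u and v
            adj[v].append((u, i))
        g = [None] * m
        acyclic = True
        for root in range(m):
            if g[root] is not None:
                continue
            g[root] = 0
            stack = [(root, -1)]  # (vertex, id of the edge used to reach it)
            while stack and acyclic:
                x, via = stack.pop()
                for y, i in adj[x]:
                    if i == via:
                        continue
                    if g[y] is not None:  # reached twice: cycle, multi-edge or self-loop
                        acyclic = False
                        break
                    g[y] = (i - g[x]) % n  # forces (g[x] + g[y]) % n == i
                    stack.append((y, i))
            if not acyclic:
                break
        if acyclic:
            return seed, m, n, g
    raise RuntimeError("no acyclic graph found; try a larger c")

def lookup(key, seed, m, n, g):
    # Sum of the two vertex values, modulo n, is the key's original position.
    return (g[_h(key, 2 * seed, m)] + g[_h(key, 2 * seed + 1, m)]) % n

keys = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]
seed, m, n, g = build_ordered_mphf(keys)
print([lookup(k, seed, m, n, g) for k in keys])  # [0, 1, 2, 3, 4, 5, 6]
```

Because every key's value is chosen freely along a tree edge, the original order comes back for free; the price is about c·n stored integers of log2 n bits each, far above the space bounds these abstracts are concerned with.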
Fast Scalable Construction of (Minimal Perfect Hash) Functions
Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstacle to any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although they can be made very small, the computation is still too slow to be feasible.
In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed.
Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
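The linear-system view of a static function can be sketched in a few lines: each key contributes one equation over GF(2) stating that the XOR of the bits stored at three hashed positions equals the key's one-bit value. This is plain Gaussian elimination with Python integers used as row bitsets (a loose stand-in for broadword row manipulation), not the lazy Gaussian elimination described above; the hashing scheme, the overhead factor `c`, and the names `build_static_function` and `query` are assumptions.

```python
import hashlib

def _row(key: str, seed: int, m: int) -> int:
    # Equation mask: bits at three hashed positions (assumed hashing scheme).
    # A repeated position cancels under XOR, exactly as it would in the equation.
    d = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    mask = 0
    for j in range(3):
        mask ^= 1 << (int.from_bytes(d[8 * j:8 * j + 8], "little") % m)
    return mask

def _parity(x: int) -> int:
    return bin(x).count("1") & 1

def build_static_function(data, c=1.25, max_seeds=100):
    """Solve one GF(2) equation per key; returns (seed, m, solution bitset)."""
    m = int(c * len(data)) + 8
    for seed in range(max_seeds):
        pivots = {}  # pivot column -> (row mask, right-hand side)
        ok = True
        for key, value in data.items():
            mask, rhs = _row(key, seed, m), value & 1
            while mask:
                col = (mask & -mask).bit_length() - 1  # lowest set bit
                if col not in pivots:
                    pivots[col] = (mask, rhs)
                    break
                pmask, prhs = pivots[col]
                mask, rhs = mask ^ pmask, rhs ^ prhs  # eliminate col with one whole-row XOR
            else:
                if rhs:  # row reduced to 0 = 1: inconsistent, retry with a new seed
                    ok = False
                    break
        if not ok:
            continue
        x = 0  # free variables default to 0
        for col in sorted(pivots, reverse=True):  # back substitution, highest pivot first
            pmask, prhs = pivots[col]
            if prhs ^ _parity(x & pmask & ~(1 << col)):
                x |= 1 << col
        return seed, m, x
    raise RuntimeError("no solvable system found; try a larger c")

def query(key, seed, m, x):
    # The stored value is the XOR of the solution bits selected by the key's equation.
    return _parity(x & _row(key, seed, m))
```

The structure stores m bits for n one-bit values, so m/n is the multiplicative overhead; pushing that ratio down while keeping the system solvable is where the cost of elimination starts to dominate.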
Constructing Minimal Perfect Hash Functions Using SAT Technology
Minimal perfect hash functions (MPHFs) are used to provide efficient access to values of large dictionaries (sets of key-value pairs). Discovering new algorithms for building MPHFs is an area of active research, especially from the perspective of storage efficiency. The information-theoretic limit for MPHFs is 1/(ln 2), or roughly 1.44 bits per key. The current best practical algorithms range between 2 and 4 bits per key. In this article, we propose two SAT-based constructions of MPHFs. Our first construction yields MPHFs near the information-theoretic limit. For this construction, current state-of-the-art SAT solvers can handle instances where the dictionaries contain up to 40 elements, thereby outperforming the existing (brute-force) methods. Our second construction uses XOR-SAT filters to realize a practical approach with long-term storage of approximately 1.83 bits per key.
Comment: Accepted for AAAI 202
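The 1/(ln 2) ≈ 1.44 bits/key figure can be checked numerically with the usual counting heuristic (our assumption, not necessarily the article's derivation): a random function from n keys to n slots is a bijection with probability n!/n^n, so describing an MPHF takes about log2(n^n/n!) bits, which tends to n·log2 e = n/ln 2 bits. The function name below is hypothetical.

```python
import math

def mphf_bits_per_key(n: int) -> float:
    """Counting bound log2(n^n / n!) / n, computed via lgamma(n + 1) == ln(n!)."""
    return (n * math.log(n) - math.lgamma(n + 1)) / (n * math.log(2))

for n in (10, 1_000, 1_000_000):
    print(f"n={n:>9}: {mphf_bits_per_key(n):.4f} bits/key")
print(f"limit 1/ln 2 = {1 / math.log(2):.4f}")
```

By Stirling's formula the bound approaches 1/ln 2 from below; for very small dictionaries it is noticeably lower (about 1.14 bits/key at n = 10).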
LILY: A Generator for Compiler Frontends
In this paper, LILY, a generator for compiler frontends, is described. LILY uses a generator of minimal perfect hash functions, MPHF, to create small, fast compilers.
GPERF: a perfect hash function generator
gperf is a widely available perfect hash function generator written in C++. It automates a common system software operation: keyword recognition. gperf translates an n-element user-specified keyword list keyfile into source code containing a k-element lookup table and a pair of functions, phash and in_word_set. phash uniquely maps keywords in keyfile onto the range 0 .. k - 1, where k ≥ n. If k = n, then phash is considered a minimal perfect hash function. in_word_set uses phash to determine whether a particular string of characters str occurs in the keyfile, using at most one string comparison. This paper describes the user interface, options, features, algorithm design and implementation strategies incorporated in gperf. It also presents the results of an empirical comparison between gperf-generated recognizers and other popular techniques for reserved word lookup.
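The phash/in_word_set split can be illustrated with a toy version of the kind of hash gperf emits, which adds the keyword length to per-character "associated values" taken at selected positions. This sketch is a simplification: it uses only the first and last characters, finds associated values by random search, and reduces modulo a table size k ≥ n, whereas gperf's own search, key-position selection and generated C code differ. All names here (`build`, `asso`, and Python versions of `phash` and `in_word_set`) are illustrative assumptions.

```python
import random

def phash(s, asso, k):
    # Length plus associated values of the first and last character (unknown chars count as 0).
    return (len(s) + asso.get(s[0], 0) + asso.get(s[-1], 0)) % k

def build(keywords, k=None, max_tries=100_000, seed=0):
    """Randomly search associated values until phash is injective on `keywords`."""
    k = k or 2 * len(keywords)  # table size k >= n; k == n would make phash minimal
    chars = sorted({w[0] for w in keywords} | {w[-1] for w in keywords})
    rng = random.Random(seed)
    for _ in range(max_tries):
        asso = {c: rng.randrange(k) for c in chars}
        table = [None] * k
        for w in keywords:
            h = phash(w, asso, k)
            if table[h] is not None:
                break  # collision: try a fresh set of associated values
            table[h] = w
        else:
            return asso, table
    raise RuntimeError("no collision-free assignment found")

def in_word_set(s, asso, table):
    # One hash computation, then at most one string comparison.
    if not s:
        return False
    return table[phash(s, asso, len(table))] == s

KEYWORDS = ["if", "else", "while", "for", "return", "int"]
asso, table = build(KEYWORDS)
print(in_word_set("while", asso, table), in_word_set("whale", asso, table))  # prints: True False
```

"whale" hashes to the same slot as "while" (same length, first and last character), which is why the single string comparison is still needed to reject non-keywords.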
An Improved Algorithm for Generating Minimal Perfect Hash Functions
A minimal perfect hash function (MPHF) is a one-to-one function from a set of M objects to the first M non-negative integers. MPHFs are useful for the compact storage and fast retrieval of frequently used objects such as reserved words in a programming language or commonly employed words in a natural language. In this paper we improve on an earlier result and present an algorithm for generating MPHFs with an expected time complexity proportional to M^4. We also give an MPHF for the 256 most frequently used words in the English language.
High Performance Construction of RecSplit Based Minimal Perfect Hash Functions
A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space-efficient practical minimal perfect hash function. It relies heavily on trying out hash functions by brute force.
We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs.
In combination, the resulting improvements yield speedups of up to 239× on an 8-core CPU and up to 5438× using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy-efficient than the original implementation.