2,502 research outputs found
Recommended from our members
Finding succinct ordered minimal perfect hashing functions
An ordered minimal perfect hash table is one in which no collisions occur among a predefined set of keys, no space is unused, and the data are placed in the table in order. A new method for creating ordered minimal perfect hashing functions is presented. The method presented is based on a method developed by Fox, Heath, Daoud, and Chen, but it creates hash functions with representation space requirements closer to the theoretical lower bound. The method presented requires approximately 10% less space to represent generated hash functions, and is easier to implement than Fox et al's. However, a higher time complexity makes it practical for small sets only (< 1000)
Recommended from our members
GPERF : a perfect hash function generator
gperf is a widely available perfect hash function generator written in C++. It automates a common system software operation: keyword recognition. gperf translates an n element user-specified keyword list keyfile into source code containing a k element lookup table and a pair of functions, phash and in_word_set. phash uniquely maps keywords in keyfile onto the range 0 .. k - 1, where k >/= n. If k = n, then phash is considered a minimal perfect hash function. in_word_set uses phash to determine whether a particular string of characters str occurs in the keyfile, using at most one string comparison.This paper describes the user-interface, options, features, algorithm design and implementation strategies incorporated in gperf. It also presents the results from an empirical comparison between gperf-generated recognizers and other popular techniques for reserved word lookup
Optimization of Tree Modes for Parallel Hash Functions: A Case Study
This paper focuses on parallel hash functions based on tree modes of
operation for an inner Variable-Input-Length function. This inner function can
be either a single-block-length (SBL) and prefix-free MD hash function, or a
sponge-based hash function. We discuss the various forms of optimality that can
be obtained when designing parallel hash functions based on trees where all
leaves have the same depth. The first result is a scheme which optimizes the
tree topology in order to decrease the running time. Then, without affecting
the optimal running time we show that we can slightly change the corresponding
tree topology so as to minimize the number of required processors as well.
Consequently, the resulting scheme decreases in the first place the running
time and in the second place the number of required processors.Comment: Preprint version. Added citations, IEEE Transactions on Computers,
201
Data structures
We discuss data structures and their methods of analysis. In particular, we treat the unweighted and weighted dictionary problem, self-organizing data structures, persistent data structures, the union-find-split problem, priority queues, the nearest common ancestor problem, the selection and merging problem, and dynamization techniques. The methods of analysis are worst, average and amortized case
Succinct Data Structures for Retrieval and Approximate Membership
The retrieval problem is the problem of associating data with keys in a set.
Formally, the data structure must store a function f: U ->{0,1}^r that has
specified values on the elements of a given set S, a subset of U, |S|=n, but
may have any value on elements outside S. Minimal perfect hashing makes it
possible to avoid storing the set S, but this induces a space overhead of
Theta(n) bits in addition to the nr bits needed for function values. In this
paper we show how to eliminate this overhead. Moreover, we show that for any k
query time O(k) can be achieved using space that is within a factor 1+e^{-k} of
optimal, asymptotically for large n. If we allow logarithmic evaluation time,
the additive overhead can be reduced to O(log log n) bits whp. The time to
construct the data structure is O(n), expected. A main technical ingredient is
to utilize existing tight bounds on the probability of almost square random
matrices with rows of low weight to have full row rank. In addition to direct
constructions, we point out a close connection between retrieval structures and
hash tables where keys are stored in an array and some kind of probing scheme
is used. Further, we propose a general reduction that transfers the results on
retrieval into analogous results on approximate membership, a problem
traditionally addressed using Bloom filters. Again, we show how to eliminate
the space overhead present in previously known methods, and get arbitrarily
close to the lower bound. The evaluation procedures of our data structures are
extremely simple (similar to a Bloom filter). For the results stated above we
assume free access to fully random hash functions. However, we show how to
justify this assumption using extra space o(n) to simulate full randomness on a
RAM
- …