13 research outputs found
An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions
We describe the first practical algorithm for finding minimal perfect hash functions that can be used to access very large databases. This method extends earlier work wherein an O(n3) algorithm was devised, building upon prior work by Sager that described an O(n4) algorithm. Our new O(n log n) expected time algorithm makes use of three key insights: applying randomness whereever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. While ultimately applicable to a wide variety of data and file access needs, this algorithm has already proven useful in aiding our work in improving performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198 word list on that CD-ROM, and several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests with a French word list of 420,878 entries and a library catalog key set with over 1.2 million keys have shown that our methods work with very large databases
LEND and Faster Algorithms for Constructing Minimal Perfect Hash Functions
The Large External object-oriented Network Database (LEND) system has been developed to provide efficient access to large collections of primitive or multimedia objects, semantic networks, thesauri, hypertexts, and information retrieval collections. An overview of LEND is given, emphasizing aspects that yield efficient operation. In particular, a new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key. The various stages of processing are detailed, along with analytical and empirical results, including timing for a set of over 3.8 million keys that was processed on a NeXTstation in about 6 hours
Parallel Searching for a First Solution
A parallel algorithm for conducting a search for a first solution to the problem of generating minimal perfect hash functions is presented. A message-based distributed memory computer is assumed as a model for parallel computations. A data structure, called reverse trie (r-trie), was devised to carry out the search. The algorithm was implemented on a transputer network. The experiments showed that the algorithm exhibits a consistent and almost linear speed-up. The r-trie structure proved to be highly memory efficient
Practical Minimal Perfect Hashing Functions for Large Databases
We describe the first practical algorithms for finding minimal perfect hash functions that have been used to access very large databases (i.e., having over 1 million keys). This method extends earlier work wherein an 0(n-cubed) algorithm was devised, building upon prior work by Sager that described an 0(n-to the fourth) algorithm. Our first linear expected time algorithm makes use of three key insights: applying randomness whereever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. Our second algorithm builds functions that are slightly more complex, but does not build a word dependency graph and so approaches the theoretical lower bound on function specification size. While ultimately applicable to a wide variety of data and file access needs, these algorithms have already proven useful in aiding our work in improving the performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198 word list on that CD-ROM. Several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests including those wiht a French word list of 420,878 entries and a library catalog key set with over 3.8 million keys have shown that our methods work with very large databases
Fast Flow Analysis with Godel Hashes
AbstractāFlow analysis, such as control-flow, data-flow, and exception-flow analysis, usually depends on relational operations on flow sets. Unfortunately, set related operations, such as inclusion and equality, are usually very expensive. They can easily take more than 97 % of the total analyzing time, even in a very simple analysis. We attack this performance bottleneck by proposing GoĢdel hashes to enable fast and precise flow analysis. GoĢdel hashes is an ultra compact, partial-order-preserving, fast and perfect hashing mechanism, inspired by the proofs of GoĢdelās incompleteness theorems. Compared with array-, tree-, traditional hash-, and bit vector-backed set implementations, we find GoĢdel hashes to be tens or even hundreds of times faster for performance in the critical operations of inclusion and equality. We apply GoĢdel hashes in real-world analysis for object-oriented programs. The instrumented analysis is tens of times faster than the one with original data structures on DaCapo benchmarks. I
Comparison of Perfect Hashing Methods
This study was conducted to compare two minimal perfect hashing method Chang's method and Jaeschke's method. Since hashing is a widely used technique for store data in symbol table and the data are strings of characters, this study f use on the performance of these methods with the letter-oriented set and gives their run time performance curves. Through the analysis of run time and space complexity, an optimal method is given to make each algorithm performance well
Recommended from our members
Internal hashing for dynamic and static tables
This tutorial discusses one of the oldest problems in computing: how to search and retrieve keyed information from a list in the least amount of time. Hashing - a technique that mathematically converts a key into a storage address - is one of the best methods of finding and retrieving information associated with a unique identifying key. We briefly survey techniques which have evolved over the past 25 years and then introduce more recent research results for extremely compact and fast methods based on perĀfect and minimal perfect hashing. Perfect and minimal perfect hashing is useful for rapid lookup in a static table such as keywords in a compiler, spelling checkers, and database management systems. The results presented here show techniques for constructing long lists which can be searched in one memoĀry reference.KEYWORDS AND PHRASES: Key-to-address transformation, hash coding, hash table, scatter table, bucket hashing, perfect hashing, minimal perfect hashin
Finding Minimal Perfect Hash Functions
A heuristic is given for finding minimal perfect hash functions without extensive searching. The procedure is to construct a set of graph (or hypergraph) models for the dictionary, then choose one of the models for use in constructing the minimal perfect hashing function. The construction of this function relies on a backtracking algorithm for numbering the vertices of the graph. Careful selection of the graph model limits the time spent searching. Good results have been obtained for dictionaries of up to 181 words. Using the same techniques, non-minimal perfect hash functions have been found for sets of up to 667 words