13 research outputs found

    An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions

    Get PDF
    We describe the first practical algorithm for finding minimal perfect hash functions that can be used to access very large databases. This method extends earlier work wherein an O(n³) algorithm was devised, building upon prior work by Sager that described an O(n⁴) algorithm. Our new O(n log n) expected time algorithm makes use of three key insights: applying randomness wherever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. While ultimately applicable to a wide variety of data and file access needs, this algorithm has already proven useful in aiding our work in improving the performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198-word list on that CD-ROM, and several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests with a French word list of 420,878 entries and a library catalog key set with over 1.2 million keys have shown that our methods work with very large databases.
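
    To make the graph-based view concrete, here is a minimal sketch in the spirit of this family of constructions (illustrative, not the authors' exact algorithm; the seeds, the constant c, and the helper names are assumptions): each key becomes an edge between two hashed vertices, and a table g over the vertices is assigned so that h(k) = (g[u] + g[v]) mod n maps the n keys one-to-one onto 0..n-1.

        # Sketch of a graph-based minimal perfect hash construction
        # (illustrative, not the paper's exact algorithm).  Each key is an
        # edge of a random graph; if the graph is acyclic, a traversal can
        # assign g so that h(k) = (g[u] + g[v]) mod n is a bijection.
        # Assumes distinct keys; c > 2 keeps the graph acyclic w.h.p.
        import random

        def build_mphf(keys, c=2.1):
            n = len(keys)
            m = max(int(c * n), n + 1)              # number of vertices
            while True:
                s1, s2 = random.random(), random.random()
                h1 = lambda k: hash((s1, k)) % m
                h2 = lambda k: hash((s2, k)) % m
                adj = {v: [] for v in range(m)}     # vertex -> [(neighbor, key index)]
                ok = True
                for i, k in enumerate(keys):
                    u, v = h1(k), h2(k)
                    if u == v:                      # self-loop: retry with new seeds
                        ok = False
                        break
                    adj[u].append((v, i))
                    adj[v].append((u, i))
                if not ok:
                    continue
                g = [None] * m
                acyclic = True
                for root in range(m):
                    if g[root] is not None or not adj[root]:
                        continue
                    g[root] = 0
                    stack = [(root, None)]
                    while stack and acyclic:
                        u, via = stack.pop()
                        for v, i in adj[u]:
                            if i == via:
                                continue
                            if g[v] is not None:    # revisited vertex: cycle, retry
                                acyclic = False
                                break
                            g[v] = (i - g[u]) % n   # forces (g[u] + g[v]) mod n == i
                            stack.append((v, i))
                    if not acyclic:
                        break
                if acyclic:
                    return lambda k: (g[h1(k)] + g[h2(k)]) % n

        keys = ["apple", "pear", "plum", "fig", "cherry"]
        h = build_mphf(keys)
        print(sorted(h(k) for k in keys))           # [0, 1, 2, 3, 4]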

    LEND and Faster Algorithms for Constructing Minimal Perfect Hash Functions

    Get PDF
    The Large External object-oriented Network Database (LEND) system has been developed to provide efficient access to large collections of primitive or multimedia objects, semantic networks, thesauri, hypertexts, and information retrieval collections. An overview of LEND is given, emphasizing aspects that yield efficient operation. In particular, a new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key. The various stages of processing are detailed, along with analytical and empirical results, including timing for a set of over 3.8 million keys that was processed on a NeXTstation in about 6 hours.
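
    For scale, a specification of about 2 bits per key means the function for the 3.8-million-key set can be stored in roughly 3.8 × 10⁶ × 2 = 7.6 × 10⁶ bits, i.e., a little under 1 MB.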

    Parallel Searching for a First Solution

    Get PDF
    A parallel algorithm for conducting a search for a first solution to the problem of generating minimal perfect hash functions is presented. A message-based distributed memory computer is assumed as a model for parallel computations. A data structure, called reverse trie (r-trie), was devised to carry out the search. The algorithm was implemented on a transputer network. The experiments showed that the algorithm exhibits a consistent and almost linear speed-up. The r-trie structure proved to be highly memory-efficient.
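
    Assuming the r-trie is, as the name suggests, a trie built over keys spelled backwards (the paper's structure also carries state for the distributed search), a minimal sketch looks like this; shared suffixes share nodes, which is where the memory efficiency comes from:

        # Sketch of a reverse trie: an ordinary trie over reversed keys
        # (an assumption about the paper's r-trie, shown for membership
        # testing only).
        class RTrie:
            def __init__(self):
                self.root = {}

            def insert(self, key):
                node = self.root
                for ch in reversed(key):
                    node = node.setdefault(ch, {})
                node["$"] = True                 # end-of-key marker

            def __contains__(self, key):
                node = self.root
                for ch in reversed(key):
                    if ch not in node:
                        return False
                    node = node[ch]
                return "$" in node

        t = RTrie()
        for w in ["hash", "stash", "trie"]:
            t.insert(w)
        print("hash" in t, "cash" in t)          # True False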

    Practical Minimal Perfect Hashing Functions for Large Databases

    Get PDF
    We describe the first practical algorithms for finding minimal perfect hash functions that have been used to access very large databases (i.e., having over 1 million keys). This method extends earlier work wherein an O(n³) algorithm was devised, building upon prior work by Sager that described an O(n⁴) algorithm. Our first linear expected time algorithm makes use of three key insights: applying randomness wherever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. Our second algorithm builds functions that are slightly more complex, but does not build a word dependency graph and so approaches the theoretical lower bound on function specification size. While ultimately applicable to a wide variety of data and file access needs, these algorithms have already proven useful in aiding our work in improving the performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198-word list on that CD-ROM. Several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests including those with a French word list of 420,878 entries and a library catalog key set with over 3.8 million keys have shown that our methods work with very large databases.
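
    As a companion to the construction sketch above, here is a small illustrative check (not from the paper) of the property both algorithms guarantee: a minimal perfect hash function maps its n keys one-to-one onto exactly the addresses 0..n-1.

        # Verify that a candidate function h is a minimal perfect hash for
        # a key set (illustrative definition check).
        def is_minimal_perfect(h, keys):
            values = {h(k) for k in keys}
            return len(values) == len(keys) and values == set(range(len(keys)))

        keys = ["a", "b", "c", "d"]
        print(is_minimal_perfect(lambda k: ord(k) - ord("a"), keys))  # True
        print(is_minimal_perfect(lambda k: ord(k) % 3, keys))         # False: collisions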

    Fast Flow Analysis with Gödel Hashes

    Full text link
    Flow analysis, such as control-flow, data-flow, and exception-flow analysis, usually depends on relational operations on flow sets. Unfortunately, set-related operations, such as inclusion and equality, are usually very expensive; they can easily take more than 97% of the total analysis time, even in a very simple analysis. We attack this performance bottleneck by proposing Gödel hashes to enable fast and precise flow analysis. Gödel hashing is an ultra-compact, partial-order-preserving, fast, and perfect hashing mechanism, inspired by the proofs of Gödel's incompleteness theorems. Compared with array-, tree-, traditional hash-, and bit-vector-backed set implementations, we find Gödel hashes to be tens or even hundreds of times faster in the critical operations of inclusion and equality. We apply Gödel hashes in real-world analysis for object-oriented programs. The instrumented analysis is tens of times faster than the one with the original data structures on the DaCapo benchmarks.
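
    The stated inspiration suggests a simple sketch of the idea (an assumption based on Gödel numbering, not necessarily the paper's exact encoding): assign each flow-set element a distinct prime and hash a set as the product of its members' primes; subset testing then becomes divisibility and equality becomes integer equality.

        # Sketch of a Gödel-numbering-style set hash (illustrative): each
        # element gets a distinct prime; a set's hash is the product of its
        # members' primes, so subset testing is divisibility.
        from itertools import count

        def primes():
            found = []
            for n in count(2):
                if all(n % p for p in found):
                    found.append(n)
                    yield n

        prime_of = {}
        gen = primes()

        def code(element):
            if element not in prime_of:
                prime_of[element] = next(gen)    # lazily assign the next prime
            return prime_of[element]

        def godel_hash(flow_set):
            h = 1
            for e in flow_set:
                h *= code(e)
            return h

        a = godel_hash({"x", "y"})
        b = godel_hash({"x", "y", "z"})
        print(a == b, b % a == 0)                # False True ({x, y} is a subset)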

    Comparison of Perfect Hashing Methods

    Get PDF
    This study was conducted to compare two minimal perfect hashing methods: Chang's method and Jaeschke's method. Since hashing is a widely used technique for storing data in symbol tables, and the data are strings of characters, this study focuses on the performance of these methods with letter-oriented key sets and gives their run-time performance curves. Through the analysis of run time and space complexity, the method that performs best in each setting is identified.
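
    A comparison of this kind rests on a timing harness like the hypothetical sketch below; the "Chang" and "Jaeschke" entries here are placeholder functions standing in for real implementations of the two letter-oriented methods.

        # Hypothetical timing harness for comparing hashing methods; the
        # method bodies below are placeholders, not the actual algorithms.
        import timeit

        def compare(methods, keys, repeats=5):
            for name, fn in methods.items():
                t = min(timeit.repeat(lambda: [fn(k) for k in keys],
                                      number=100, repeat=repeats))
                print(f"{name}: {t:.4f}s for 100 passes over {len(keys)} keys")

        methods = {
            "Chang":    lambda k: sum(map(ord, k)) % 101,   # placeholder
            "Jaeschke": lambda k: hash(k) % 101,            # placeholder
        }
        compare(methods, ["alpha", "beta", "gamma", "delta"])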

    Finding minimal perfect hash functions

    No full text

    Finding Minimal Perfect Hash Functions

    Full text link
    A heuristic is given for finding minimal perfect hash functions without extensive searching. The procedure is to construct a set of graph (or hypergraph) models for the dictionary, then choose one of the models for use in constructing the minimal perfect hash function. The construction of this function relies on a backtracking algorithm for numbering the vertices of the graph. Careful selection of the graph model limits the time spent searching. Good results have been obtained for dictionaries of up to 181 words. Using the same techniques, non-minimal perfect hash functions have been found for sets of up to 667 words.
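
    A minimal sketch of the backtracking vertex-numbering step (illustrative; the heuristic's model selection is omitted, and self-loop edges are assumed away): each word is an edge (u, v) of the chosen graph model, and the search assigns vertex numbers so that (g[u] + g[v]) mod n is a distinct address for every word.

        # Backtracking vertex numbering (illustrative sketch): find g so
        # that every edge (word) hashes to a distinct address in 0..n-1.
        # Assumes u != v for every edge.
        def number_vertices(edges, n_vertices):
            n = len(edges)
            g = [None] * n_vertices
            used = set()

            def extend(i):
                if i == n:
                    return True
                u, v = edges[i]
                for gu in ([g[u]] if g[u] is not None else range(n)):
                    for gv in ([g[v]] if g[v] is not None else range(n)):
                        addr = (gu + gv) % n
                        if addr in used:
                            continue
                        saved = (g[u], g[v])
                        g[u], g[v] = gu, gv
                        used.add(addr)
                        if extend(i + 1):
                            return True
                        used.discard(addr)       # undo and keep searching
                        g[u], g[v] = saved
                return False

            return g if extend(0) else None

        # four words mapped to edges over four vertices, e.g. by letter classes
        edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
        print(number_vertices(edges, 4))         # e.g. [0, 0, 1, 2]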
