13 research outputs found

An O(n log n) Algorithm for Finding Minimal Perfect Hash Functions

Authors: Chen Qi-Fan; Fox Edward A.; Heath Lenwood S.
Publication date: 01/01/1989

We describe the first practical algorithm for finding minimal perfect hash functions that can be used to access very large databases. This method extends earlier work wherein an O(n^3) algorithm was devised, building upon prior work by Sager that described an O(n^4) algorithm. Our new O(n log n) expected time algorithm makes use of three key insights: applying randomness wherever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. While ultimately applicable to a wide variety of data and file access needs, this algorithm has already proven useful in aiding our work in improving performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198 word list on that CD-ROM, and several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests with a French word list of 420,878 entries and a library catalog key set with over 1.2 million keys have shown that our methods work with very large databases.

Computer Science Technical Reports @Virginia Tech

LEND and Faster Algorithms for Constructing Minimal Perfect Hash Functions

Authors: Chen Qi-Fan; Fox Edward A.; Heath Lenwood S.
Publication date: 01/01/1992

The Large External object-oriented Network Database (LEND) system has been developed to provide efficient access to large collections of primitive or multimedia objects, semantic networks, thesauri, hypertexts, and information retrieval collections. An overview of LEND is given, emphasizing aspects that yield efficient operation. In particular, a new algorithm is described for quickly finding minimal perfect hash functions whose specification space is very close to the theoretical lower bound, i.e., around 2 bits per key. The various stages of processing are detailed, along with analytical and empirical results, including timing for a set of over 3.8 million keys that was processed on a NeXTstation in about 6 hours.
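
For context on the "around 2 bits per key" figure, the usual counting estimate of the lower bound can be worked out as below. This is a standard back-of-the-envelope argument and is not taken from the report itself; it assumes the key universe is much larger than the key set S of size n, and it treats the hash function as a uniformly random map.

```latex
% Probability that a uniformly random map from the n keys to n slots is a bijection:
\[
  \Pr\bigl[\, f : S \to \{0,\dots,n-1\} \text{ is one-to-one} \,\bigr] \;=\; \frac{n!}{n^{n}}
\]
% Naming one successful map among the candidates therefore takes about
\[
  \log_2 \frac{n^{n}}{n!}
  \;=\; n \log_2 e \;-\; \tfrac{1}{2}\log_2 (2\pi n) \;-\; O\!\left(\tfrac{1}{n}\right)
  \;\approx\; 1.44\, n \ \text{bits},
\]
% using Stirling's approximation  n! \approx \sqrt{2\pi n}\,(n/e)^{n}.
```

Under these assumptions the bound is roughly 1.44 bits per key, so a specification of about 2 bits per key sits within a small constant factor of it.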

Computer Science Technical Reports @Virginia Tech

Parallel Searching for a First Solution

Authors: Bartoszek Bozena; Czeck Zbigniew; Konopka Marek
Publication venue: University of Kent, Canterbury, UK
Publication date: 01/09/1993

A parallel algorithm for conducting a search for a first solution to the problem of generating minimal perfect hash functions is presented. A message-based distributed memory computer is assumed as a model for parallel computations. A data structure, called reverse trie (r-trie), was devised to carry out the search. The algorithm was implemented on a transputer network. The experiments showed that the algorithm exhibits a consistent and almost linear speed-up. The r-trie structure proved to be highly memory efficient.

Kent Academic Repository

Practical Minimal Perfect Hashing Functions for Large Databases

Authors: Chen Qi-Fan; Daoud Amjad M.; Fox Edward A.; Heath Lenwood S.
Publication date: 01/01/1990

We describe the first practical algorithms for finding minimal perfect hash functions that have been used to access very large databases (i.e., having over 1 million keys). This method extends earlier work wherein an O(n^3) algorithm was devised, building upon prior work by Sager that described an O(n^4) algorithm. Our first linear expected time algorithm makes use of three key insights: applying randomness wherever possible, ordering our search for hash functions based on the degree of the vertices in a graph that represents word dependencies, and viewing hash value assignment in terms of adding circular patterns of related words to a partially filled disk. Our second algorithm builds functions that are slightly more complex, but does not build a word dependency graph and so approaches the theoretical lower bound on function specification size. While ultimately applicable to a wide variety of data and file access needs, these algorithms have already proven useful in aiding our work in improving the performance of CD-ROM systems and our construction of a Large External Network Database (LEND) for semantic networks and hypertext/hypermedia collections. Virginia Disc One includes a demonstration of a minimal perfect hash function running on a PC to access a 130,198 word list on that CD-ROM. Several other microcomputer, minicomputer, and parallel processor versions and applications of our algorithm have also been developed. Tests, including those with a French word list of 420,878 entries and a library catalog key set with over 3.8 million keys, have shown that our methods work with very large databases.

Computer Science Technical Reports @Virginia Tech

Fast Flow Analysis with Godel Hashes

Authors: Matthew Might; Shuying Liang; Weibin Sun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Flow analysis, such as control-flow, data-flow, and exception-flow analysis, usually depends on relational operations on flow sets. Unfortunately, set-related operations, such as inclusion and equality, are usually very expensive. They can easily take more than 97% of the total analysis time, even in a very simple analysis. We attack this performance bottleneck by proposing Gödel hashes to enable fast and precise flow analysis. Gödel hashes are an ultra-compact, partial-order-preserving, fast, and perfect hashing mechanism, inspired by the proofs of Gödel’s incompleteness theorems. Compared with array-, tree-, traditional hash-, and bit vector-backed set implementations, we find Gödel hashes to be tens or even hundreds of times faster in the critical operations of inclusion and equality. We apply Gödel hashes in real-world analysis for object-oriented programs. The instrumented analysis is tens of times faster than the one with original data structures on DaCapo benchmarks.
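
The abstract does not spell out the encoding, so the following is only a minimal sketch of one way a partial-order-preserving perfect hash for sets can work. It assumes a Gödel-numbering-style scheme in which each element gets its own prime and a set is the product of its elements' primes; the names `GodelUniverse`, `encode`, `is_subset`, `union`, and `intersection` are hypothetical and are not taken from the paper.

```python
from math import gcd


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small sketch)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


class GodelUniverse:
    """Hands out one distinct prime per element (an assumed scheme)."""

    def __init__(self):
        self._prime_of = {}
        self._gen = primes()

    def prime(self, element):
        if element not in self._prime_of:
            self._prime_of[element] = next(self._gen)
        return self._prime_of[element]

    def encode(self, elements):
        """Encode a set as the product of its elements' primes."""
        code = 1
        for element in dict.fromkeys(elements):  # drop duplicates, keep order
            code *= self.prime(element)
        return code


def is_subset(a, b):
    """A is a subset of B exactly when code(A) divides code(B)."""
    return b % a == 0


def union(a, b):
    """Union is the least common multiple of the two codes."""
    return a * b // gcd(a, b)


def intersection(a, b):
    """Intersection is the greatest common divisor of the two codes."""
    return gcd(a, b)
```

With this encoding, set equality is integer equality and inclusion is a single divisibility check, which illustrates the kind of operation the abstract reports as the bottleneck. The codes grow quickly with set size, so this sketch relies on Python's arbitrary-precision integers.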

CiteSeerX

Crossref

Comparison of Perfect Hashing Methods

Author: Tao Qizhi
Publication venue: 'Oklahoma State University Library'
Publication date: 01/07/1999

This study was conducted to compare two minimal perfect hashing methods: Chang's method and Jaeschke's method. Since hashing is a widely used technique for storing data in symbol tables, and the data are strings of characters, this study focuses on the performance of these methods with letter-oriented sets and gives their run time performance curves. Through the analysis of run time and space complexity, an optimal method is given to make each algorithm perform well.

SHAREOK repository

Internal hashing for dynamic and static tables

Authors: Cook Curtis R.; Lewis Ted G.
Publication venue: Oregon State University. Department of Computer Science

This tutorial discusses one of the oldest problems in computing: how to search and retrieve keyed information from a list in the least amount of time. Hashing, a technique that mathematically converts a key into a storage address, is one of the best methods of finding and retrieving information associated with a unique identifying key. We briefly survey techniques which have evolved over the past 25 years and then introduce more recent research results for extremely compact and fast methods based on perfect and minimal perfect hashing. Perfect and minimal perfect hashing are useful for rapid lookup in static tables such as keywords in a compiler, spelling checkers, and database management systems. The results presented here show techniques for constructing long lists which can be searched in one memory reference.

Keywords and phrases: Key-to-address transformation, hash coding, hash table, scatter table, bucket hashing, perfect hashing, minimal perfect hashin

ScholarsArchive@OSU

Finding minimal perfect hash functions

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/1986

Crossref

Finding Minimal Perfect Hash Functions

Authors: Haggard Gary; Karplus Kevin
Publication venue: SAGE Publications
Publication date: 01/09/1984

A heuristic is given for finding minimal perfect hash functions without extensive searching. The procedure is to construct a set of graph (or hypergraph) models for the dictionary, then choose one of the models for use in constructing the minimal perfect hashing function. The construction of this function relies on a backtracking algorithm for numbering the vertices of the graph. Careful selection of the graph model limits the time spent searching. Good results have been obtained for dictionaries of up to 181 words. Using the same techniques, non-minimal perfect hash functions have been found for sets of up to 667 words.
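
As a rough illustration of numbering graph vertices by backtracking, here is a small sketch. It is not the authors' procedure: it assumes one particular graph model in which each word is an edge between its first and last letter, and a hash of the form h(w) = (g[first] + g[last]) mod n. The function names `find_mph` and `mph_hash` are hypothetical.

```python
def mph_hash(g, word, n):
    """Hash a word with the vertex numbering g (assumed form of h)."""
    return (g[word[0]] + g[word[-1]]) % n


def find_mph(words):
    """Backtracking search for vertex values g such that mph_hash maps
    the n words one-to-one onto 0..n-1. Returns g, or None if this
    graph model admits no such numbering."""
    n = len(words)
    edges = [(w[0], w[-1]) for w in words]

    # Visit high-degree vertices first so conflicts surface early.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    order = sorted(degree, key=lambda x: (-degree[x], x))
    position = {vertex: i for i, vertex in enumerate(order)}

    # An edge can be checked once its later endpoint has a value.
    ready = [[] for _ in order]
    for u, v in edges:
        ready[max(position[u], position[v])].append((u, v))

    g, used = {}, set()

    def assign(i):
        if i == len(order):
            return True
        vertex = order[i]
        for value in range(n):
            g[vertex] = value
            slots = [(g[u] + g[v]) % n for u, v in ready[i]]
            if len(set(slots)) == len(slots) and used.isdisjoint(slots):
                used.update(slots)
                if assign(i + 1):
                    return True
                used.difference_update(slots)  # undo and try the next value
        del g[vertex]
        return False

    return dict(g) if assign(0) else None
```

For example, `find_mph(["do", "end", "else", "if", "of", "to"])` returns a numbering under which the six words occupy slots 0 through 5 with no collisions. Two words that share both first and last letter can never be separated in this model, which is one reason a choice among several graph models matters.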

eCommons@Cornell

Finding minimal perfect hash functions

Authors: Gary Haggard; Kevin Karplus
Publication venue: Association for Computing Machinery (ACM)

Crossref