GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports the retrieval operations of lookup, count,
and range query, with average rates of 75 M, 32 M, and 23 M
queries/s, respectively. The trade-off for the dynamic updates is that the
sorted array is almost twice as fast on retrievals. We believe that our GPU LSM
is the first dynamic general-purpose dictionary data structure for the GPU.
Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International
Parallel and Distributed Processing Symposium (IPDPS'18)
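As a rough illustration of the LSM idea behind this abstract, the following is a minimal CPU-only sketch in Python, not the paper's GPU implementation. The class name `TinyLSM`, the `TOMBSTONE` marker, and the batch format are assumptions made for this example. Updates arrive in batches, each batch becomes a sorted run, and full levels are merged upward much like carries in a binary counter; deletions are recorded as tombstones and lookups scan runs from newest to oldest.

```python
import bisect

# Hypothetical marker for a deleted key (an assumption of this sketch).
TOMBSTONE = object()


class TinyLSM:
    """CPU-only sketch of an LSM-style dictionary built from sorted runs."""

    def __init__(self):
        # levels[i] is None or a list of (key, value) pairs sorted by key.
        # A lower index always holds more recent updates.
        self.levels = []

    def update(self, batch):
        """Apply a batch of (key, value) pairs; value TOMBSTONE means delete."""
        # Reversing before a stable sort puts later operations on the same
        # key ahead of earlier ones within the run.
        run = sorted(reversed(batch), key=lambda e: e[0])
        i = 0
        # Cascade merges while the target level is occupied.
        while i < len(self.levels) and self.levels[i] is not None:
            # The newer run goes first, so the stable sort keeps the newest
            # entry for a key ahead of older entries for that key.
            run = sorted(run + self.levels[i], key=lambda e: e[0])
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = run

    def lookup(self, key):
        """Return the newest value for key, or None if absent or deleted."""
        for run in self.levels:  # newest level first
            if run is None:
                continue
            # (key,) sorts before every (key, value), so this finds the
            # first (newest) entry for the key without comparing values.
            j = bisect.bisect_left(run, (key,))
            if j < len(run) and run[j][0] == key:
                value = run[j][1]
                return None if value is TOMBSTONE else value
        return None
```

A production structure would also discard stale entries and tombstones during merges; this sketch keeps them for brevity, which is why a lookup must stop at the first (newest) match.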
Optimum Algorithms for a Model of Direct Chaining
Direct chaining is a popular and efficient class of hashing algorithms. In this paper we study
optimum algorithms among direct chaining methods, under the restrictions that the records in the hash table are not moved after they are inserted, that for each chain the relative ordering of the records in the chain does not change after more insertions, and that only one link field is used per table slot. The varied-insertion coalesced hashing method (VICH), which is proposed and analyzed in [CV84], is conjectured to be optimum
among all direct chaining algorithms in this class. We give strong evidence in favor of the conjecture by showing that VICH is optimum under fairly general conditions
Analysis of Early-Insertion Standard Coalesced Hashing
This paper analyzes the early-insertion standard coalesced hashing method (EISCH), which
is a variant of the standard coalesced hashing algorithm (SCH) described in [Knu73], [Vit80] and [Vit82b]. The analysis answers the open problem posed in [Vit80]. The number of probes per successful search in full tables is 5% better with EISCH than with SCH
Meerkat: A framework for Dynamic Graph Algorithms on GPUs
Graph algorithms are challenging to implement due to their varying topology
and irregular access patterns. Real-world graphs are dynamic in nature and
routinely undergo edge and vertex additions as well as deletions. Typical
examples of dynamic graphs are social networks, collaboration networks, and
road networks. Applying static algorithms repeatedly on dynamic graphs is
inefficient. Unfortunately, we know little about how to efficiently process
dynamic graphs on massively parallel architectures such as GPUs. Existing
approaches to represent and process dynamic graphs are either not general or
inefficient. In this work, we propose a library-based framework for dynamic
graph algorithms that introduces a GPU-tailored graph representation and exploits
the warp-cooperative execution model. The library, named Meerkat, builds upon a
recently proposed dynamic graph representation on GPUs. This representation
exploits a hashtable-based mechanism to store a vertex's neighborhood. Meerkat
also enables fast iteration through a group of vertices, such as the whole set
of vertices or the neighbors of a vertex. Based on the efficient iterative
patterns encoded in Meerkat, we implement dynamic versions of the popular graph
algorithms such as breadth-first search, single-source shortest paths, triangle
counting, weakly connected components, and PageRank. Compared to the
state-of-the-art dynamic graph analytics framework Hornet, Meerkat is
faster for query, insert, and delete operations. Using a variety of
real-world graphs, we observe that Meerkat significantly improves the
efficiency of the underlying dynamic graph algorithm. On average, Meerkat
performs better than Hornet for BFS, SSSP, PageRank, and WCC.
Data structures for set manipulation- hash table, 1986
The most important issue addressed in this thesis is the efficient implementation of hash table methods. There are trade-offs in any desired implementation. These are discussed in terms of issues such as hash addressing, collision handling, hash table layout, and bucket overflow. The criterion of a good hash function is that it provides an even distribution. Collision is the major problem in hash table methods. Two major hash table methods are discussed. The Open Addressing Method places synonymous items somewhere within the table. The Chaining Method, however, chains all synonyms and stores them outside the table in a region called the overflow area. Hash tables are widely used by system software as an ideal data structure. Hash table applications can be found in compilers' symbol tables, databases, directories of file organizations, as well as in problem-solving application programs.
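The contrast between the two methods named in this abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not code from the thesis: the class names, the linear-probing rule, and the list-based overflow area are all choices made for the example. Open addressing keeps colliding keys inside the table by probing for the next free slot, while chaining keeps one key per home slot and links the remaining synonyms through a separate overflow area.

```python
# Hypothetical sentinel for an unused slot (an assumption of this sketch).
_EMPTY = object()


class OpenAddressingSet:
    """Synonyms stay inside the table; collisions resolved by linear probing."""

    def __init__(self, capacity=8):
        self.slots = [_EMPTY] * capacity

    def add(self, key):
        n = len(self.slots)
        i = hash(key) % n
        for _ in range(n):
            if self.slots[i] is _EMPTY or self.slots[i] == key:
                self.slots[i] = key
                return
            i = (i + 1) % n  # probe the next slot
        raise RuntimeError("table is full")

    def __contains__(self, key):
        n = len(self.slots)
        i = hash(key) % n
        for _ in range(n):
            if self.slots[i] is _EMPTY:
                return False
            if self.slots[i] == key:
                return True
            i = (i + 1) % n
        return False


class ChainedSet:
    """One key per home slot; further synonyms are chained in an overflow area."""

    def __init__(self, capacity=8):
        self.heads = [_EMPTY] * capacity  # primary table
        self.links = [-1] * capacity      # first overflow index, -1 = none
        self.overflow = []                # entries are [key, next_index]

    def add(self, key):
        i = hash(key) % len(self.heads)
        if self.heads[i] is _EMPTY:
            self.heads[i] = key
            return
        if self.heads[i] == key:
            return
        last, j = None, self.links[i]
        while j != -1:                    # walk the chain to its end
            if self.overflow[j][0] == key:
                return
            last, j = j, self.overflow[j][1]
        self.overflow.append([key, -1])
        new_index = len(self.overflow) - 1
        if last is None:
            self.links[i] = new_index
        else:
            self.overflow[last][1] = new_index

    def __contains__(self, key):
        i = hash(key) % len(self.heads)
        if self.heads[i] is _EMPTY:
            return False
        if self.heads[i] == key:
            return True
        j = self.links[i]
        while j != -1:
            if self.overflow[j][0] == key:
                return True
            j = self.overflow[j][1]
        return False
```

With a capacity of 8, the integer keys 1, 9, and 17 all hash to slot 1: the open-addressing table spreads them over slots 1 to 3, whereas the chained table keeps 1 in its home slot and places 9 and 17 in the overflow area.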
Scalable Hash Tables
The term scalability with regard to this dissertation has two meanings: It means
taking the best possible advantage of the provided resources (both computational
and memory resources) and it also means scaling data structures in the literal sense,
i.e., growing the capacity, by “rescaling” the table.
Scaling well to computational resources implies constructing the fastest,
best-performing algorithms and data structures. On today’s many-core machines the best
performance is immediately associated with parallelism. Since CPU frequencies
stopped growing about 10-15 years ago, parallelism is the only way to take
advantage of growing computational resources. But for data structures in general and
hash tables in particular, performance is not only linked to faster computations.
Most execution time is actually spent waiting for memory. Thus optimizing data
structures to reduce the number of memory accesses or to take better advantage of
the memory hierarchy, especially through predictable access patterns and
prefetching, is just as important.
In terms of scaling the size of hash tables, we have identified three domains where
scaling hash-based data structures has been lacking previously, i.e.,
space-efficient growing, concurrent hash tables, and Approximate Membership Query data
structures (AMQ filters). Throughout this dissertation, we describe the problems
in these areas and develop efficient solutions. We highlight three different libraries
that we have developed over the course of this dissertation, each containing
multiple implementations that have shown throughout our testing to be among the
best implementations in their respective domains. Together, they offer
a comprehensive toolbox that can be used to solve many kinds of hashing-related
problems or to develop individual solutions for further ones.
DySECT is a library for space-efficient hash tables, specifically growing
space-efficient hash tables that scale with their input size. It contains the namesake DySECT
data structure in addition to a number of different probing- and cuckoo-based
implementations. Growt is a library for highly efficient concurrent hash tables. It
contains a very fast base table and a number of extensions to adapt this table to
match any purpose. All extensions can be combined to create a variety of different
interfaces. In our extensive experimental evaluation, each adaptation has shown
to be among the best hash tables for its specific purpose. Lpqfilter is a library
for concurrent approximate membership query (AMQ) data structures. It contains
some original data structures, like the linear probing quotient filter, as well as some
novel approaches to dynamically sized quotient filters.
Internal hashing for dynamic and static tables
This tutorial discusses one of the oldest problems in computing: how to search and retrieve keyed information from a list in the least amount of time. Hashing - a technique that mathematically converts a key into a storage address - is one of the best methods of finding and retrieving information associated with a unique identifying key. We briefly survey techniques which have evolved over the past 25 years and then introduce more recent research results for extremely compact and fast methods based on perfect and minimal perfect hashing. Perfect and minimal perfect hashing is useful for rapid lookup in a static table such as keywords in a compiler, spelling checkers, and database management systems. The results presented here show techniques for constructing long lists which can be searched in one memory reference.
KEYWORDS AND PHRASES: Key-to-address transformation, hash coding, hash table, scatter table, bucket hashing, perfect hashing, minimal perfect hashing
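To make the notion of a minimal perfect hash concrete, here is a small Python sketch. It uses a brute-force seed search, which is an assumption of this example and not one of the construction techniques from the tutorial; the function names `slot` and `find_mph_seed` and the use of salted BLAKE2b are likewise illustrative choices. For a static key set of size n, it searches for a seed under which every key lands in a distinct slot of a table with exactly n entries, so a lookup touches a single table position.

```python
import hashlib


def slot(key, seed, n):
    """Map a string key to a slot in range(n) using a seeded hash."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % n


def find_mph_seed(keys, max_tries=100_000):
    """Brute-force a seed that maps the keys one-to-one onto range(len(keys)).

    Practical only for small key sets: the chance that a random seed works
    shrinks roughly like n! / n**n as the number of keys n grows.
    """
    n = len(keys)
    for seed in range(max_tries):
        if len({slot(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no suitable seed found within max_tries")


def build_table(keys):
    """Return (seed, table) where table[slot(k, seed, n)] == k for every key."""
    seed = find_mph_seed(keys)
    table = [None] * len(keys)
    for k in keys:
        table[slot(k, seed, len(keys))] = k
    return seed, table


def contains(table, seed, key):
    """Membership test that inspects exactly one table entry."""
    return table[slot(key, seed, len(table))] == key
```

For example, `build_table(["if", "else", "while", "for", "return"])` yields a five-entry table with no empty slots, and `contains` then answers keyword queries with a single probe.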
Simulating Uniform Hashing in Constant Time and Optimal Space
Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e., the assumption that hash functions behave like truly random functions. In this paper it is shown how to implement hash functions that can be evaluated on a RAM in constant time, and behave like truly random functions on any set of n inputs, with high probability. The space needed to represent a function is O(n) words, which is the best possible (and a polynomial improvement compared to previous fast hash functions). As a consequence, a broad class of hashing schemes can be implemented to meet, with high probability, the performance guarantees of their uniform hashing analysis