205 research outputs found
Analysing the Performance of GPU Hash Tables for State Space Exploration
In the past few years, General Purpose Graphics Processors (GPUs) have been
used to significantly speed up numerous applications. One of the areas in which
GPUs have recently led to a significant speed-up is model checking. In model
checking, state spaces, i.e., large directed graphs, are explored to verify
whether models satisfy desirable properties. GPUexplore is a GPU-based model
checker that uses a hash table to efficiently keep track of already explored
states. As a large number of states is discovered and stored during such an
exploration, the hash table should be able to quickly handle many inserts and
queries concurrently. In this paper, we experimentally compare two different
hash tables optimised for the GPU, one being the GPUexplore hash table, and the
other using Cuckoo hashing. We compare the performance of both hash tables
using random and non-random data obtained from model checking experiments, to
analyse the applicability of the two hash tables for state space exploration.
We conclude that Cuckoo hashing is three times faster than GPUexplore hashing
for random data, and that Cuckoo hashing is five to nine times faster for
non-random data. This suggests great potential to further speed up GPUexplore
in the near future.Comment: In Proceedings GaM 2017, arXiv:1712.0834
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
Scalable Hash Tables
The term scalability with regards to this dissertation has two meanings: It means
taking the best possible advantage of the provided resources (both computational
and memory resources) and it also means scaling data structures in the literal sense,
i.e., growing the capacity, by “rescaling” the table.
Scaling well to computational resources implies constructing the fastest best per-
forming algorithms and data structures. On today’s many-core machines the best
performance is immediately associated with parallelism. Since CPU frequencies
have stopped growing about 10-15 years ago, parallelism is the only way to take ad-
vantage of growing computational resources. But for data structures in general and
hash tables in particular performance is not only linked to faster computations. The
most execution time is actually spent waiting for memory. Thus optimizing data
structures to reduce the amount of memory accesses or to take better advantage of
the memory hierarchy especially through predictable access patterns and prefetch-
ing is just as important.
In terms of scaling the size of hash tables we have identified three domains where
scaling hash-based data structures have been lacking previously, i.e., space effi-
cient growing, concurrent hash tables, and Approximate Membership Query data
structures (AMQ-filter). Throughout this dissertation, we describe the problems
in these areas and develop efficient solutions. We highlight three different libraries
that we have developed over the course of this dissertation, each containing mul-
tiple implementations that have shown throughout our testing to be among the
best implementations in their respective domains. In this composition they offer
a comprehensive toolbox that can be used to solve many kinds of hashing related
problems or to develop individual solutions for further ones.
DySECT is a library for space efficient hash tables specifically growing space effi-
cient hash tables that scale with their input size. It contains the namesake DySECT
data structure in addition to a number of different probing and cuckoo based im-
plementations. Growt is a library for highly efficient concurrent hash tables. It
contains a very fast base table and a number of extensions to adapt this table to
match any purpose. All extension can be combined to create a variety of different
interfaces. In our extensive experimental evaluation, each adaptation has shown
to be among the best hash tables for their specific purpose. Lpqfilter is a library
for concurrent approximate membership query (AMQ) data structures. It contains
some original data structures, like the linear probing quotient filter, as well as some
novel approaches to dynamically sized quotient filters
LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed
Running off-site software middleboxes at third-party service providers has
been a popular practice. However, routing large volumes of raw traffic, which
may carry sensitive information, to a remote site for processing raises severe
security concerns. Prior solutions often abstract away important factors
pertinent to real-world deployment. In particular, they overlook the
significance of metadata protection and stateful processing. Unprotected
traffic metadata like low-level headers, size and count, can be exploited to
learn supposedly encrypted application contents. Meanwhile, tracking the states
of 100,000s of flows concurrently is often indispensable in production-level
middleboxes deployed at real networks.
We present LightBox, the first system that can drive off-site middleboxes at
near-native speed with stateful processing and the most comprehensive
protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox
is the product of our systematic investigation of how to overcome the inherent
limitations of secure enclaves using domain knowledge and customization. First,
we introduce an elegant virtual network interface that allows convenient access
to fully protected packets at line rate without leaving the enclave, as if from
the trusted source network. Second, we provide complete flow state management
for efficient stateful processing, by tailoring a set of data structures and
algorithms optimized for the highly constrained enclave space. Extensive
evaluations demonstrate that LightBox, with all security benefits, can achieve
10Gbps packet I/O, and that with case studies on three stateful middleboxes, it
can operate at near-native speed.Comment: Accepted at ACM CCS 201
Dynamic Space Efficient Hashing
We consider space efficient hash tables that can grow and shrink dynamically and are always highly space efficient, i.e., their space consumption is always close to the lower bound even while growing and when taking into account storage that is only needed temporarily. None of the traditionally used hash tables have this property. We show how known approaches like linear probing and bucket cuckoo hashing can be adapted to this scenario by subdividing them into many subtables or using virtual memory overcommitting. However, these rather straightforward solutions suffer from slow amortized insertion times due to frequent reallocation in small increments.
Our main result is DySECT (Dynamic Space Efficient Cuckoo Table) which avoids these problems. DySECT consists of many subtables which grow by doubling their size. The resulting inhomogeneity in subtable sizes is equalized by the flexibility available in bucket cuckoo hashing where each element can go to several buckets each of which containing several cells. Experiments indicate that DySECT works well with load factors up to 98%. With up to 2.7 times better performance than the next best solution
Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables
Motivation. In biological sequence analysis, alignment-free (also known as k-mer-based) methods are increasingly replacing mapping- and alignment-based methods for various applications. A basic step of such methods consists of building a table of all k-mers of a given set of sequences (a reference genome or a dataset of sequenced reads) and their counts. Over the past years, efficient methods and tools for k-mer counting have been developed. In a different line of work, the use of gapped k-mers has been shown to offer advantages over the use of the standard contiguous k-mers. However, no tool seems to be available that is able to count gapped k-mers with the same efficiency as contiguous k-mers. One reason is that the most efficient k-mer counters use minimizers (of a length m < k) to group k-mers into buckets, such that many consecutive k-mers are classified into the same bucket. This approach leads to cache-friendly (and hence extremely fast) algorithms, but the approach does not transfer easily to gapped k-mers. Consequently, the existing efficient k-mer counters cannot be trivially modified to count gapped k-mers with the same efficiency.
Results. We present a different approach that is equally applicable to contiguous k-mers and gapped k-mers. We use multi-way bucketed Cuckoo hash tables to efficiently store (gapped) k-mers and their counts. We also describe a method to parallelize counting over multiple threads without using locks: We subdivide the hash table into independent subtables, and use a producer-consumer model, such that each thread serves one subtable. This requires designing Cuckoo hash functions with the property that all alternative locations for each k-mer are located in the same subtable. Compared to some of the fastest contiguous k-mer counters, our approach is of comparable speed, or even faster, on large datasets, and it is the only one that supports gapped k-mers
- …