205 research outputs found

    Analysing the Performance of GPU Hash Tables for State Space Exploration

    Get PDF
    In the past few years, General Purpose Graphics Processors (GPUs) have been used to significantly speed up numerous applications. One of the areas in which GPUs have recently led to a significant speed-up is model checking. In model checking, state spaces, i.e., large directed graphs, are explored to verify whether models satisfy desirable properties. GPUexplore is a GPU-based model checker that uses a hash table to efficiently keep track of already explored states. As a large number of states is discovered and stored during such an exploration, the hash table should be able to quickly handle many inserts and queries concurrently. In this paper, we experimentally compare two different hash tables optimised for the GPU, one being the GPUexplore hash table, and the other using Cuckoo hashing. We compare the performance of both hash tables using random and non-random data obtained from model checking experiments, to analyse the applicability of the two hash tables for state space exploration. We conclude that Cuckoo hashing is three times faster than GPUexplore hashing for random data, and that Cuckoo hashing is five to nine times faster for non-random data. This suggests great potential to further speed up GPUexplore in the near future.Comment: In Proceedings GaM 2017, arXiv:1712.0834

    Boosting Multi-Core Reachability Performance with Shared Hash Tables

    Get PDF
    This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms.Comment: preliminary repor

    Scalable Hash Tables

    Get PDF
    The term scalability with regards to this dissertation has two meanings: It means taking the best possible advantage of the provided resources (both computational and memory resources) and it also means scaling data structures in the literal sense, i.e., growing the capacity, by “rescaling” the table. Scaling well to computational resources implies constructing the fastest best per- forming algorithms and data structures. On today’s many-core machines the best performance is immediately associated with parallelism. Since CPU frequencies have stopped growing about 10-15 years ago, parallelism is the only way to take ad- vantage of growing computational resources. But for data structures in general and hash tables in particular performance is not only linked to faster computations. The most execution time is actually spent waiting for memory. Thus optimizing data structures to reduce the amount of memory accesses or to take better advantage of the memory hierarchy especially through predictable access patterns and prefetch- ing is just as important. In terms of scaling the size of hash tables we have identified three domains where scaling hash-based data structures have been lacking previously, i.e., space effi- cient growing, concurrent hash tables, and Approximate Membership Query data structures (AMQ-filter). Throughout this dissertation, we describe the problems in these areas and develop efficient solutions. We highlight three different libraries that we have developed over the course of this dissertation, each containing mul- tiple implementations that have shown throughout our testing to be among the best implementations in their respective domains. In this composition they offer a comprehensive toolbox that can be used to solve many kinds of hashing related problems or to develop individual solutions for further ones. DySECT is a library for space efficient hash tables specifically growing space effi- cient hash tables that scale with their input size. It contains the namesake DySECT data structure in addition to a number of different probing and cuckoo based im- plementations. Growt is a library for highly efficient concurrent hash tables. It contains a very fast base table and a number of extensions to adapt this table to match any purpose. All extension can be combined to create a variety of different interfaces. In our extensive experimental evaluation, each adaptation has shown to be among the best hash tables for their specific purpose. Lpqfilter is a library for concurrent approximate membership query (AMQ) data structures. It contains some original data structures, like the linear probing quotient filter, as well as some novel approaches to dynamically sized quotient filters

    LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed

    Full text link
    Running off-site software middleboxes at third-party service providers has been a popular practice. However, routing large volumes of raw traffic, which may carry sensitive information, to a remote site for processing raises severe security concerns. Prior solutions often abstract away important factors pertinent to real-world deployment. In particular, they overlook the significance of metadata protection and stateful processing. Unprotected traffic metadata like low-level headers, size and count, can be exploited to learn supposedly encrypted application contents. Meanwhile, tracking the states of 100,000s of flows concurrently is often indispensable in production-level middleboxes deployed at real networks. We present LightBox, the first system that can drive off-site middleboxes at near-native speed with stateful processing and the most comprehensive protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox is the product of our systematic investigation of how to overcome the inherent limitations of secure enclaves using domain knowledge and customization. First, we introduce an elegant virtual network interface that allows convenient access to fully protected packets at line rate without leaving the enclave, as if from the trusted source network. Second, we provide complete flow state management for efficient stateful processing, by tailoring a set of data structures and algorithms optimized for the highly constrained enclave space. Extensive evaluations demonstrate that LightBox, with all security benefits, can achieve 10Gbps packet I/O, and that with case studies on three stateful middleboxes, it can operate at near-native speed.Comment: Accepted at ACM CCS 201

    Dynamic Space Efficient Hashing

    Get PDF
    We consider space efficient hash tables that can grow and shrink dynamically and are always highly space efficient, i.e., their space consumption is always close to the lower bound even while growing and when taking into account storage that is only needed temporarily. None of the traditionally used hash tables have this property. We show how known approaches like linear probing and bucket cuckoo hashing can be adapted to this scenario by subdividing them into many subtables or using virtual memory overcommitting. However, these rather straightforward solutions suffer from slow amortized insertion times due to frequent reallocation in small increments. Our main result is DySECT (Dynamic Space Efficient Cuckoo Table) which avoids these problems. DySECT consists of many subtables which grow by doubling their size. The resulting inhomogeneity in subtable sizes is equalized by the flexibility available in bucket cuckoo hashing where each element can go to several buckets each of which containing several cells. Experiments indicate that DySECT works well with load factors up to 98%. With up to 2.7 times better performance than the next best solution

    Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables

    Get PDF
    Motivation. In biological sequence analysis, alignment-free (also known as k-mer-based) methods are increasingly replacing mapping- and alignment-based methods for various applications. A basic step of such methods consists of building a table of all k-mers of a given set of sequences (a reference genome or a dataset of sequenced reads) and their counts. Over the past years, efficient methods and tools for k-mer counting have been developed. In a different line of work, the use of gapped k-mers has been shown to offer advantages over the use of the standard contiguous k-mers. However, no tool seems to be available that is able to count gapped k-mers with the same efficiency as contiguous k-mers. One reason is that the most efficient k-mer counters use minimizers (of a length m < k) to group k-mers into buckets, such that many consecutive k-mers are classified into the same bucket. This approach leads to cache-friendly (and hence extremely fast) algorithms, but the approach does not transfer easily to gapped k-mers. Consequently, the existing efficient k-mer counters cannot be trivially modified to count gapped k-mers with the same efficiency. Results. We present a different approach that is equally applicable to contiguous k-mers and gapped k-mers. We use multi-way bucketed Cuckoo hash tables to efficiently store (gapped) k-mers and their counts. We also describe a method to parallelize counting over multiple threads without using locks: We subdivide the hash table into independent subtables, and use a producer-consumer model, such that each thread serves one subtable. This requires designing Cuckoo hash functions with the property that all alternative locations for each k-mer are located in the same subtable. Compared to some of the fastest contiguous k-mer counters, our approach is of comparable speed, or even faster, on large datasets, and it is the only one that supports gapped k-mers
    • …
    corecore