1,986 research outputs found

    Boosting Multi-Core Reachability Performance with Shared Hash Tables

    Get PDF
    This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms.Comment: preliminary repor

    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Full text link
    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

    An Efficient Vectorized Hash Table for Batch Computations

    Get PDF
    In recent years, the increasing demand for high-performance analytics on big data has led the research on batch hash tables. It is shown that this type of hash table can benefit from the cache locality and multi-threading more than ordinary hash tables. Moreover, the batch design for hash tables is amenable to using advanced features of modern processors such as prefetching and SIMD vectorization. While state-of-the-art research and open-source projects on batch hash tables made efforts to propose improved designs by better usage of mentioned hardware features, their approaches still do not fully exploit the existing opportunities for performance improvements. Furthermore, there is a gap for a high-level batch API of such hash tables for wider adoption of these high-performance data structures. In this paper, we present Vec-HT, a parallel, SIMD-vectorized, and prefetching-enabled hash table for fast batch processing. To allow developers to fully take advantage of its performance, we recommend a high-level batch API design. Our experimental results show the superiority and competitiveness of this approach in comparison with the alternative implementations and state-of-the-art for the data-intensive workloads of relational join processing, set operations, and sparse vector processing

    Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

    Get PDF
    Previous work [1] has claimed that the best performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache-locality and synchronization free parallelism in the build-phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle loaded in the form of transactions. In such cases, the first benefit of partitioning — increased locality — is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well. Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix-partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state-of-the-art on high-locality data and never more than 1% slower

    bloomRF: On Performing Range-Queries in Bloom-Filters with Piecewise-Monotone Hash Functions and Prefix Hashing

    Full text link
    We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries. As a first core idea, bloomRF introduces novel prefix hashing to efficiently encode range information in the hash-code of the key itself. As a second key concept, bloomRF proposes novel piecewise-monotone hash-functions that preserve local order and support fast range-lookups with fewer memory accesses. bloomRF has near-optimal space complexity and constant query complexity. Although, bloomRF is designed for integer domains, it supports floating-points, and can serve as a multi-attribute filter. The evaluation in RocksDB and in a standalone library shows that it is more efficient and outperforms existing point-range-filters by up to 4x across a range of settings and distributions, while keeping the false-positive rate low.Comment: Extended version. Original accepted at EDBT 202
    • …
    corecore