1,986 research outputs found
Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms.Comment: preliminary repor
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
An Efficient Vectorized Hash Table for Batch Computations
In recent years, the increasing demand for high-performance analytics on big data has led the research on batch hash tables. It is shown that this type of hash table can benefit from the cache locality and multi-threading more than ordinary hash tables. Moreover, the batch design for hash tables is amenable to using advanced features of modern processors such as prefetching and SIMD vectorization. While state-of-the-art research and open-source projects on batch hash tables made efforts to propose improved designs by better usage of mentioned hardware features, their approaches still do not fully exploit the existing opportunities for performance improvements. Furthermore, there is a gap for a high-level batch API of such hash tables for wider adoption of these high-performance data structures. In this paper, we present Vec-HT, a parallel, SIMD-vectorized, and prefetching-enabled hash table for fast batch processing. To allow developers to fully take advantage of its performance, we recommend a high-level batch API design. Our experimental results show the superiority and competitiveness of this approach in comparison with the alternative implementations and state-of-the-art for the data-intensive workloads of relational join processing, set operations, and sparse vector processing
Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory
Previous work [1] has claimed that the best performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache-locality and synchronization free parallelism in the build-phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle loaded in the form of transactions. In such cases, the first benefit of partitioning — increased locality — is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well. Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix-partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state-of-the-art on high-locality data and never more than 1% slower
bloomRF: On Performing Range-Queries in Bloom-Filters with Piecewise-Monotone Hash Functions and Prefix Hashing
We introduce bloomRF as a unified method for approximate membership testing
that supports both point- and range-queries. As a first core idea, bloomRF
introduces novel prefix hashing to efficiently encode range information in the
hash-code of the key itself. As a second key concept, bloomRF proposes novel
piecewise-monotone hash-functions that preserve local order and support fast
range-lookups with fewer memory accesses. bloomRF has near-optimal space
complexity and constant query complexity. Although, bloomRF is designed for
integer domains, it supports floating-points, and can serve as a
multi-attribute filter. The evaluation in RocksDB and in a standalone library
shows that it is more efficient and outperforms existing point-range-filters by
up to 4x across a range of settings and distributions, while keeping the
false-positive rate low.Comment: Extended version. Original accepted at EDBT 202
- …