Search CORE

7,249 research outputs found

TID Hash Joins

Author: Marek Robert
Rahm Erhard
Publication venue
Publication date: 04/02/2019
Field of study

TID hash joins are a simple and memory-efficient method for processing large join queries. They are based on standard hash join algorithms but only store TID/key pairs in the hash table instead of entire tuples. This typically reduces memory requirements by more than an order of magnitude bringing substantial benefits. In particular, performance for joins on Giga-Byte relations can substantially be improved by reducing the amount of disk I/O to a large extent. Furthermore efficient processing of mixed multi-user workloads consisting of both join queries and OLTP transactions is supported. We present a detailed simulation study to analyze the performance of TID hash joins. In particular, we identify the conditions under which TID hash joins are most beneficial. Furthermore, we compare TID hash join with adaptive hash join algorithms that have been proposed to deal with mixed workloads

Qucosa - Publikationsserver der Universität Leipzig

Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

Author: Ailamaki A.
Boncz P. A.
Boncz P. A.
Chen M.-S.
Chen M.-S.
DeWitt D. J.
Lang H.
Li Y.
Liu B.
Lohman G. M.
Lu H.
Manegold S.
Ono K.
Schneider D. A.
Shatdal A.
Stillger M.
Zhang N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/07/2015
Field of study

Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

arXiv.org e-Print Archive

Crossref

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

Author: Albutiu Martina-Cezara
Kemper Alfons
Neumann Thomas
Publication venue
Publication date: 01/01/2012
Field of study

Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard to parallelize final merge step to create one complete sort order. Rather they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX