    TID Hash Joins

    TID hash joins are a simple and memory-efficient method for processing large join queries. They are based on standard hash join algorithms but store only TID/key pairs in the hash table instead of entire tuples. This typically reduces memory requirements by more than an order of magnitude, which brings substantial benefits. In particular, performance for joins on gigabyte-sized relations can be improved substantially because much of the disk I/O is avoided. Furthermore, efficient processing of mixed multi-user workloads consisting of both join queries and OLTP transactions is supported. We present a detailed simulation study to analyze the performance of TID hash joins. In particular, we identify the conditions under which TID hash joins are most beneficial. We also compare TID hash joins with adaptive hash join algorithms that have been proposed to deal with mixed workloads.
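
    The core idea of the abstract above, keeping only TID/key pairs in the hash table, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `tid_hash_join`, the use of list positions as TIDs (tuple identifiers), and the final sorted fetch of full tuples are all assumptions made for the example.

```python
from collections import defaultdict


def tid_hash_join(build_rel, probe_rel, build_key, probe_key):
    """Equi-join two lists of dict rows, hashing only key -> TID entries.

    Assumption: a TID is simply the row's position in its list.
    """
    # Build phase: the hash table stores TIDs, never whole tuples.
    table = defaultdict(list)
    for tid, row in enumerate(build_rel):
        table[row[build_key]].append(tid)

    # Probe phase: record matching (build TID, probe TID) pairs.
    tid_pairs = []
    for probe_tid, row in enumerate(probe_rel):
        for build_tid in table.get(row[probe_key], ()):
            tid_pairs.append((build_tid, probe_tid))

    # Fetch phase (assumed): read full tuples only for actual matches,
    # in TID order to mimic mostly sequential access to the build side.
    tid_pairs.sort()
    return [{**build_rel[b], **probe_rel[p]} for b, p in tid_pairs]
```

    The memory saving comes from the build phase: each hash table entry holds one key and one small integer, regardless of how wide the underlying tuples are.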

    An Experimental Study Into the Effect of Varying the Join Selectivity Factor on the Performance of Join Methods in Relational Databases

    Relational database systems use join queries to retrieve data from two relations. Several join methods can be used to execute these queries. This study investigated the effect of varying the join selectivity factor on the performance of the join methods. Experiments were set up in the ORACLE environment to measure the performance of three join methods: nested loop join, sort merge join and hash join. Performance was measured in terms of total elapsed time, CPU time and the number of I/O reads. The study found that the hash join performs better than the nested loop and the sort merge joins under all the conditions tested. The nested loop join competes with the hash join at low join selectivity factors. The results also showed that the sort merge join performs better than the nested loop join when a predicate is applied to the inner table.
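
    For readers unfamiliar with the three join methods compared above, a textbook-style sketch is given here. It is not the ORACLE implementation used in the study; the function names are hypothetical, rows are assumed to be tuples whose first element is the join key, and the selectivity helper uses the common definition |result| / (|R| x |S|), which the study is assumed to share.

```python
from collections import defaultdict


def nested_loop_join(outer, inner):
    # Compare every outer row with every inner row.
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]


def hash_join(build, probe):
    # Build a hash table on one input, then probe it with the other.
    table = defaultdict(list)
    for b in build:
        table[b[0]].append(b)
    return [(b, p) for p in probe for b in table.get(p[0], ())]


def sort_merge_join(left, right):
    # Sort both inputs on the key, then merge them in one pass.
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair each left row of this key with the whole right run.
            while i < len(left) and left[i][0] == key:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def join_selectivity(result, left, right):
    # Fraction of the cross product that survives the join predicate.
    return len(result) / (len(left) * len(right))
```

    All three functions return the same set of row pairs; they differ only in how much work they do to find them, which is what the elapsed-time, CPU and I/O measurements in the study capture.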

    Hash-Based Join Operators for Query Processing in Databases (original title: Operadores de junção baseados em mecanismos de hash para o processamento de consultas em bancos de dados)

    Join algorithms are a key element of query processing performance in databases. As query execution environments have evolved, more efficient algorithms for implementing the join operator have become necessary. This paper investigates the evolution of hash-based join algorithms. Conventional algorithms such as Simple Hash Join, Grace Hash Join and Hybrid Hash Join, which were designed for conventional database architectures, are described and analyzed. Algorithms such as Symmetric Hash Join, MobiJoin, Hash-Merge Join and MJoin, which implement the join operator in environments with more complex query processing requirements (e.g. mobile computing environments), are presented and analyzed as well.
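
    As an illustration of one of the algorithms named above, the following is a minimal sketch of a symmetric hash join, which keeps one hash table per input and can emit results as tuples arrive from either side. The stream format `(side, key, payload)` and the function name are assumptions for this example and are not taken from the paper.

```python
from collections import defaultdict


def symmetric_hash_join(stream):
    """Yield (key, left_payload, right_payload) as soon as a match exists.

    `stream` yields ("L", key, payload) or ("R", key, payload) tuples in
    arrival order (an assumed input format).
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in stream:
        other = "R" if side == "L" else "L"
        # Insert the new tuple into its own side's hash table ...
        tables[side][key].append(payload)
        # ... then probe the other side's table for earlier arrivals.
        for match in tables[other].get(key, ()):
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
```

    Because neither input has to be read completely before probing starts, this style of join suits environments where data arrives incrementally or unpredictably.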

    Relational Cache Aware Join

    This thesis gives a theoretical description of the processor cache and of its use in implementing optimized algorithms. It focuses on relational join algorithms such as nested loop and hash join. The core of the thesis covers the GRACE hash-join and radix-cluster hash-join algorithms, which were implemented in C++ with an emphasis on exploiting the processor cache. Tools that can help with cache-aware optimization of algorithms are also mentioned. In the conclusion, these algorithms are compared with each other and with an unoptimized hash-join algorithm.
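
    The partition-then-join structure behind a radix-cluster hash join can be sketched roughly as below. This is not the thesis code (which is in C++): the function names, the default parameters and the assumption of non-negative integer keys are illustrative only, and Python cannot show the actual cache effects. The sketch only shows how clustering on a few key bits per pass keeps the fan-out of each pass small while producing many small partitions.

```python
from collections import defaultdict


def radix_cluster(rows, bits_per_pass, passes):
    """Cluster (key, payload) rows on the low key bits, a few bits per pass.

    Assumption: keys are non-negative integers.
    """
    mask = (1 << bits_per_pass) - 1
    clusters = [rows]
    for p in range(passes):
        shift = p * bits_per_pass
        next_clusters = []
        for cluster in clusters:
            # Each pass splits a cluster into only 2**bits_per_pass buckets.
            buckets = [[] for _ in range(1 << bits_per_pass)]
            for row in cluster:
                buckets[(row[0] >> shift) & mask].append(row)
            next_clusters.extend(buckets)
        clusters = next_clusters
    return clusters


def radix_hash_join(r, s, bits_per_pass=2, passes=2):
    out = []
    # Both inputs are clustered identically, so cluster i of r can only
    # match cluster i of s; each pair is joined with a small hash table.
    for r_part, s_part in zip(radix_cluster(r, bits_per_pass, passes),
                              radix_cluster(s, bits_per_pass, passes)):
        table = defaultdict(list)
        for key, payload in r_part:
            table[key].append(payload)
        for key, payload in s_part:
            for match in table.get(key, ()):
                out.append((key, match, payload))
    return out
```

    In a cache-conscious implementation, the number of bits per pass would be chosen so that the buckets written in one pass and the per-partition hash tables fit the relevant cache and TLB sizes.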

    Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory

    Previous work [1] has claimed that the best performing implementation of in-memory hash joins is based on (radix-)partitioning of the build-side input. Indeed, despite the overhead of partitioning, the benefits from increased cache locality and synchronization-free parallelism in the build phase outweigh the costs when the input data is randomly ordered. However, many datasets already exhibit significant spatial locality (i.e., non-randomness) due to the way data items enter the database: through periodic ETL or trickle loading in the form of transactions. In such cases, the first benefit of partitioning, increased locality, is largely irrelevant. In this paper, we demonstrate how hardware transactional memory (HTM) can render the other benefit, freedom from synchronization, irrelevant as well. Specifically, using careful analysis and engineering, we develop an adaptive hash join implementation that outperforms parallel radix-partitioned hash joins as well as sort-merge joins on data with high spatial locality. In addition, we show how, through lightweight (less than 1% overhead) runtime monitoring of the transaction abort rate, our implementation can detect inputs with low spatial locality and dynamically fall back to radix-partitioning of the build-side input. The result is a hash join implementation that is more than 3 times faster than the state of the art on high-locality data and never more than 1% slower.

    HATCH: Hash Table Caching in Hardware for Efficient Relational Join on FPGA

    In this paper we present HATCH, a novel hash join engine. We follow a new design point that lets us effectively cache the hash table entries in fast BRAM resources while supporting collision resolution in hardware. HATCH gives us the best of two worlds: (i) it uses the full capacity of the DDR memory to store complete hash tables, and (ii) by employing a cache, it exploits the high access speed of BRAMs. We demonstrate the usefulness of our approach by running hash join operations from 5 TPC-H benchmark queries and report speedups of up to 2.8x over a pipeline-optimized baseline. The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013), for the Advanced Analytics for Extremely Large European Databases (AXLE) project under grant agreement number 318633, and from the Ministry of Economy and Competitiveness of Spain under contract number TIN2012-34557.

    Parallel Evaluation of Multi-join Queries

    A number of execution strategies for parallel evaluation of multi-join queries have been proposed in the literature. In this paper we give a comparative performance evaluation of four execution strategies by implementing all of them on the same parallel database system, PRISMA/DB. Experiments were run on up to 80 processors. These strategies, taken from the literature, are named Sequential Parallel, Synchronous Execution, Segmented Right-Deep, and Full Parallel. Based on the experiments, clear guidelines are given on when to use which strategy. This is an extended abstract; the full paper appeared in Proc. ACM SIGMOD'94, Minneapolis, Minnesota, May 24–27, 1994.

    Comparison of Data Retrieval Using Hash Join Queries and Nested Join Queries (original title: Perbandingan Pencarian Data Menggunakan Query Hash Join dan Query Nested Join)

    Accessing or searching data with a query or join in an application connected to a database requires attention both to how appropriately the data itself is used and to processing time. A database management system can process a query and produce its answer in many ways. All of them ultimately produce the same answer (output), but each has a different cost, such as the time needed to return the data. Two queries often used for data processing are the Hash Join query and the Nested Join query; they use different algorithms but produce the same output. An application built with Microsoft Visual Studio 2010 and Microsoft SQL Server 2008 in a networked setup was used to test the two algorithms, with running time (the time taken to return the data) as the measured parameter. The tests varied the number of joined tables and the number of rows/records. The results indicate that, in terms of response time, the hash join query performs better on small amounts of data, while the nested join query performs better on large amounts of data.