
1,041 research outputs found

An Experimental Study Into the Effect of Varying the Join Selectivity Factor on the Performance of Join Methods in Relational Databases

Author: Mallet Ada
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/1998

Relational database systems use join queries to retrieve data from two relations. Several join methods can be used to execute these queries. This study investigated the effect of varying the join selectivity factor on the performance of the join methods. Experiments in the ORACLE environment were set up to measure the performance of three join methods: nested loop join, sort merge join and hash join. Performance was measured in terms of total elapsed time, CPU time and the number of I/O reads. The study found that the hash join performs better than the nested loop and the sort merge under all varying conditions. The nested loop competes with the hash join at low join selectivity factors. The results also showed that the sort merge join method performs better than the nested loop when a predicate is applied to the inner table.
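The three join methods named in this abstract can be illustrated with a minimal in-memory sketch. This is not the study's ORACLE setup: rows are assumed to be `(key, payload)` tuples, all function names are hypothetical, and `join_selectivity` assumes the common textbook definition (result size divided by the size of the cross product), which the abstract itself does not spell out.

```python
from collections import defaultdict


def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row on the key (index 0)."""
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]


def sort_merge_join(left, right):
    """Sort both inputs on the key, then advance two cursors in step."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the block of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that block.
            while i < len(left) and left[i][0] == lk:
                out.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def hash_join(build, probe):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[0]].append(b)
    return [(b, p) for p in probe for b in table.get(p[0], [])]


def join_selectivity(result_size, left_size, right_size):
    """Assumed definition: |R join S| / (|R| * |S|)."""
    return result_size / (left_size * right_size)


left_rows = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
right_rows = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = hash_join(left_rows, right_rows)
selectivity = join_selectivity(len(joined), len(left_rows), len(right_rows))
```

All three functions return the same set of row pairs; they differ only in how much work they do to find them, which is the quantity the study measures.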

Research Online @ ECU

A Balanced Solution for the Partition-based Spatial Merge join in MapReduce

Author: Belussi Alberto
Migliorini Sara
Publication venue: CEUR
Publication date: 01/01/2020

Several MapReduce frameworks have been developed in recent years to cope with the need to process an increasing amount of data. Moreover, some extensions have been proposed to deal with particular kinds of information, such as spatial data. In this paper we refer to SpatialHadoop, a spatial extension of Apache Hadoop which provides a rich set of spatial data types and operations. In the geo-spatial domain, spatial join is considered a fundamental operation for performing data analysis. However, the join operation is generally classified as a critical task in MapReduce, since it requires processing two datasets at a time. Several solutions have been proposed in the literature for efficiently performing a spatial join, which may or may not require a spatial index computed on both datasets or on only one of them. As already discussed in the literature, the efficiency of this operation depends on the ability both to prune unnecessary data as soon as possible and to assign a balanced amount of work to each task executed in parallel. In this paper, we take a step forward in this direction by proposing an evolution of the Partition-based Spatial Merge Join algorithm which tries to fully exploit the parallelism induced by the MapReduce framework. In particular, we concentrate on the partition phase, which has to produce filtered, balanced and meaningful subdivisions of the original datasets.
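The partition phase this abstract focuses on can be pictured with a minimal single-process sketch of a partition-based spatial join. It is not the balanced variant the paper proposes and does not use SpatialHadoop: it assumes a plain uniform grid, rectangles given as `(x_min, y_min, x_max, y_max)` tuples, and hypothetical function names.

```python
from collections import defaultdict
from itertools import product


def overlaps(a, b):
    """True if two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_for(rect, cell_size):
    """Yield every grid cell (column, row) that the rectangle touches."""
    x0, y0, x1, y1 = rect
    cols = range(int(x0 // cell_size), int(x1 // cell_size) + 1)
    rows = range(int(y0 // cell_size), int(y1 // cell_size) + 1)
    return product(cols, rows)


def partition(rects, cell_size):
    """Partition phase: map each cell to the indexes of rectangles in it."""
    grid = defaultdict(list)
    for idx, rect in enumerate(rects):
        for cell in cells_for(rect, cell_size):
            grid[cell].append(idx)
    return grid


def partitioned_spatial_join(left, right, cell_size):
    """Join only rectangles that fall into the same grid cell."""
    left_grid = partition(left, cell_size)
    right_grid = partition(right, cell_size)
    result = set()  # a set removes pairs reported by more than one cell
    for cell, left_ids in left_grid.items():
        for i in left_ids:
            for j in right_grid.get(cell, []):
                if overlaps(left[i], right[j]):
                    result.add((i, j))
    return sorted(result)


left_rects = [(0, 0, 2, 2), (5, 5, 6, 6)]
right_rects = [(1, 1, 3, 3), (10, 10, 11, 11), (6, 6, 7, 7)]
pairs = partitioned_spatial_join(left_rects, right_rects, cell_size=2)
```

In a MapReduce setting each cell would correspond to one task, so a uniform grid over skewed data leaves some tasks with far more rectangles than others; that imbalance is the problem the abstract says the paper addresses.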

Catalogo dei prodotti della ricerca

On Parallel Join Processing in Object-Relational Database Systems

Author: Märtens Holger
Rahm Erhard
Publication date: 06/11/2018

So far, only a few performance studies on parallel object-relational database systems are available. In particular, the relative performance of relational vs. reference-based join processing in a parallel environment has not been investigated sufficiently. We present a performance study based on the BUCKY benchmark to compare parallel join processing using reference attributes with relational hash- and merge-join algorithms. In addition, we propose a data allocation scheme especially suited for object hierarchies and set-valued attributes.

Qucosa - Publikationsserver der Universität Leipzig

Neo: A Learned Query Optimizer

Author: Alizadeh Mohammad
Kraska Tim
Mao Hongzi
Marcus Ryan
Negi Parimarjan
Papaemmanouil Olga
Tatbul Nesime
Zhang Chi
Publication venue: 'VLDB Endowment'
Publication date: 07/04/2019

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpasses them.

arXiv.org e-Print Archive

DSpace@MIT

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

Author: Albutiu Martina-Cezara
Kemper Alfons
Neumann Thomas
Publication date: 01/01/2012

Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather, they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine, as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four. Comment: VLDB201
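The central idea stated in this abstract, joining independently sorted runs without a final global merge, can be sketched as follows. This is a simplified sequential illustration rather than the authors' MPSM implementation: rows are assumed to be `(key, payload)` tuples, the names `split_into_runs`, `merge_join` and `run_based_join` are hypothetical, and the nested loop over runs stands in for per-core workers.

```python
def split_into_runs(rows, workers):
    """Cut the input into at most `workers` chunks and sort each on its own."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    return [
        sorted(rows[i:i + size], key=lambda r: r[0])
        for i in range(0, len(rows), size)
    ]


def merge_join(run_a, run_b):
    """Classic merge join of two runs already sorted on the key (index 0)."""
    out, i, j = [], 0, 0
    while i < len(run_a) and j < len(run_b):
        ka, kb = run_a[i][0], run_b[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the block of run_b rows sharing this key.
            j_end = j
            while j_end < len(run_b) and run_b[j_end][0] == ka:
                j_end += 1
            while i < len(run_a) and run_a[i][0] == ka:
                out.extend((run_a[i], run_b[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def run_based_join(private, public, workers=4):
    """Join every private run against every public run; no global sort order."""
    private_runs = split_into_runs(private, workers)
    public_runs = split_into_runs(public, workers)
    result = []
    for private_run in private_runs:  # one iteration per (simulated) worker
        for public_run in public_runs:
            result.extend(merge_join(private_run, public_run))
    return result


orders = [(3, "o1"), (1, "o2"), (2, "o3"), (3, "o4"), (5, "o5")]
customers = [(2, "c1"), (3, "c2"), (4, "c3"), (1, "c4"), (3, "c5")]
matches = run_based_join(orders, customers, workers=2)
```

Because each worker only reads sorted runs and never merges them into one global order, the outer loop iterations are independent of one another, which is what makes this style of join straightforward to run in parallel.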

arXiv.org e-Print Archive

CiteSeerX