13 research outputs found

Robust Query Optimization Methods With Respect to Estimation Errors: A Survey

Author: Hameurlain Abdelkader
Morvan Franck
Yin Shaoyi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

The quality of a query execution plan chosen by a Cost-Based Optimizer (CBO) depends greatly on the estimation accuracy of input parameter values. Many research results have been produced on improving the estimation accuracy, but they do not work for every situation. Therefore, "robust query optimization" was introduced, in an effort to minimize the sub-optimality risk by accepting the fact that estimates could be inaccurate. In this survey, we aim to provide an overview of robust query optimization methods by classifying them into different categories, explaining the essential ideas, listing their advantages and limitations, and comparing them with multiple criteria.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Can Deep Neural Networks Predict Data Correlations from Column Names?

Author: Trummer Immanuel
Publication date: 09/07/2021

For humans, it is often possible to predict data correlations from column names. We conduct experiments to find out whether deep neural networks can learn to do the same. If so, e.g., it would open up the possibility of tuning tools that use NLP analysis on schema elements to prioritize their efforts for correlation detection. We analyze correlations for around 120,000 column pairs, taken from around 4,000 data sets. We try to predict correlations, based on column names alone. For predictions, we exploit pre-trained language models, based on the recently proposed Transformer architecture. We consider different types of correlations, multiple prediction methods, and various prediction scenarios. We study the impact of factors such as column name length or the amount of training data on prediction accuracy. Altogether, we find that deep neural networks can predict correlations with a relatively high accuracy in many scenarios (e.g., with an accuracy of 95% for long column names).

arXiv.org e-Print Archive

How Good Are Query Optimizers, Really?

Author: Boncz P.A. (Peter)
Gubichev A. (Andrey)
Kemper A. (Alfons)
Leis V. (Viktor)
Mirchev A. (Atanas)
Neumann T. (Thomas)
Publication date: 01/11/2015

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisi

CWI's Institutional Repository

Sampling-Based Query Re-Optimization

Author: Bruno N.
Graefe G.
Ioannidis Y. E.
Poosala V.
Reddy N.
Stillger M.
Publication date: 21/01/2016

Despite decades of work, query optimizers still make mistakes on "difficult" queries because of bad cardinality estimates, often due to the interaction of multiple predicates and correlations in the data. In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. Specifically, our solution is a sampling-based iterative procedure that requires almost no changes to the original query optimizer or query evaluation mechanism of the system. We show that this indeed imposes low overhead and catches cases where three widely used optimizers (PostgreSQL and two commercial systems) make large errors.
Comment: This is the extended version of a paper with the same title and authors that appears in the Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016).

arXiv.org e-Print Archive

Crossref

Learning Multi-dimensional Indexes

Author: Alizadeh Mohammad
Ding Jialin
Kraska Tim
Nathan Vikram
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/12/2019

Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsistent across different datasets and queries. In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. Flood achieves up to three orders of magnitude faster performance for range scans with predicates than state-of-the-art multi-dimensional indexes or sort orders on real-world datasets and workloads. Our work serves as a building block towards an end-to-end learned database system.

arXiv.org e-Print Archive

Crossref

DSpace@MIT