Search CORE

1,154 research outputs found

Sampling-Based Query Re-Optimization

Author: Bruno N.
Graefe G.
Ioannidis Y. E.
Poosala V.
Reddy N.
Stillger M.
Publication venue
Publication date: 21/01/2016
Field of study

Despite of decades of work, query optimizers still make mistakes on "difficult" queries because of bad cardinality estimates, often due to the interaction of multiple predicates and correlations in the data. In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. Specifically, our solution is a sampling-based iterative procedure that requires almost no changes to the original query optimizer or query evaluation mechanism of the system. We show that this indeed imposes low overhead and catches cases where three widely used optimizers (PostgreSQL and two commercial systems) make large errors.Comment: This is the extended version of a paper with the same title and authors that appears in the Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2016

arXiv.org e-Print Archive

Crossref

How Good Are Query Optimizers, Really?

Author: Boncz P.A. (Peter)
Gubichev A. (Andrey)
Kemper A. (Alfons)
Leis V. (Viktor)
Mirchev A. (Atanas)
Neumann T. (Thomas)
Publication venue
Publication date: 01/11/2015
Field of study

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisi

CWI's Institutional Repository

Robust Query Optimization Methods With Respect to Estimation Errors: A Survey

Author: Hameurlain Abdelkader
Morvan Franck
Yin Shaoyi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

International audienceThe quality of a query execution plan chosen by a Cost-Based Optimizer (CBO) depends greatly on the estimation accuracy of input parameter values. Many research results have been produced on improving the estimation accuracy, but they do not work for every situation. Therefore, "robust query optimization" was introduced, in an effort to minimize the sub-optimality risk by accepting the fact that estimates could be inaccurate. In this survey, we aim to provide an overview of robust query optimization methods by classifying them into different categories, explaining the essential ideas, listing their advantages and limitations, and comparing them with multiple criteria

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Flow-Loss: Learning Cardinality Estimates That Matter

Author: Alizadeh Mohammad
Kipf Andreas
Kraska Tim
Mao Hongzi
Marcus Ryan
Negi Parimarjan
Tatbul Nesime
Publication venue
Publication date: 13/01/2021
Field of study

Previous approaches to learned cardinality estimation have focused on improving average estimation error, but not all estimates matter equally. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, that explicitly optimizes for better query plans by approximating the optimizer's cost model and dynamic programming search algorithm with analytical functions. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain plan graph in which paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark, which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the cost of plans (using the PostgreSQL cost model) and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, both models improve performance significantly over PostgreSQL and are close to the optimal performance (using true cardinalities). However, the Q-Error trained model degrades significantly when evaluated on queries that are slightly different (e.g., similar but not identical query templates), while the Flow-Loss trained model generalizes better to such situations. For example, the Flow-Loss model achieves up to 1.5x better runtimes on unseen templates compared to the Q-Error model, despite leveraging the same model architecture and training data

arXiv.org e-Print Archive

DSpace@MIT

Query Optimization by Indexing in the ODRA OODBMS

Author: Adamus Radosław
Chromiak Michał
Kuliberda Kamil
M. Kowalski Tomasz
Subieta Kazimierz
Wiślicki Jacek
Publication venue: 'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Publication date: 04/01/2015
Field of study

We present features and samples of use of the index optimizer module which has been implemented and tested in the ODRA prototype system. The ODRA index implementation is based on linear hashing and works in a scope of a standalone database. The solution is adaptable to distributed environments in order to optimally utilize data grid computational resources. The implementation consists of transparent optimization, automatic index updating and management facilities

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe