Estimating Cardinalities with Deep Sketches
We introduce Deep Sketches, which are compact models of databases that allow
us to estimate the result sizes of SQL queries. Deep Sketches are powered by a
new deep learning approach to cardinality estimation that can capture
correlations between columns, even across tables. Our demonstration allows
users to define such sketches on the TPC-H and IMDb datasets, monitor the
training process, and run ad-hoc queries against trained sketches. We also
estimate query cardinalities with HyPer and PostgreSQL to visualize the gains
over traditional cardinality estimators. Comment: To appear in SIGMOD'19.
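The PostgreSQL baseline mentioned above can be reproduced directly, since the optimizer exposes its cardinality estimates through EXPLAIN. Below is a minimal sketch that reads the estimated and actual row counts for a query's root plan node; it assumes a local PostgreSQL instance with the IMDb dataset loaded and psycopg2 installed, and the connection string and example query are placeholders, not part of the demo itself.

```python
# Minimal sketch: compare PostgreSQL's estimated cardinality to the true one
# via EXPLAIN (ANALYZE, FORMAT JSON). Connection string and query are
# placeholder assumptions.
import json
import psycopg2

def estimated_vs_actual(conn, sql):
    """Return (estimated rows, actual rows) for the root plan node of `sql`."""
    with conn.cursor() as cur:
        # Sketch only; never interpolate untrusted SQL like this in real code.
        cur.execute(f"EXPLAIN (ANALYZE, FORMAT JSON) {sql}")
        raw = cur.fetchone()[0]
        # Depending on driver/type settings the JSON may arrive pre-parsed.
        root = (raw if isinstance(raw, list) else json.loads(raw))[0]["Plan"]
        return root["Plan Rows"], root["Actual Rows"]

conn = psycopg2.connect("dbname=imdb")  # assumes a local IMDb database
est, act = estimated_vs_actual(conn, "SELECT * FROM title WHERE production_year > 2000")
print(f"PostgreSQL estimate: {est}, true cardinality: {act}")
```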
Learned Cardinalities: Estimating Correlated Joins with Deep Learning
We describe a new deep learning approach to cardinality estimation. MSCN is a
multi-set convolutional network, tailored to representing relational query
plans, that employs set semantics to capture query features and true
cardinalities. MSCN builds on sampling-based estimation, addressing its
weaknesses when no sampled tuples qualify a predicate, and in capturing
join-crossing correlations. Our evaluation of MSCN using a real-world dataset
shows that deep learning significantly enhances the quality of cardinality
estimation, which is the core problem in query optimization. Comment: CIDR 2019. https://github.com/andreaskipf/learnedcardinalities
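The set-based architecture lends itself to a compact sketch. The PyTorch code below follows the high-level description above: a shared MLP applied to each element of the table, join, and predicate sets, average pooling over set elements, and a final output network predicting a normalized log-cardinality. The feature widths, hidden size, and masking scheme are illustrative assumptions rather than the authors' exact configuration (see the linked repository for that).

```python
# A minimal sketch of an MSCN-style multi-set model (not the reference
# implementation). Dimensions and the padding/mask convention are assumptions.
import torch
import torch.nn as nn

class SetModule(nn.Module):
    """Shared MLP applied to each set element, followed by masked averaging."""
    def __init__(self, in_dim, hid=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, hid), nn.ReLU())

    def forward(self, x, mask):
        # x: (batch, max_set_size, in_dim); mask: (batch, max_set_size, 1)
        h = self.mlp(x) * mask                       # zero out padded elements
        return h.sum(1) / mask.sum(1).clamp(min=1)   # average over real elements

class MSCN(nn.Module):
    def __init__(self, table_dim, join_dim, pred_dim, hid=256):
        super().__init__()
        self.tables = SetModule(table_dim, hid)
        self.joins = SetModule(join_dim, hid)
        self.preds = SetModule(pred_dim, hid)
        self.out = nn.Sequential(nn.Linear(3 * hid, hid), nn.ReLU(),
                                 nn.Linear(hid, 1), nn.Sigmoid())

    def forward(self, t, tm, j, jm, p, pm):
        z = torch.cat([self.tables(t, tm), self.joins(j, jm),
                       self.preds(p, pm)], dim=1)
        return self.out(z)  # normalized log-cardinality in (0, 1)
```

Averaging after a shared per-element MLP is what makes the model invariant to the order in which tables, joins, and predicates appear in a query.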
Flow-Loss: Learning Cardinality Estimates That Matter
Previous approaches to learned cardinality estimation have focused on
improving average estimation error, but not all estimates matter equally. Since
learned models inevitably make mistakes, the goal should be to improve the
estimates that make the biggest difference to an optimizer. We introduce a new
loss function, Flow-Loss, that explicitly optimizes for better query plans by
approximating the optimizer's cost model and dynamic programming search
algorithm with analytical functions. At the heart of Flow-Loss is a reduction
of query optimization to a flow routing problem on a certain plan graph in
which paths correspond to different query plans. To evaluate our approach, we
introduce the Cardinality Estimation Benchmark, which contains the ground truth
cardinalities for sub-plans of over 16K queries from 21 templates with up to 15
joins. We show that across different architectures and databases, a model
trained with Flow-Loss improves the cost of plans (using the PostgreSQL cost
model) and query runtimes despite having worse estimation accuracy than a model
trained with Q-Error. When the test set queries closely match the training
queries, both models improve performance significantly over PostgreSQL and are
close to the optimal performance (using true cardinalities). However, the
Q-Error trained model degrades significantly when evaluated on queries that are
slightly different (e.g., similar but not identical query templates), while the
Flow-Loss trained model generalizes better to such situations. For example, the
Flow-Loss model achieves up to 1.5x better runtimes on unseen templates
compared to the Q-Error model, despite leveraging the same model architecture
and training data.
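For reference, the Q-Error mentioned above is the standard accuracy metric in this literature: the multiplicative factor by which an estimate deviates from the true cardinality, penalizing over- and underestimation symmetrically. Flow-Loss itself depends on the plan-graph reduction described above and is not sketched here.

```python
def q_error(estimate: float, truth: float) -> float:
    """Standard Q-error: max(est/true, true/est), symmetric in both directions."""
    estimate, truth = max(estimate, 1.0), max(truth, 1.0)  # guard against zeros
    return max(estimate / truth, truth / estimate)

# Overestimating by 100x and underestimating by 100x are penalized equally.
assert q_error(10, 1000) == q_error(1000, 10) == 100.0
```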
How Good Are Query Optimizers, Really?
Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries.
Neo: A Learned Query Optimizer
Query optimization is one of the most challenging problems in database
systems. Despite the progress made over the past decades, query optimizers
remain extremely complex components that require a great deal of hand-tuning
for specific workloads and datasets. Motivated by this shortcoming and inspired
by recent advances in applying machine learning to data management challenges,
we introduce Neo (Neural Optimizer), a novel learning-based query optimizer
that relies on deep neural networks to generate query execution plans. Neo
bootstraps its query optimization model from existing optimizers and continues
to learn from incoming queries, building upon its successes and learning from
its failures. Furthermore, Neo naturally adapts to underlying data patterns and
is robust to estimation errors. Experimental results demonstrate that Neo, even
when bootstrapped from a simple optimizer like PostgreSQL, can learn a model
that offers similar performance to state-of-the-art commercial optimizers, and
in some cases even surpass them.
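The bootstrap-then-learn loop the abstract describes can be caricatured in a few lines: score candidate plans with a learned value model, execute the predicted-cheapest one, and fold the observed latency back into the training data. The sketch below is a deliberately toy, self-contained illustration of that loop (scalar "plan costs", a one-parameter latency model); it is not Neo's actual architecture, which uses deep neural networks over plan trees.

```python
# Toy illustration of a learn-from-feedback optimizer loop (not Neo itself).
import random

class LatencyModel:
    """Toy value model: latency predicted as a learned multiple of plan cost."""
    def __init__(self):
        self.scale = 1.0  # bootstrap: trust the existing optimizer's costs

    def predict(self, plan_cost):
        return self.scale * plan_cost

    def train(self, experience):
        # Least-squares fit of latency = scale * cost over all observations.
        num = sum(cost * lat for cost, lat in experience)
        den = sum(cost * cost for cost, _ in experience) or 1.0
        self.scale = num / den

def optimize_and_learn(candidate_costs, run, model, experience):
    best = min(candidate_costs, key=model.predict)  # pick predicted-cheapest plan
    latency = run(best)                             # execute and observe reality
    experience.append((best, latency))              # build on successes and failures
    model.train(experience)
    return best, latency

model, experience = LatencyModel(), []
for _ in range(100):
    candidates = [random.uniform(1.0, 10.0) for _ in range(5)]  # toy plan costs
    optimize_and_learn(candidates, lambda c: 2.5 * c, model, experience)
print(f"learned cost-to-latency scale: {model.scale:.2f}")  # converges to 2.5
```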
10381 Summary and Abstracts Collection -- Robust Query Processing
Dagstuhl seminar 10381 on robust query processing (held 19.09.10 - 24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely-coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing.

In order to pave the way to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes of some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers that were contributed by some participants.