
    Learned Cardinalities: Estimating Correlated Joins with Deep Learning

    We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate, and in capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that deep learning significantly enhances the quality of cardinality estimation, which is the core problem in query optimization. Comment: CIDR 2019. https://github.com/andreaskipf/learnedcardinalitie
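
    The abstract describes the multi-set architecture only at a high level. As a rough illustration of that idea (a small MLP applied to each element of the table, join, and predicate feature sets, followed by average pooling and a final output network), here is a minimal PyTorch sketch; the feature encodings, layer sizes, and output normalization are assumptions made for illustration and are not taken from the linked repository.

```python
# Minimal sketch of a multi-set convolutional network for cardinality
# estimation. Dimensions and feature encodings are illustrative assumptions.
import torch
import torch.nn as nn


class SetModule(nn.Module):
    """Per-element MLP followed by average pooling over a padded set."""

    def __init__(self, in_dim: int, hid: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, hid), nn.ReLU())

    def forward(self, x, mask):
        # x: (batch, max_set_size, in_dim), mask: (batch, max_set_size, 1)
        h = self.mlp(x) * mask                       # zero out padded slots
        return h.sum(dim=1) / mask.sum(dim=1).clamp(min=1)


class MSCN(nn.Module):
    """Multi-set network over table, join, and predicate feature sets."""

    def __init__(self, table_dim: int, join_dim: int, pred_dim: int,
                 hid: int = 256):
        super().__init__()
        self.tables = SetModule(table_dim, hid)
        self.joins = SetModule(join_dim, hid)
        self.preds = SetModule(pred_dim, hid)
        self.out = nn.Sequential(nn.Linear(3 * hid, hid), nn.ReLU(),
                                 nn.Linear(hid, 1), nn.Sigmoid())

    def forward(self, t, t_mask, j, j_mask, p, p_mask):
        z = torch.cat([self.tables(t, t_mask),
                       self.joins(j, j_mask),
                       self.preds(p, p_mask)], dim=1)
        # The sigmoid output is read as a cardinality normalized into [0, 1]
        # in log space; training would compare it against true cardinalities.
        return self.out(z)
```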

    An experimental study of learned cardinality estimation

    Cardinality estimation is a fundamental but long-unresolved problem in query optimization. Recently, multiple papers from different research groups have consistently reported that learned models have the potential to replace existing cardinality estimators. In this thesis, we ask a forward-thinking question: are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. First, we focus on the static environment (i.e., no data updates) and compare five new learned methods with eight traditional methods on four real-world datasets under a unified workload setting. The results show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs. Second, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates). We find that they cannot keep up with fast data updates and return large errors for different reasons. For less frequent updates, they perform better, but there is no clear winner among them. Third, we take a deeper look into learned models and explore when they may go wrong. Our results show that the performance of learned methods can be greatly affected by changes in correlation, skewness, or domain size. More importantly, their behavior is much harder to interpret and often unpredictable. Based on these findings, we identify two promising research directions (controlling the cost of learned models and making learned models trustworthy) and suggest a number of research opportunities. We hope that our study can guide researchers and practitioners to work together to eventually push learned cardinality estimators into real database systems.
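
    Accuracy comparisons in this line of work are usually reported as the q-error, the multiplicative factor by which an estimate deviates from the true cardinality. The abstract does not name its metric, so the following is only a small illustrative sketch of that conventional measure.

```python
# Q-error: the standard multiplicative error metric in the cardinality
# estimation literature (assumed here as the accuracy measure; the thesis
# abstract does not state its metric explicitly).
def q_error(estimate: float, truth: float, eps: float = 1.0) -> float:
    est, tru = max(estimate, eps), max(truth, eps)
    return max(est / tru, tru / est)


# Example: an estimate of 1,000 rows against a true cardinality of 50,000
# rows gives a q-error of 50, i.e. the estimate is off by a factor of 50.
print(q_error(1_000, 50_000))   # 50.0
```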

    DeepDB: Learn from Data, not from Queries!

    The typical approach for learned DBMS components is to capture the behavior of a component by running a representative set of queries and to use the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, the training data has to be recollected when the workload or the data changes. To overcome these limitations, we take a different route: we propose to learn a pure data-driven model that can be used for different tasks such as query answering or cardinality estimation. This data-driven model also supports ad-hoc queries and updates of the data without the need for full retraining when the workload or data changes. One might expect that this comes at the price of lower accuracy, since workload-driven models can make use of more information; however, this is not the case. The results of our empirical evaluation demonstrate that our data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
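
    The contrast the abstract draws is between training on observed queries and building a model from the data alone. The toy estimator below illustrates only that workflow (build statistics from the data itself, answer cardinality estimates, absorb inserts without retraining) using per-column histograms under an independence assumption; it is a deliberately simplistic stand-in and not the model proposed in the paper.

```python
# Toy data-driven cardinality estimator: per-column value counts combined
# under an independence assumption. Illustrates the data-driven workflow
# only; DeepDB's actual model is far more expressive.
from collections import Counter


class ToyDataDrivenEstimator:
    def __init__(self, rows: list[dict]):
        self.n = 0
        self.counts = {}                      # column -> Counter of values
        for row in rows:
            self.insert(row)

    def insert(self, row: dict) -> None:
        """Ad-hoc update without retraining: just adjust the statistics."""
        self.n += 1
        for col, val in row.items():
            self.counts.setdefault(col, Counter())[val] += 1

    def estimate(self, predicates: dict) -> float:
        """Estimated row count for conjunctive equality predicates."""
        selectivity = 1.0
        for col, val in predicates.items():
            selectivity *= self.counts[col][val] / self.n
        return selectivity * self.n


rows = [{"city": "Berlin", "year": 2019}, {"city": "Berlin", "year": 2020},
        {"city": "Munich", "year": 2020}]
est = ToyDataDrivenEstimator(rows)
est.insert({"city": "Berlin", "year": 2020})            # no retraining needed
print(est.estimate({"city": "Berlin", "year": 2020}))   # independence estimate
```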

    A query processing system for very large spatial databases using a new map algebra

    In this thesis we introduce a query processing approach for spatial databases and explain the main concepts we defined and developed: a spatial algebra and a graph-based approach used in the optimizer. The spatial algebra was defined to express queries and transformation rules during the different steps of query optimization. To cover a vast variety of potential applications, we tried to define the algebra as completely as possible. The algebra views spatial data as maps of spatial objects: the algebraic operators act on maps and produce new maps, while aggregate functions act on maps and objects and produce objects or basic values (characters, numbers, etc.). The optimizer receives the query as an algebraic expression and produces an efficient QEP (Query Evaluation Plan) through two main consecutive steps: QEG (Query Evaluation Graph) generation and QEP generation. In QEG generation, we construct a graph equivalent to the algebraic expression and then apply graph transformation rules to produce an efficient QEG. In QEP generation, we take the efficient QEG, perform predicate ordering and approximation, and then generate the efficient QEP. The QEP is a set of consecutive phases that must be executed in the specified order; each phase consists of one or more primitive operations, and all primitive operations within the same phase can be executed in parallel. We implemented the optimizer, a random spatial query generator, and a simulated spatial database. The query generator produces random queries for the purpose of testing the optimizer. The simulated spatial database is a set of functions that simulate primitive spatial operations and return the cost of the corresponding operation according to its input parameters. We fed the randomly generated queries to the optimizer, took the generated QEPs, and ran them through the spatial database simulator. We used the experimental results to discuss the characteristics and performance of the optimizer. The optimizer was designed for databases with a very large number of spatial objects; nevertheless, most of the concepts we used can be applied to all spatial information systems.
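
    The QEP-generation step above mentions predicate ordering. A common heuristic for that step is to apply cheap, highly selective predicates first, for example by sorting on rank = (selectivity - 1) / cost; the sketch below illustrates that textbook rule and is not taken from the thesis itself.

```python
# Illustrative predicate ordering for QEP generation: apply predicates in
# ascending rank = (selectivity - 1) / cost, which puts cheap, highly
# selective predicates first. The rank formula is a standard textbook
# heuristic, not a detail from the thesis.
from dataclasses import dataclass


@dataclass
class Predicate:
    name: str
    selectivity: float   # fraction of objects expected to qualify (0..1]
    cost: float          # estimated cost of evaluating it on one object

    @property
    def rank(self) -> float:
        return (self.selectivity - 1.0) / self.cost


def order_predicates(preds: list[Predicate]) -> list[Predicate]:
    """Return predicates in the order they should be applied in the QEP."""
    return sorted(preds, key=lambda p: p.rank)


preds = [Predicate("inside_region", 0.05, 4.0),   # selective but costly
         Predicate("has_attribute", 0.50, 0.1),   # cheap, weakly selective
         Predicate("overlaps_road", 0.20, 2.0)]
print([p.name for p in order_predicates(preds)])
# ['has_attribute', 'overlaps_road', 'inside_region']
```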