Search CORE

15 research outputs found

Size and Treewidth Bounds for Conjunctive Queries

Author: Georg Gottlob
Gregory J. Valiant
Stephanie Tien Lee
Publication venue
Publication date: 01/01/2009
Field of study

This paper provides new worst-case bounds for the size and treewith of the result Q(D) of a conjunctive query Q to a database D. We derive bounds for the result size |Q(D) | in terms of structural properties of Q, both in the absence and in the presence of keys and functional dependencies. These bounds are based on a novel “coloring ” of the query variables that associates a coloring number C(Q) to each query Q. Using this coloring number, we derive tight bounds for the size of Q(D) in case (i) no functional dependencies or keys are specified, and (ii) simple (one-attribute) keys are given. These results generalize recent size-bounds for join queries obtained by Atserias, Grohe, and Marx (FOCS 2008). An extension of our coloring technique also gives a lower bound for |Q(D) | in the general setting of a query with arbitrary functional dependencies. Our new coloring scheme also allows us to precisely characterize (both in the absence of keys and with simple keys) the treewidth-preserving queries— the queries for which the output treewidth is bounded by a function of the input treewidth. Finally we characterize the queries that preserve the sparsity of the input in the general setting with arbitrary functional dependencies

CiteSeerX

Crossref

Oxford University Research Archive

Factorised Representations of Query Results

Author: Olteanu Dan
Zavodny Jakub
Publication venue
Publication date: 05/04/2011
Field of study

Query tractability has been traditionally defined as a function of input database and query sizes, or of both input and output sizes, where the query result is represented as a bag of tuples. In this report, we introduce a framework that allows to investigate tractability beyond this setting. The key insight is that, although the cardinality of a query result can be exponential, its structure can be very regular and thus factorisable into a nested representation whose size is only polynomial in the size of both the input database and query. For a given query result, there may be several equivalent representations, and we quantify the regularity of the result by its readability, which is the minimum over all its representations of the maximum number of occurrences of any tuple in that representation. We give a characterisation of select-project-join queries based on the bounds on readability of their results for any input database. We complement it with an algorithm that can find asymptotically optimal upper bounds and corresponding factorised representations.Comment: 44 pages, 13 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

Join Cardinality Estimation Graphs: Analyzing Pessimistic and Optimistic Estimators Through a Common Lens

Author: Chen Jeremy Yujui
Publication venue: 'University of Waterloo'
Publication date: 25/07/2020
Field of study

Join cardinality estimation is a fundamental problem that is solved in the query optimizers of database management systems when generating efficient query plans. This problem arises both in systems that manage relational data as well those that manage graph-structured data where systems need to estimate the cardinalities of subgraphs in their input graphs. We focus on graph-structured data in this thesis. A popular class of join cardinality estimators uses statistics about sizes of small size queries to make estimates for larger queries. Statistics-based estimators can be broadly divided into two groups: (i) optimistic estimators that use statistics in formulas that make degree regularity and conditional independence assumptions; and (ii) the recent pessimistic estimators that estimate the sizes of queries using a set of upper bounds derived from linear programs, such as the AGM bound, or tighter bounds, such as the MOLP bound that are based on information theoretic bounds. In this thesis, we introduce a new framework that we call cardinality estimation graph (CEG) that can represent the estimates of both optimistic and pessimistic estimators. We observe that there is generally more than one way to generate optimistic estimates for a query, and the choice has either been ad-hoc or unspecified in previous work. We empirically show that choosing the largest candidate yields much higher accuracy than pessimistic estimators across different datasets and query workloads, and it is an effective heuristic to combat underestimations, which optimistic estimators are known to suffer from. To further improve the accuracy, we demonstrate how hash partitioning, an optimization technique designed to improve pessimistic estimators' accuracy, can be applied to optimistic estimators, and we evaluate the effectiveness. CEGs can also be used to obtain insights of pessimistic estimators. We show MOLP estimator is at least as tight as the pessimistic estimator and are identical on acyclic queries over binary relations, and the MOLP CEG offers an intuitive combinatorial proof that the MOLP bound is tighter than the DBPLP bound

University of Waterloo's Institutional Repository

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

Author: Ajwani Deepak
Gatterbauer Wolfgang
Riedewald Mirek
Tziavelis Nikolaos
Yang Xiaofeng
Publication venue
Publication date: 11/09/2020
Field of study

We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. These optimality properties are derived for the widely used notion of data complexity, which treats query size as a constant. By performing a careful cost analysis, we are able to uncover a previously unknown tradeoff between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.Comment: 50 pages, 19 figure

arXiv.org e-Print Archive

Research Repository UCD