Dynamic programming strikes back
Two highly efficient algorithms are known for optimally ordering joins while avoiding cross products: DPccp, which is based on dynamic programming, and Top-Down Partition Search, which is based on memoization. Both have two severe limitations: they handle only (1) simple (binary) join predicates and (2) inner joins. However, real queries may contain complex join predicates, involving more than two relations, and outer joins as well as other non-inner joins. Taking the most efficient known join-ordering algorithm, DPccp, as a starting point, we first develop a new algorithm, DPhyp, which is capable of handling complex join predicates efficiently. We do so by modeling the query graph as a (variant of a) hypergraph and then reasoning about its connected subgraphs. Then, we present a technique to exploit this capability to efficiently handle the widest class of non-inner joins dealt with so far. Our experimental results show that this reformulation of non-inner joins as complex predicates can improve optimization time by orders of magnitude, compared to known algorithms dealing with complex join predicates and non-inner joins. Once again, this gives dynamic programming a distinct advantage over current memoization techniques.
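To make the idea of cross-product-free join ordering concrete, here is a minimal Python sketch. It is not DPccp or DPhyp: those algorithms enumerate connected-subgraph/complement pairs directly, whereas this sketch simply walks all subsets and skips the disconnected ones. It also handles only binary predicates and inner joins. The function name `best_join_order`, the cost model (sum of intermediate result sizes), and the cardinalities and selectivities in the usage example are all assumptions made for illustration.

```python
def best_join_order(cards, edges):
    """Subset-enumeration DP for join ordering without cross products.

    cards: list of base-relation cardinalities (hypothetical statistics).
    edges: {(i, j): selectivity} for binary join predicates (hypothetical).
    Returns (cost, plan) for the full relation set, or None if the
    query graph is disconnected (a cross product would be required).
    """
    n = len(cards)
    adj = [0] * n  # adjacency lists stored as bitmasks
    for i, j in edges:
        adj[i] |= 1 << j
        adj[j] |= 1 << i

    def neighbours(mask):
        # Relations outside `mask` joined by a predicate to something inside it.
        out = 0
        for i in range(n):
            if mask >> i & 1:
                out |= adj[i]
        return out & ~mask

    def card(mask):
        # Estimated result size: product of sizes times applicable selectivities.
        size = 1.0
        for i in range(n):
            if mask >> i & 1:
                size *= cards[i]
        for (i, j), sel in edges.items():
            if mask >> i & 1 and mask >> j & 1:
                size *= sel
        return size

    # best[mask] exists only for connected subsets.
    best = {1 << i: (0.0, f"R{i}") for i in range(n)}
    for mask in range(1, 1 << n):  # every subset of mask is numerically smaller
        if mask in best:
            continue
        sub = (mask - 1) & mask
        while sub:
            other = mask ^ sub
            # Both halves connected, and at least one predicate links them.
            if sub < other and sub in best and other in best \
                    and neighbours(sub) & other:
                cost = best[sub][0] + best[other][0] + card(mask)
                if mask not in best or cost < best[mask][0]:
                    plan = f"({best[sub][1]} JOIN {best[other][1]})"
                    best[mask] = (cost, plan)
            sub = (sub - 1) & mask
    return best.get((1 << n) - 1)


# Chain query R0 - R1 - R2 with made-up statistics.
cost, plan = best_join_order([10, 100, 1000], {(0, 1): 0.1, (1, 2): 0.01})
print(plan)  # ((R0 JOIN R1) JOIN R2)
```

In the chain example, the pair {R0, R2} is never joined directly because no predicate connects it; avoiding that kind of cross product is what the algorithms named in the abstract do far more efficiently.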
Left Bit Right: For SPARQL Join Queries with OPTIONAL Patterns (Left-outer-joins)
SPARQL basic graph pattern (BGP) (a.k.a. SQL inner-join) query optimization
is a well researched area. However, optimization of OPTIONAL pattern queries
(a.k.a. SQL left-outer-joins) poses additional challenges, due to the
restrictions on the reordering of left-outer-joins. The occurrence of
such queries tends to be as high as 50% of the total queries (e.g., DBPedia
query logs).
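The reordering restriction can be seen on a tiny example. The following Python sketch uses three hypothetical single-column relations A, B, and C (not taken from the paper) and shows that moving an inner join across a left-outer-join changes the result. The helper names and the SQL-style treatment of `None` as never equal are assumptions made for illustration.

```python
def eq(x, y):
    # SQL-style equality: a NULL (None) never compares equal.
    return x is not None and y is not None and x == y


def inner_join(left, right, pred):
    return [{**l, **r} for l in left for r in right if pred({**l, **r})]


def left_outer_join(left, right, pred, right_cols):
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if pred({**l, **r})]
        # Unmatched left rows are kept and padded with NULLs.
        out.extend(matches or [{**l, **{c: None for c in right_cols}}])
    return out


A = [{"a": 1}, {"a": 2}]
B = [{"b": 1}, {"b": 2}]
C = [{"c": 1}]

a_eq_b = lambda t: eq(t["a"], t["b"])
b_eq_c = lambda t: eq(t["b"], t["c"])

# (A LEFT JOIN B) JOIN C
plan1 = inner_join(left_outer_join(A, B, a_eq_b, ["b"]), C, b_eq_c)

# A LEFT JOIN (B JOIN C)
plan2 = left_outer_join(A, inner_join(B, C, b_eq_c), a_eq_b, ["b", "c"])

print(plan1)  # [{'a': 1, 'b': 1, 'c': 1}]
print(plan2)  # [{'a': 1, 'b': 1, 'c': 1}, {'a': 2, 'b': None, 'c': None}]
```

The second plan keeps the row with a = 2 padded with NULLs, while the first plan drops it, so an optimizer cannot freely swap the two orders the way it could for inner joins alone.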
In this paper, we present Left Bit Right (LBR), a technique for
well-designed nested BGP and OPTIONAL pattern queries. Through LBR, we
propose a novel method to represent such queries using a graph of
supernodes, which is used to aggressively prune the RDF triples, with
the help of compressed indexes. We also propose novel optimization strategies
-- first of a kind, to the best of our knowledge -- that combine the
characteristics of acyclicity of queries, minimality, and the
nullification and best-match operators. In this paper, we focus
on OPTIONAL patterns without UNIONs or FILTERs, but we also show how UNIONs and
FILTERs can be handled with our technique using a query rewrite. Our
evaluation on RDF graphs of up to and over one billion triples, on a commodity
laptop with 8 GB memory, shows that LBR can process well-designed
low-selectivity complex queries up to 11 times faster than
state-of-the-art RDF column-stores such as Virtuoso and MonetDB, and for highly
selective queries, LBR is on par with them. Comment: SIGMOD 201
Equivalence of Queries with Nested Aggregation
Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization.
In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called “scalar” aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation.