Search CORE

71 research outputs found

Nested Queries and Quantifiers in an Ordered Context

Author: Helmer Sven
May Norman
Moerkotte Guido
Publication venue
Publication date: 01/01/2003
Field of study

We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by our approach

CiteSeerX

Crossref

MAnnheim DOCument Server

An Algebraic Approach to XQuery Optimization

Author: May Norman
Publication venue: Universität Mannheim
Publication date: 01/01/2007
Field of study

As more data is stored in XML and more applications need to process this data, XML query optimization becomes performance critical. While optimization techniques for relational databases have been developed over the last thirty years, the optimization of XML queries poses new challenges. Query optimizers for XQuery, the standard query language for XML data, need to consider both document order and sequence order. Nevertheless, algebraic optimization proved powerful in query optimizers in relational and object oriented databases. Thus, this dissertation presents an algebraic approach to XQuery optimization. In this thesis, an algebra over sequences is presented that allows for a simple translation of XQuery into this algebra. The formal definitions of the operators in this algebra allow us to reason formally about algebraic optimizations. This thesis leverages the power of this formalism when unnesting nested XQuery expressions. In almost all cases unnesting nested queries in XQuery reduces query execution times from hours to seconds or milliseconds. Moreover, this dissertation presents three basic algebraic patterns of nested queries. For every basic pattern a decision tree is developed to select the most effective unnesting equivalence for a given query. Query unnesting extends the search space that can be considered during cost-based optimization of XQuery. As a result, substantially more efficient query execution plans may be detected. This thesis presents two more important cases where the number of plan alternatives leads to substantially shorter query execution times: join ordering and reordering location steps in path expressions. Our algebraic framework detects cases where document order or sequence order is destroyed. However, state-of-the-art techniques for order optimization in cost-based query optimizers have efficient mechanisms to repair order in these cases. The results obtained for query unnesting and cost-based optimization of XQuery underline the need for an algebraic approach to XQuery optimization for efficient XML query processing. Moreover, they are applicable to optimization in relational databases where order semantics are considered

MAnnheim DOCument Server

Let a Single FLWOR Bloom

Author: Brantner Matthias
Kanne Carl-Christian
Moerkotte Guido
Publication venue
Publication date: 01/01/2007
Field of study

To globally optimize execution plans for XQuery expressions, a plan generator must generate and compare plan alternatives. In proven compiler architectures, the unit of plan generation is the query block. Fewer query blocks mean a larger search space for the plan generator and lead to a generally higher quality of the execution plans. The goal of this paper is to provide a toolkit for developers of XQuery evaluators to transform XQuery expressions into expressions with as few query blocks as possible. Our toolkit takes the form of rewrite rules merging the inner and outer FLWOR expressions into single FLWORs. We focus on previously unpublished rewrite rules and on inner FLWORs occurring in the For, Let, and Return clauses in the outer FLWOR

MAnnheim DOCument Server

Rewriting Declarative Query Languages

Author: Brantner Matthias
Publication venue: Universität Mannheim
Publication date: 01/01/2007
Field of study

Queries against databases are formulated in declarative languages. Examples are the relational query language SQL and XPath or XQuery for querying data stored in XML. Using a declarative query language, the querist does not need to know about or decide on anything about the actual strategy a system uses to answer the query. Instead, the system can freely choose among the algorithms it employs to answer a query. Predominantly, query processing in the relational context is accomplished using a relational algebra. To this end, the query is translated into a logical algebra. The algebra consists of logical operators which facilitate the application of various optimization techniques. For example, logical algebra expressions can be rewritten in order to yield more efficient expressions. In order to query XML data, XPath and XQuery have been developed. Both are declarative query languages and, hence, can benefit from powerful optimizations. For instance, they could be evaluated using an algebraic framework. However, in general, the existing approaches are not directly utilizable for XML query processing. This thesis has two goals. The first goal is to overcome the above-mentioned misfits of XML query processing, making it ready for industrial-strength settings. Specifically, we develop an algebraic framework that is designed for the efficient evaluation of XPath and XQuery. To this end, we define an order-aware logical algebra and a translation of XPath into this algebra. Furthermore, based on the resulting algebraic expressions, we present rewrites in order to speed up the execution of such queries. The second goal is to investigate rewriting techniques in the relational context. To this end, we present rewrites based on algebraic equivalences that unnest nested SQL queries with disjunctions. Specifically, we present equivalences for unnesting algebraic expressions with bypass operators to handle disjunctive linking and correlation. Our approach can be applied to quantified table subqueries as well as scalar subqueries. For all our results, we present experiments that demonstrate the effectiveness of the developed approaches

MAnnheim DOCument Server

Scalable Querying of Nested Data

Author: Benedikt Michael
Nikolic Milos
Shaikhha Amir
Smith Jaclyn
Publication venue
Publication date: 01/01/2020
Field of study

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions. In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs

arXiv.org e-Print Archive

Edinburgh Research Explorer

Oxford University Research Archive

A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization

Author: Hirzel Martin
Shinnar Avraham
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th European Conference on Object-Oriented Programming (ECOOP 2015)
Publication date: 01/01/2015
Field of study

This paper introduces a core calculus for pattern-matching in production rule languages: the Calculus for Aggregating Matching Patterns (CAMP). CAMP is expressive enough to capture modern rule languages such as JRules, including extensions for aggregation. We show how CAMP can be compiled into a nested-relational algebra (NRA), with only minimal extension. This paves the way for applying relational techniques to running rules over large stores. Furthermore, we show that NRA can also be compiled back to CAMP, using named nested-relational calculus (NNRC) as an intermediate step. We mechanize proofs of correctness, program size preservation, and type preservation of the translations using modern theorem-proving techniques. A corollary of the type preservation is that polymorphic type inference for both CAMP and NRA is NP-complete. CAMP and its correspondence to NRA provide the foundations for efficient implementations of rules languages using databases technologies

Dagstuhl Research Online Publication Server

A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization (Artifact)

Author: Hirzel Martin
Shinnar Avraham
Publication venue: DARTS - Dagstuhl Artifacts Series. DARTS, Volume 1, Issue 1
Publication date: 01/01/2015
Field of study

This artifact contains the accompanying code for the ECOOP 2015 paper: "A Pattern Calculus for Rule Languages: Expressiveness, Compilation, and Mechanization". It contains source files for a full mechanization of the three languages presented in the paper: CAMP (Calculus for Aggregating Matching Patterns), NRA (Nested Relational Algebra) and NNRC (Named Nested Relational Calculus). Translations between all three languages and their attendant proofs of correctness are included. Additionally, a mechanization of a type system for the main languages is provided, along with bidirectional proofs of type preservation and proofs of the time complexity of the various compilers

Dagstuhl Research Online Publication Server

Dynamic programming strikes back

Author: Guido Moerkotte
Thomas Neumann
Publication venue
Publication date: 01/01/2008
Field of study

Two highly efficient algorithms are known for optimally ordering joins while avoiding cross products: DPccp, which is based on dynamic programming, and Top-Down Partition Search, based on memoization. Both have two severe limitations: They handle only (1) simple (binary) join predicates and (2) inner joins. However, real queries may contain complex join predicates, involving more than two relations, and outer joins as well as other non-inner joins. Taking the most efficient known join-ordering algorithm, DPccp, as a starting point, we first develop a new algorithm, DPhyp, which is capable to handle complex join predicates efficiently. We do so by modeling the query graph as a (variant of a) hypergraph and then reason about its connected subgraphs. Then, we present a technique to exploit this capability to efficiently handle the widest class of non-inner joins dealt with so far. Our experimental results show that this reformulation of non-inner joins as complex predicates can improve optimization time by orders of magnitude, compared to known algorithms dealing with complex join predicates and non-inner joins. Once again, this gives dynamic programming a distinct advantage over current memoization techniques

CiteSeerX

Crossref

MAnnheim DOCument Server

MPG.PuRe

XQuery containment in presence of variable binding dependencies

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2005
Field of study

Crossref