14 research outputs found
Rewriting Declarative Query Languages
Queries against databases are formulated in declarative languages. Examples are the relational query language SQL and XPath or XQuery for querying data stored in XML. Using a declarative query language, the querist does not need to know about or decide on anything about the actual strategy a system uses to answer the query. Instead, the system can freely choose among the algorithms it employs to answer a query. Predominantly, query processing in the relational context is accomplished using a relational algebra. To this end, the query is translated into a logical algebra. The algebra consists of logical operators which facilitate the application of various optimization techniques. For example, logical algebra expressions can be rewritten in order to yield more efficient expressions. In order to query XML data, XPath and XQuery have been developed. Both are declarative query languages and, hence, can benefit from powerful optimizations. For instance, they could be evaluated using an algebraic framework. However, in general, the existing approaches are not directly utilizable for XML query processing. This thesis has two goals. The first goal is to overcome the above-mentioned misfits of XML query processing, making it ready for industrial-strength settings. Specifically, we develop an algebraic framework that is designed for the efficient evaluation of XPath and XQuery. To this end, we define an order-aware logical algebra and a translation of XPath into this algebra. Furthermore, based on the resulting algebraic expressions, we present rewrites in order to speed up the execution of such queries. The second goal is to investigate rewriting techniques in the relational context. To this end, we present rewrites based on algebraic equivalences that unnest nested SQL queries with disjunctions. Specifically, we present equivalences for unnesting algebraic expressions with bypass operators to handle disjunctive linking and correlation. Our approach can be applied to quantified table subqueries as well as scalar subqueries. For all our results, we present experiments that demonstrate the effectiveness of the developed approaches
Execution strategies for SQL subqueries
Optimizing SQL subqueries has been an active area in database research and the database industry throughout the last decades. Pre-vious work has already identified some approaches to efficiently execute relational subqueries. For satisfactory performance, proper choice of subquery execution strategies becomes even more essen-tial today with the increase in decision support systems and auto-matically generated SQL, e.g., with ad-hoc reporting tools. This goes hand in hand with increasing query complexity and growing data volumes – which all pose challenges for an industrial-strength query optimizer. This current paper explores the basic building blocks that Microsoft SQL Server utilizes to optimize and execute relational subqueries. We start with indispensable prerequisites such as detection and removal of correlations for subqueries. We identify a full spectrum of fundamental subquery execution strategies such as forward and reverse lookup as well as set-based approaches, explain the different execution strategies for subqueries implemented in SQL Server, and relate them to the current state of the art. To the best of our knowl-edge, several strategies discussed in this paper have not been pub-lished before. An experimental evaluation complements the paper. It quantifies the performance characteristics of the different approaches and shows that indeed alternative execution strategies are needed in different circumstances, which make a cost-based query optimizer indispen-sable for adequate query performance
An Algebraic Approach to XQuery Optimization
As more data is stored in XML and more applications need to process this data, XML query optimization becomes performance critical. While optimization techniques for relational databases have been developed over the last thirty years, the optimization of XML queries poses new challenges. Query optimizers for XQuery, the standard query language for XML data, need to consider both document order and sequence order. Nevertheless, algebraic optimization proved powerful in query optimizers in relational and object oriented databases. Thus, this dissertation presents an algebraic approach to XQuery optimization. In this thesis, an algebra over sequences is presented that allows for a simple translation of XQuery into this algebra. The formal definitions of the operators in this algebra allow us to reason formally about algebraic optimizations. This thesis leverages the power of this formalism when unnesting nested XQuery expressions. In almost all cases unnesting nested queries in XQuery reduces query execution times from hours to seconds or milliseconds. Moreover, this dissertation presents three basic algebraic patterns of nested queries. For every basic pattern a decision tree is developed to select the most effective unnesting equivalence for a given query. Query unnesting extends the search space that can be considered during cost-based optimization of XQuery. As a result, substantially more efficient query execution plans may be detected. This thesis presents two more important cases where the number of plan alternatives leads to substantially shorter query execution times: join ordering and reordering location steps in path expressions. Our algebraic framework detects cases where document order or sequence order is destroyed. However, state-of-the-art techniques for order optimization in cost-based query optimizers have efficient mechanisms to repair order in these cases. The results obtained for query unnesting and cost-based optimization of XQuery underline the need for an algebraic approach to XQuery optimization for efficient XML query processing. Moreover, they are applicable to optimization in relational databases where order semantics are considered
Self Maintenance of Materialized XQuery Views via Query Containment and Re-Writing
In recent years XML, the eXtensible Markup Language has become the de-facto standard for publishing and exchanging information on the web and in enterprise data integration systems. Materialized views are often used in information integration systems to present a unified schema for efficient querying of distributed and possibly heterogenous data sources. On similar lines, ACE-XQ, an XQuery based semantic caching system shows the significant performance gains achieved by caching query results (as materialized views) and using these materialized views along with query containment techniques for answering future queries over distributed XML data sources. To keep data in these materialized views of ACE-XQ up-to-date, the view must be maintained i.e. whenever the base data changes, the corresponding cached data in the materialized view must also be updated. This thesis builds on the query containment ideas of ACE-XQ and proposes an efficient approach for self-maintenance of materialized views. Our experimental results illustrate the significant performance improvement achieved by this strategy over view re-computation for a variety of situations
Query translation and optimisation for complex value databases
This thesis considers the theory of database queries on the complex value data model
extended with external functions. In modern intelligent database systems, we expect
that query systems be able to handle a wide range of calculus formulas correctly and
efficiently. Accordingly, they will require general query translators and efficient optimisers.
Motivated by these concerns, this thesis undertakes a· comprehensive study of
query evaluation in the complex value model and investigates the following issues:
• identifying recursive sets of complex value formulas which define domain independent
queries;
• implementing complex value calculus queries with the incorporation of functions;
• solving the problem of how to process join operation in complex value databases;
and
• investigating some algebraic properties concerning nested relational operators.
The first part of this thesis extends some classical properties of the relational theory -
particularly those related to query safety - to the context of complex value databases
with fixed external functions and investigates the problem of how to implement calculus
queries. Two notions of syntactic criteria for queries which guarantee domain
independence, namely, embedded evaluable and embedded allowed, are generalised for
this data model. This thesis shows that all embedded-allowed calculus (or fix-point)
queries are external-function domain independent and continuous.
This thesis discusses the topic of "embedded allowed database programs" and proves
that embedded allowed stratified programs satisfying certain constraints are embedded
domain independent. It also develops an algorithm for translating embedded allowed
queries into equivalent algebraic expressions as a basis for evaluating safe queries in all
calculus-based query classes. The second part of this thesis considers the issue of query optimisation for nested
relational databases. Within a restricted set of nested schema trees, a join operator,
called P-join, is proposed. The P-join operator does not require as many restructuring
operators and combines the advantages of the extended natural join and recursive join
for efficient data access. A P-join algorithm which takes advantage of a decomposed
storage model and various join techniques available in the standard relational model
to reduce the cost of join operation in nested relational databases is also proposed.
Finally, this thesis investigates some algebraic properties of nested relational operators
which are useful for query optimisation in the nested relational model and outlines
a heuristic optimisation algorithm for nested relational expressions by adopting algebraic
transformation rules developed in this thesis and previous related work
Logical Foundations of Object-Oriented and Frame-Based Languages
We propose a novel logic, called Frame Logic (abbr., F-logic), that accounts in a clean, declarative fashion for most of the structural aspects of object-oriented and frame-based languages. These features include object identity, complex objects, inheritance, polymorphic types, methods, encapsulation, and others. In a sense, F-logic stands in the same relationship to the object-oriented paradigm as classical predicate calculus stands to relational programming. The syntax of F-logic is higher-order, which, among other things, allows the user to explore data and schema using the same declarative language. F-logic has a model-theoretic semantics and a sound and complete resolution-based proof procedure. This paper also discusses various aspects of programming in declarative object-oriented languages based on F-logic
Equivalence of Queries with Nested Aggregation
Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization.
In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called “scalar” aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation
Weiterentwicklung analytischer Datenbanksysteme
This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.Diese Arbeit trägt zum aktuellen Forschungsstand von analytischen Datenbanksystemen bei. Wir identifizieren und explorieren Erweiterungen um Analysen auf Eventströmen besser zu unterstützen. Wir stellen eine neue Indexstruktur für Polygone vor, die eine effiziente Verarbeitung von Geodaten im Hauptspeicher ermöglicht. Zudem präsentieren wir einen neuen Ansatz für Kardinalitätsschätzungen mittels maschinellen Lernens
Programming Languages and Systems
This open access book constitutes the proceedings of the 30th European Symposium on Programming, ESOP 2021, which was held during March 27 until April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 24 papers included in this volume were carefully reviewed and selected from 79 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems
Computer science: the hardware software and heart of IT
1st edition, 201