12,744 research outputs found

    Automatic optimization of array queries

    Get PDF
    Non-trivial scientific applications often involve complex computations on large multi-dimensional datasets. Using relational database technology for these datasets is cumbersome since expressing the computations in terms of relational queries is difficult and time-consuming. Moreover, query optimization strategies successful in classical relational domains may not suffice when applied to the multi-dimensional array domain. The RAM (Relational Array Mapping) system hides these issues by providing a transparent mapping between the scientific problem specification and the underlying database system. This paper focuses on the RAM query optimizer which is specifically tuned to exploit the characteristics of the array paradigm. We detail how an intermediate array-algebra and several equivalence rules are used to create efficient query plans and how, with minor extensions, the optimizer can automatically parallelize array operation

    A Sampling Algebra for Aggregate Estimation

    Full text link
    As of 2005, sampling has been incorporated in all major database systems. While efficient sampling techniques are realizable, determining the accuracy of an estimate obtained from the sample is still an unresolved problem. In this paper, we present a theoretical framework that allows an elegant treatment of the problem. We base our work on generalized uniform sampling (GUS), a class of sampling methods that subsumes a wide variety of sampling techniques. We introduce a key notion of equivalence that allows GUS sampling operators to commute with selection and join, and derivation of confidence intervals. We illustrate the theory through extensive examples and give indications on how to use it to provide meaningful estimations in database systems

    HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics

    Full text link
    Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenges comes from the intricacies and rich features of query languages, which makes reasoning about rewrite rules difficult. In this paper, we propose a machine-checkable denotational semantics for SQL, the de facto language for relational database, for rigorously validating rewrite rules. Unlike previously proposed semantics that are either non-mechanized or only cover a small amount of SQL language features, our semantics covers all major features of SQL, including bags, correlated subqueries, aggregation, and indexes. Our mechanized semantics, called HoTTSQL, is based on K-Relations and homotopy type theory, where we denote relations as mathematical functions from tuples to univalent types. We have implemented HoTTSQL in Coq, which takes only fewer than 300 lines of code and have proved a wide range of SQL rewrite rules, including those from database research literature (e.g., magic set rewrites) and real-world query optimizers (e.g., subquery elimination). Several of these rewrite rules have never been previously proven correct. In addition, while query equivalence is generally undecidable, we have implemented an automated decision procedure using HoTTSQL for conjunctive queries: a well-studied decidable fragment of SQL that encompasses many real-world queries

    Logically automorphically equivalent knowledge bases

    Full text link
    Knowledge bases theory provide an important example of the field where applications of universal algebra and algebraic logic look very natural, and their interaction with practical problems arising in computer science might be very productive. In this paper we study the equivalence problem for knowledge bases. Our interest is to find out how the informational equivalence is related to the logical description of knowledge. Studying various equivalences of knowledge bases allows us to compare different knowledge bases. The main objective of this paper is logically automorphically equivalent knowledge bases. As we will see this notion gives us a good enough characterization of knowledge bases

    Knowledge bases over algebraic models. Some notes about informational equivalence

    Full text link
    The recent advances in knowledge base research and the growing importance of effective knowledge management raised an important question of knowledge base equivalence verification. This problem has not been stated earlier, at least in a way that allows speaking about algorithms for verification of informational equivalence, because the informal definition of knowledge bases makes formal solution of this problem impossible. In this paper we provide an implementable formal algorithm for knowledge base equivalence verification based on the formal definition of knowledge base proposed by Plotkin B. and Plotkin T., and study some important properties of automorphic equivalence of models. We also describe the concept of equivalence and formulate the criterion for the equivalence of knowledge bases defined over finite models. Further we define multi-models and automorphic equivalence of models and multi-models, that is generalization of automorphic equivalence of algebras.Comment: 22 page

    Heuristic and Cost-based Optimization for Diverse Provenance Tasks

    Full text link
    A well-established technique for capturing database provenance as annotations on data is to instrument queries to propagate such annotations. However, even sophisticated query optimizers often fail to produce efficient execution plans for instrumented queries. We develop provenance-aware optimization techniques to address this problem. Specifically, we study algebraic equivalences targeted at instrumented queries and alternative ways of instrumenting queries for provenance capture. Furthermore, we present an extensible heuristic and cost-based optimization framework utilizing these optimizations. Our experiments confirm that these optimizations are highly effective, improving performance by several orders of magnitude for diverse provenance tasks.Comment: IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018, long version, 31 pages. arXiv admin note: substantial text overlap with arXiv:1701.0551

    Relational Algebra for In-Database Process Mining

    Get PDF
    The execution logs that are used for process mining in practice are often obtained by querying an operational database and storing the result in a flat file. Consequently, the data processing power of the database system cannot be used anymore for this information, leading to constrained flexibility in the definition of mining patterns and limited execution performance in mining large logs. Enabling process mining directly on a database - instead of via intermediate storage in a flat file - therefore provides additional flexibility and efficiency. To help facilitate this ideal of in-database process mining, this paper formally defines a database operator that extracts the 'directly follows' relation from an operational database. This operator can both be used to do in-database process mining and to flexibly evaluate process mining related queries, such as: "which employee most frequently changes the 'amount' attribute of a case from one task to the next". We define the operator using the well-known relational algebra that forms the formal underpinning of relational databases. We formally prove equivalence properties of the operator that are useful for query optimization and present time-complexity properties of the operator. By doing so this paper formally defines the necessary relational algebraic elements of a 'directly follows' operator, which are required for implementation of such an operator in a DBMS

    Equivalence of SQL Queries in Presence of Embedded Dependencies

    Full text link
    We consider the problem of finding equivalent minimal-size reformulations of SQL queries in presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ queries, the semantics of the SQL standard treat query answers as multisets (a.k.a. bags), whereas the stored relations may be treated either as sets, which is called bag-set semantics for query evaluation, or as bags, which is called bag semantics. (Under set semantics, both query answers and stored relations are treated as sets.) In the context of the above Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in presence of embedded dependencies, and make a number of conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, under the condition that chase [9] under set semantics (set-chase) on the inputs terminates. We also present equivalence tests for aggregate CQ queries in presence of embedded dependencies. We use our equivalence tests to develop sound and complete (whenever set-chase on the inputs terminates) algorithms for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as for instances of the problem with aggregate queries.Comment: Correction of the previous version as described in the last sentence of the Abstrac

    Querying the Guarded Fragment

    Full text link
    Evaluating a Boolean conjunctive query Q against a guarded first-order theory F is equivalent to checking whether "F and not Q" is unsatisfiable. This problem is relevant to the areas of database theory and description logic. Since Q may not be guarded, well known results about the decidability, complexity, and finite-model property of the guarded fragment do not obviously carry over to conjunctive query answering over guarded theories, and had been left open in general. By investigating finite guarded bisimilar covers of hypergraphs and relational structures, and by substantially generalising Rosati's finite chase, we prove for guarded theories F and (unions of) conjunctive queries Q that (i) Q is true in each model of F iff Q is true in each finite model of F and (ii) determining whether F implies Q is 2EXPTIME-complete. We further show the following results: (iii) the existence of polynomial-size conformal covers of arbitrary hypergraphs; (iv) a new proof of the finite model property of the clique-guarded fragment; (v) the small model property of the guarded fragment with optimal bounds; (vi) a polynomial-time solution to the canonisation problem modulo guarded bisimulation, which yields (vii) a capturing result for guarded bisimulation invariant PTIME.Comment: This is an improved and extended version of the paper of the same title presented at LICS 201
    corecore