12,744 research outputs found
Automatic optimization of array queries
Non-trivial scientific applications often involve complex computations on large multi-dimensional datasets. Using relational database technology for these datasets is cumbersome since expressing the computations in terms of relational queries is difficult and time-consuming. Moreover, query optimization strategies successful in classical relational domains may not suffice when applied to the multi-dimensional array domain. The RAM (Relational Array Mapping) system hides these issues by providing a transparent mapping between the scientific problem specification and the underlying database system. This paper focuses on the RAM query optimizer which is specifically tuned to exploit the characteristics of the array paradigm. We detail how an intermediate array-algebra and several equivalence rules are used to create efficient query plans and how, with minor extensions, the optimizer can automatically parallelize array operations.
A Sampling Algebra for Aggregate Estimation
As of 2005, sampling has been incorporated in all major database systems.
While efficient sampling techniques are realizable, determining the accuracy of
an estimate obtained from the sample is still an unresolved problem. In this
paper, we present a theoretical framework that allows an elegant treatment of
the problem. We base our work on generalized uniform sampling (GUS), a class of
sampling methods that subsumes a wide variety of sampling techniques. We
introduce a key notion of equivalence that allows GUS sampling operators to
commute with selection and join, and that supports the derivation of
confidence intervals. We illustrate the theory through extensive examples and
give indications on how to use it to provide meaningful estimations in
database systems.
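The commutation of sampling with selection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Bernoulli sampling as a simple stand-in sampling method and does not reproduce the paper's GUS formalism or its confidence-interval derivation. The names bernoulli_keep, sample, and select, and the toy table, are hypothetical.

```python
import random

def bernoulli_keep(row_id, p, seed=0):
    # Deterministic per-row coin flip (assumption: keyed on the row id),
    # so that both query plans below see identical sampling decisions.
    return random.Random(f"{seed}:{row_id}").random() < p

def sample(rows, p, seed=0):
    # Bernoulli sampling: keep each row independently with probability p.
    return [r for r in rows if bernoulli_keep(r["id"], p, seed)]

def select(rows, pred):
    # Relational selection.
    return [r for r in rows if pred(r)]

# Hypothetical toy table.
rows = [{"id": i, "amount": i % 7} for i in range(1000)]
pred = lambda r: r["amount"] > 3
p = 0.1

plan_a = select(sample(rows, p), pred)   # sample first, then select
plan_b = sample(select(rows, pred), p)   # select first, then sample

# Both plans yield the same sampled rows, so the sampling operator can be
# moved across the selection without changing the result.
same_result = plan_a == plan_b

# Scaling the sampled count by 1/p gives an estimate of the true count.
estimated_count = len(plan_a) / p
true_count = len(select(rows, pred))
```

Because the keep decision depends only on the row itself, pushing the sampling operator below or above the selection leaves the sample unchanged. How accurate estimated_count is relative to true_count is the question the paper's theory addresses and is not covered by this sketch.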
HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics
Every database system contains a query optimizer that performs query
rewrites. Unfortunately, developing query optimizers remains a highly
challenging task. Part of the challenge comes from the intricacies and rich
features of query languages, which make reasoning about rewrite rules
difficult. In this paper, we propose a machine-checkable denotational semantics
for SQL, the de facto language for relational databases, for rigorously
validating rewrite rules. Unlike previously proposed semantics that are either
non-mechanized or only cover a small number of SQL language features, our
semantics covers all major features of SQL, including bags, correlated
subqueries, aggregation, and indexes. Our mechanized semantics, called HoTTSQL,
is based on K-Relations and homotopy type theory, where we denote relations as
mathematical functions from tuples to univalent types. We have implemented
HoTTSQL in Coq, which takes fewer than 300 lines of code, and have proved a
wide range of SQL rewrite rules, including those from database research
literature (e.g., magic set rewrites) and real-world query optimizers (e.g.,
subquery elimination). Several of these rewrite rules have never been
previously proven correct. In addition, while query equivalence is generally
undecidable, we have implemented an automated decision procedure using HoTTSQL
for conjunctive queries: a well-studied decidable fragment of SQL that
encompasses many real-world queries.
Logically automorphically equivalent knowledge bases
Knowledge base theory provides an important example of a field where
applications of universal algebra and algebraic logic look very natural, and
their interaction with practical problems arising in computer science might be
very productive.
In this paper we study the equivalence problem for knowledge bases. Our
interest is to find out how the informational equivalence is related to the
logical description of knowledge. Studying various equivalences of knowledge
bases allows us to compare different knowledge bases. The main object of study
in this paper is logically automorphically equivalent knowledge bases. As we
will see, this notion gives us a good enough characterization of knowledge bases.
Knowledge bases over algebraic models. Some notes about informational equivalence
Recent advances in knowledge base research and the growing importance of
effective knowledge management have raised the important question of knowledge
base equivalence verification. This problem has not been stated earlier, at least in
a way that allows speaking about algorithms for verification of informational
equivalence, because the informal definition of knowledge bases makes formal
solution of this problem impossible. In this paper we provide an implementable
formal algorithm for knowledge base equivalence verification based on the
formal definition of knowledge base proposed by Plotkin B. and Plotkin T., and
study some important properties of automorphic equivalence of models. We also
describe the concept of equivalence and formulate the criterion for the
equivalence of knowledge bases defined over finite models. Further, we define
multi-models and automorphic equivalence of models and multi-models, which is
a generalization of automorphic equivalence of algebras.
Comment: 22 pages
Heuristic and Cost-based Optimization for Diverse Provenance Tasks
A well-established technique for capturing database provenance as annotations
on data is to instrument queries to propagate such annotations. However, even
sophisticated query optimizers often fail to produce efficient execution plans
for instrumented queries. We develop provenance-aware optimization techniques
to address this problem. Specifically, we study algebraic equivalences targeted
at instrumented queries and alternative ways of instrumenting queries for
provenance capture. Furthermore, we present an extensible heuristic and
cost-based optimization framework utilizing these optimizations. Our
experiments confirm that these optimizations are highly effective, improving
performance by several orders of magnitude for diverse provenance tasks.
Comment: IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018,
long version, 31 pages. arXiv admin note: substantial text overlap with
arXiv:1701.0551
Relational Algebra for In-Database Process Mining
The execution logs that are used for process mining in practice are often
obtained by querying an operational database and storing the result in a flat
file. Consequently, the data processing power of the database system cannot be
used anymore for this information, leading to constrained flexibility in the
definition of mining patterns and limited execution performance in mining large
logs. Enabling process mining directly on a database - instead of via
intermediate storage in a flat file - therefore provides additional flexibility
and efficiency. To help facilitate this ideal of in-database process mining,
this paper formally defines a database operator that extracts the 'directly
follows' relation from an operational database. This operator can be used both
to do in-database process mining and to flexibly evaluate
process-mining-related queries, such as: "which employee most frequently changes the 'amount'
attribute of a case from one task to the next". We define the operator using
the well-known relational algebra that forms the formal underpinning of
relational databases. We formally prove equivalence properties of the operator
that are useful for query optimization and present time-complexity properties
of the operator. By doing so, this paper formally defines the necessary
relational algebraic elements of a 'directly follows' operator, which are
required for implementation of such an operator in a DBMS.
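The 'directly follows' relation described in the abstract above can be sketched procedurally. This is an illustration under stated assumptions, not the paper's relational-algebra definition: the event log layout (fields case, activity, time), the function name directly_follows, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def directly_follows(events):
    # Order events by case and then by time, so that each case forms one
    # contiguous, chronologically ordered trace.
    ordered = sorted(events, key=itemgetter("case", "time"))
    pairs = Counter()
    for _, group in groupby(ordered, key=itemgetter("case")):
        trace = [e["activity"] for e in group]
        # Count each pair of consecutive activities within the same case.
        pairs.update(zip(trace, trace[1:]))
    return pairs

# Hypothetical event log, as it might be returned by a database query.
log = [
    {"case": 1, "activity": "A", "time": 1},
    {"case": 1, "activity": "B", "time": 2},
    {"case": 1, "activity": "C", "time": 3},
    {"case": 2, "activity": "A", "time": 1},
    {"case": 2, "activity": "C", "time": 2},
]

df = directly_follows(log)
# df maps (earlier activity, later activity) to the number of occurrences.
```

In an in-database setting the same pairs would be produced by the operator itself rather than by client-side code, which is the scenario the paper targets.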
Equivalence of SQL Queries in Presence of Embedded Dependencies
We consider the problem of finding equivalent minimal-size reformulations of
SQL queries in presence of embedded dependencies [1]. Our focus is on
select-project-join (SPJ) queries with equality comparisons, also known as safe
conjunctive (CQ) queries, possibly with grouping and aggregation. For SPJ
queries, the semantics of the SQL standard treat query answers as multisets
(a.k.a. bags), whereas the stored relations may be treated either as sets,
which is called bag-set semantics for query evaluation, or as bags, which is
called bag semantics. (Under set semantics, both query answers and stored
relations are treated as sets.)
In the context of the above Query-Reformulation Problem, we develop a
comprehensive framework for equivalence of CQ queries under bag and bag-set
semantics in presence of embedded dependencies, and make a number of conceptual
and technical contributions. Specifically, we develop equivalence tests for CQ
queries in presence of arbitrary sets of embedded dependencies under bag and
bag-set semantics, under the condition that chase [9] under set semantics
(set-chase) on the inputs terminates. We also present equivalence tests for
aggregate CQ queries in presence of embedded dependencies. We use our
equivalence tests to develop sound and complete (whenever set-chase on the
inputs terminates) algorithms for solving instances of the Query-Reformulation
Problem with CQ queries under each of bag and bag-set semantics, as well as for
instances of the problem with aggregate queries.
Comment: Correction of the previous version as described in the last sentence
of the Abstract
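The distinction between set, bag-set, and bag semantics drawn in the abstract above can be made concrete with a small sketch. It uses a textbook pair of conjunctive queries that are equivalent under set semantics but not under bag or bag-set semantics. The queries, function names, and data are assumptions chosen for illustration and are not taken from the paper; dependencies and the chase are not modelled.

```python
from collections import Counter

def q1(r):
    # Q1(x) :- R(x, y), with answers as a bag (value -> multiplicity).
    out = Counter()
    for (x, _y), m in r.items():
        out[x] += m
    return out

def q2(r):
    # Q2(x) :- R(x, y), R(x, z): a self-join on the first column.
    out = Counter()
    for (x1, _y), m1 in r.items():
        for (x2, _z), m2 in r.items():
            if x1 == x2:
                out[x1] += m1 * m2
    return out

# Stored relation R as a bag: tuple -> number of copies (hypothetical data).
r_bag = Counter({("a", 1): 2, ("a", 2): 1})
# The same relation treated as a set: every tuple occurs exactly once.
r_set = Counter(dict.fromkeys(r_bag, 1))

# Bag semantics: stored relation and answers are both bags.
bag_q1, bag_q2 = q1(r_bag), q2(r_bag)
# Bag-set semantics: stored relation is a set, answers are bags.
bagset_q1, bagset_q2 = q1(r_set), q2(r_set)
# Set semantics: stored relation and answers are both sets.
set_q1, set_q2 = set(q1(r_set)), set(q2(r_set))
```

Here set_q1 and set_q2 coincide, while the bag and bag-set answers differ in multiplicity. This is why a rewrite that removes the redundant self-join is valid under set semantics but would change the query answer under the other two.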
Querying the Guarded Fragment
Evaluating a Boolean conjunctive query Q against a guarded first-order theory
F is equivalent to checking whether "F and not Q" is unsatisfiable. This
problem is relevant to the areas of database theory and description logic.
Since Q may not be guarded, well-known results about the decidability,
complexity, and finite-model property of the guarded fragment do not obviously
carry over to conjunctive query answering over guarded theories, and these
questions had been left open in general. By investigating finite guarded bisimilar covers of
hypergraphs and relational structures, and by substantially generalising
Rosati's finite chase, we prove for guarded theories F and (unions of)
conjunctive queries Q that (i) Q is true in each model of F iff Q is true in
each finite model of F and (ii) determining whether F implies Q is
2EXPTIME-complete. We further show the following results: (iii) the existence
of polynomial-size conformal covers of arbitrary hypergraphs; (iv) a new proof
of the finite model property of the clique-guarded fragment; (v) the small
model property of the guarded fragment with optimal bounds; (vi) a
polynomial-time solution to the canonisation problem modulo guarded
bisimulation, which yields (vii) a capturing result for guarded bisimulation
invariant PTIME.
Comment: This is an improved and extended version of the paper of the same
title presented at LICS 201