Evaluating Datalog via Tree Automata and Cycluits
We investigate parameterizations of both database instances and queries that
make query evaluation fixed-parameter tractable in combined complexity. We show
that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog)
enjoys bilinear-time evaluation on structures of bounded treewidth for programs
of bounded rule size. Such programs capture in particular conjunctive queries
with simplicial decompositions of bounded width, guarded negation fragment
queries of bounded CQ-rank, or two-way regular path queries. Our result is
shown by translating to alternating two-way automata, whose semantics is
defined via cyclic provenance circuits (cycluits) that can be tractably
evaluated.
Comment: 56 pages, 63 references. Journal version of "Combined Tractability of
Query Evaluation via Tree Automata and Cycluits (Extended Version)" at
arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and
possible minor publisher-induced changes, this is the exact content of the
journal paper that will appear in Theory of Computing Systems. Update wrt
version 1: latest reviewer feedback
Classification of annotation semirings over containment of conjunctive queries
Funding: This work is supported under SOCIAM: The Theory and Practice of Social Machines, a project funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/1. This work was also supported by FET-Open Project FoX, grant agreement 233599; EPSRC grants EP/F028288/1, G049165 and J015377; and the Laboratory for Foundations of Computer Science.
We study the problem of query containment of conjunctive queries over annotated databases. Annotations are typically attached to tuples and represent metadata, such as probability, multiplicity, comments, or provenance. It is usually assumed that annotations are drawn from a commutative semiring. Such databases pose new challenges in query optimization, since many related fundamental tasks, such as query containment, have to be reconsidered in the presence of propagation of annotations. We axiomatize several classes of semirings for each of which containment of conjunctive queries is equivalent to existence of a particular type of homomorphism. For each of these types, we also specify all semirings for which existence of a corresponding homomorphism is a sufficient (or necessary) condition for the containment. We develop new decision procedures for containment for some semirings which are not in any of these classes. This generalizes and systematizes previous approaches.
Postprint. Peer reviewed.
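As a baseline for the homomorphism conditions mentioned above, the following minimal sketch covers only the classical set-semantics case (the Boolean semiring), where Q1 is contained in Q2 exactly when there is a homomorphism from Q2 to Q1. It does not implement the semiring-specific homomorphism types of the paper. The query encoding and the names `find_homomorphism` and `contained_in` are assumptions made for illustration, and atoms are assumed to contain variables only.

```python
# A query is (head_vars, body), where body is a list of (relation, args) atoms.
# Assumption for this sketch: all arguments are variables (no constants).

def find_homomorphism(src, dst):
    """Return a variable mapping h from src to dst that sends every src atom
    to some dst atom and the src head to the dst head, or None if none exists."""
    src_head, src_body = src
    dst_head, dst_body = dst
    if len(src_head) != len(dst_head):
        return None
    h = {}
    for s, d in zip(src_head, dst_head):
        if h.setdefault(s, d) != d:
            return None

    def search(i, h):
        if i == len(src_body):
            return h
        rel, args = src_body[i]
        for drel, dargs in dst_body:
            if drel != rel or len(dargs) != len(args):
                continue
            ext = dict(h)
            if all(ext.setdefault(a, b) == b for a, b in zip(args, dargs)):
                found = search(i + 1, ext)
                if found is not None:
                    return found
        return None

    return search(0, h)


def contained_in(q1, q2):
    """Set semantics: Q1 is contained in Q2 iff a homomorphism Q2 -> Q1 exists."""
    return find_homomorphism(q2, q1) is not None


# Q1(x) :- R(x, y), R(y, x)      Q2(x) :- R(x, y)
q1 = (("x",), [("R", ("x", "y")), ("R", ("y", "x"))])
q2 = (("x",), [("R", ("x", "y"))])
```

Here `contained_in(q1, q2)` holds while `contained_in(q2, q1)` does not, since the two atoms of Q1 cannot both be mapped onto the single atom of Q2 consistently.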
Parallel-Correctness and Transferability for Conjunctive Queries under Bag Semantics
Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate for computing a given query. This property is referred to as parallel-correctness. Another key problem is to detect whether the data reshuffle step can be avoided when evaluating subsequent queries. The latter problem is referred to as transfer of parallel-correctness. This paper extends the study of parallel-correctness and transfer of parallel-correctness of conjunctive queries to incorporate bag semantics. We provide semantical characterizations for both problems, obtain complexity bounds and discuss the relationship with their set semantics counterparts. Finally, we revisit both problems under a modified distribution model that takes advantage of a linear order on compute nodes and obtain tight complexity bounds.
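To make the notion of an adequate distribution policy concrete, here is a minimal sketch for the set-semantics counterpart only: it checks, on one given instance, whether the union of the per-node answers equals the centralized answer. Parallel-correctness proper quantifies over all instances, and the bag-semantics definitions studied in the paper are not reproduced here. The names `eval_cq` and `parallel_correct_on`, the fact encoding, and both example policies are assumptions for illustration.

```python
def eval_cq(head, body, facts):
    """Naive set-semantics evaluation of a conjunctive query.
    head: tuple of variables; body: list of (relation, variables);
    facts: set of (relation, values). Atoms contain variables only."""
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, variables = body[i]
        for frel, values in facts:
            if frel != rel or len(values) != len(variables):
                continue
            new = dict(binding)
            if all(new.setdefault(v, c) == c for v, c in zip(variables, values)):
                extend(i + 1, new)

    extend(0, {})
    return results


def parallel_correct_on(head, body, facts, policy, nodes):
    """Instance-level check: policy(fact) returns the set of nodes receiving it."""
    central = eval_cq(head, body, facts)
    distributed = set()
    for n in nodes:
        local = {f for f in facts if n in policy(f)}
        distributed |= eval_cq(head, body, local)
    return central == distributed


# Q(x, z) :- R(x, y), S(y, z)
head, body = ("x", "z"), [("R", ("x", "y")), ("S", ("y", "z"))]
facts = {("R", (1, 2)), ("S", (2, 3)), ("R", (4, 5)), ("S", (5, 6))}

def hash_on_join(fact):
    # Hypothetical policy: route each fact by its join value y, modulo 2.
    rel, vals = fact
    return {(vals[1] if rel == "R" else vals[0]) % 2}

def split_by_relation(fact):
    # Hypothetical policy: all R facts on node 0, all S facts on node 1.
    return {0} if fact[0] == "R" else {1}
```

On this instance the join-value policy places every pair of joining facts on a common node, so the check succeeds; the per-relation policy never co-locates them, so the distributed answer is empty and the check fails.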
Provenance Semirings
We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.
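The idea that one algorithm serves several semantics can be sketched as follows: a join followed by a projection multiplies the annotations of joined tuples and adds the annotations of tuples that collapse under projection, with the semiring passed in as a parameter. This is a minimal illustration, not the paper's formal development; the `Semiring` class, the `join_project` function and the example relations are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)  # set semantics
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)             # bag semantics


def join_project(K, r, s):
    """Compute Q(x, z) :- R(x, y), S(y, z) over semiring K.
    r and s map tuples to annotations; absent tuples implicitly carry K.zero."""
    out = defaultdict(lambda: K.zero)
    for (x, y), a in r.items():
        for (y2, z), b in s.items():
            if y == y2:
                # Joint use of two tuples multiplies; alternative derivations add.
                out[(x, z)] = K.add(out[(x, z)], K.mul(a, b))
    return {t: v for t, v in out.items() if v != K.zero}


# Bag-annotated relations: the annotation is the tuple's multiplicity.
r_bag = {("a", "b"): 2, ("a", "c"): 1}
s_bag = {("b", "d"): 3, ("c", "d"): 1}

# The same relations under set semantics: every present tuple is annotated True.
r_set = {t: True for t in r_bag}
s_set = {t: True for t in s_bag}
```

With `NAT`, the output tuple `("a", "d")` gets multiplicity 2·3 + 1·1 = 7; with `BOOL`, the same code simply reports that the tuple is present.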
When Can We Answer Queries Using Result-Bounded Data Interfaces?
We consider answering queries where the underlying data is available only
over limited interfaces which provide lookup access to the tuples matching a
given binding, but possibly restricting the number of output tuples returned.
Interfaces imposing such "result bounds" are common in accessing data via the
web. Given a query over a set of relations as well as some integrity
constraints that relate the queried relations to the data sources, we examine
the problem of deciding if the query is answerable over the interfaces; that
is, whether there exists a plan that returns all answers to the query, assuming
the source data satisfies the integrity constraints.
The first component of our analysis of answerability is a reduction to a
query containment problem with constraints. The second component is a set of
"schema simplification" theorems capturing limitations on how interfaces with
result bounds can be useful to obtain complete answers to queries. These
results also help to show decidability for the containment problem that
captures answerability, for many classes of constraints. The final component in
our analysis of answerability is a "linearization" method, showing that query
containment with certain guarded dependencies -- including those that emerge
from answerability problems -- can be reduced to query containment for a
well-behaved class of linear dependencies. Putting these components together,
we get a detailed picture of how to check answerability over result-bounded
services.
Comment: 45 pages, 2 tables, 43 references. Complete version with proofs of
the PODS'18 paper. The main text of this paper is almost identical to the
PODS'18 except that we have fixed some small mistakes. Relative to the
earlier arXiv version, many errors were corrected, and some terminology has
changed.
Rewriting Complex Queries from Cloud to Fog under Capability Constraints to Protect the Users' Privacy
In this paper we show how existing query rewriting and query containment techniques can be used to achieve an efficient and privacy-aware processing of queries. To achieve this, the whole network structure, from data producing sensors up to cloud computers, is utilized to create a database machine consisting of billions of devices from the Internet of Things. Based on previous research in the field of database theory, especially query rewriting, we present a concept to split a query into fragment and remainder queries. Fragment queries can operate on resource limited devices to filter and preaggregate data. Remainder queries take these data and execute the last, complex part of the original queries on more powerful devices. As a result, less data is processed and forwarded in the network and the privacy principle of data minimization is accomplished.
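The split into fragment and remainder queries can be illustrated with a toy aggregation, assuming a query that averages sensor values per room over readings above a threshold. The fragment filters and pre-aggregates on each device; the remainder merges the partial aggregates on a more powerful node. All names, the data layout and the chosen query are hypothetical, and the sketch does not reproduce the paper's rewriting technique.

```python
from collections import defaultdict


def original_query(readings, threshold):
    """Central evaluation: average value per room over readings above threshold."""
    sums, counts = defaultdict(float), defaultdict(int)
    for room, value in readings:
        if value > threshold:
            sums[room] += value
            counts[room] += 1
    return {room: sums[room] / counts[room] for room in sums}


def fragment_query(device_readings, threshold):
    """Runs on a resource-limited device: filter, then pre-aggregate
    to one (sum, count) pair per room, so raw readings never leave the device."""
    partial = defaultdict(lambda: (0.0, 0))
    for room, value in device_readings:
        if value > threshold:
            total, count = partial[room]
            partial[room] = (total + value, count + 1)
    return dict(partial)


def remainder_query(partials):
    """Runs on a more powerful node: merge partial aggregates, then average."""
    merged = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for room, (total, count) in partial.items():
            t, c = merged[room]
            merged[room] = (t + total, c + count)
    return {room: t / c for room, (t, c) in merged.items()}


devices = [
    [("kitchen", 21.0), ("kitchen", 19.0)],
    [("kitchen", 23.0), ("hall", 25.0)],
]
threshold = 20.0
all_readings = [reading for device in devices for reading in device]
split_result = remainder_query(fragment_query(d, threshold) for d in devices)
```

In this example only three (sum, count) pairs are forwarded instead of four raw readings, and `split_result` matches the centrally evaluated `original_query(all_readings, threshold)`.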