An Ordered Bag Semantics for SQL
Semantic query optimization is an important issue in many database contexts, including information integration, view maintenance, and data warehousing, and it can substantially improve performance, especially in today's database systems, which contain gigabytes of data. A crucial issue in semantic query optimization is query containment. Several papers have dealt with the problem of conjunctive query containment. In particular, some of the literature admits SQL-like query languages with aggregate operations such as sum and count. Moreover, since real SQL requires a richer semantics than set semantics, there has been work on bag semantics for SQL, essentially by introducing an interpreted column. One important technique for reasoning about query containment under bag semantics is to translate the queries into alternatives that use aggregate functions and assume set semantics.
Furthermore, in SQL, order by is the operator by which the results are sorted based on certain attributes and, clearly, ordering is an important issue in query optimization. As such, there has been work done in support of ordering based on the application of the domain. However, a final step is required in order to introduce a rich semantics in support.
In this work, we integrate set and bag semantics to be able to reason about real SQL queries. We demonstrate an ordered bag semantics for SQL using a relational algebra with aggregates. We define a set algebra with various expressions of interest, then define syntax and semantics for a bag algebra, and finally extend these definitions to ordered bags. This is done by adding a pair of interpreted columns to computed relations: the first column is used in the standard fashion to capture duplicate tuples in query results, and the second adds an ordering priority to the output. We show that the relational algebra with aggregates can be used to compute these interpreted columns with sufficient flexibility to work as a semantics for standard SQL queries, which may include order by and duplicate-preserving select clauses. The reduction of a workable ordered bag semantics for SQL to the relational algebra with aggregates, as we have developed it, can enable existing query containment theory to be applied to practical query containment.
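The two interpreted columns described in this abstract can be illustrated with a small sketch. This is not the paper's algebra, only a toy encoding under assumed conventions: a bag is stored as a set of `(row, count)` pairs, a duplicate-preserving projection is computed by summing counts (an aggregate under set semantics), and an `order by` attaches a rank as a third component. The helper names `to_counted_set`, `project_bag`, `order_by`, and `materialize` are hypothetical.

```python
from collections import defaultdict


def to_counted_set(rows):
    """Encode a bag (a list with duplicates) as a set of (row, count) pairs."""
    counts = defaultdict(int)
    for row in rows:
        counts[row] += 1
    return {(row, n) for row, n in counts.items()}


def project_bag(counted, columns):
    """Duplicate-preserving projection: rows that collapse add up their counts."""
    counts = defaultdict(int)
    for row, n in counted:
        counts[tuple(row[i] for i in columns)] += n
    return {(row, n) for row, n in counts.items()}


def order_by(counted, key_column):
    """Attach an ordering-priority column: the rank of each row's sort key."""
    keys = sorted({row[key_column] for row, _ in counted})
    rank = {k: i for i, k in enumerate(keys)}
    return {(row, n, rank[row[key_column]]) for row, n in counted}


def materialize(ordered):
    """Expand (row, count, priority) triples back into an ordered list."""
    out = []
    # Ties in priority are broken by the row itself to keep output deterministic.
    for row, n, _ in sorted(ordered, key=lambda t: (t[2], t[0])):
        out.extend([row] * n)
    return out


employees = [("ann", "sales"), ("bob", "sales"), ("cy", "hr")]
departments = project_bag(to_counted_set(employees), [1])
result = materialize(order_by(departments, 0))
```

Here `result` mirrors what a duplicate-preserving `select` of the department column with an `order by` would return, yet every intermediate value is an ordinary set, which is the property that lets set-based reasoning apply.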
Mapping-equivalence and oid-equivalence of single-function object-creating conjunctive queries
Conjunctive database queries have been extended with a mechanism for object
creation to capture important applications such as data exchange, data
integration, and ontology-based data access. Object creation generates new
object identifiers in the result that do not belong to the set of constants in
the source database. The new object identifiers can also be seen as Skolem
terms. Hence, object-creating conjunctive queries can also be regarded as
restricted second-order tuple-generating dependencies (SO tgds), considered in
the data exchange literature.
In this paper, we focus on the class of single-function object-creating
conjunctive queries, or sifo CQs for short. We give a new characterization for
oid-equivalence of sifo CQs that is simpler than the one given by Hull and
Yoshikawa and places the problem in the complexity class NP. Our
characterization is based on Cohen's equivalence notions for conjunctive
queries with multiplicities. We also solve the logical entailment problem for
sifo CQs, showing that this problem also belongs to NP. Results by Pichler et
al. have shown that logical equivalence for more general classes of SO tgds is
either undecidable or decidable with as yet unknown complexity upper bounds.
Comment: This revised version has been accepted on 11 January 2016 for
publication in The VLDB Journal
Evaluating Datalog via Tree Automata and Cycluits
We investigate parameterizations of both database instances and queries that
make query evaluation fixed-parameter tractable in combined complexity. We show
that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog)
enjoys bilinear-time evaluation on structures of bounded treewidth for programs
of bounded rule size. Such programs capture in particular conjunctive queries
with simplicial decompositions of bounded width, guarded negation fragment
queries of bounded CQ-rank, or two-way regular path queries. Our result is
shown by translating to alternating two-way automata, whose semantics is
defined via cyclic provenance circuits (cycluits) that can be tractably
evaluated.
Comment: 56 pages, 63 references. Journal version of "Combined Tractability of
Query Evaluation via Tree Automata and Cycluits (Extended Version)" at
arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and
possible minor publisher-induced changes, this is the exact content of the
journal paper that will appear in Theory of Computing Systems. Update wrt
version 1: latest reviewer feedback
When Can We Answer Queries Using Result-Bounded Data Interfaces?
We consider answering queries where the underlying data is available only
over limited interfaces which provide lookup access to the tuples matching a
given binding, but possibly restricting the number of output tuples returned.
Interfaces imposing such "result bounds" are common in accessing data via the
web. Given a query over a set of relations as well as some integrity
constraints that relate the queried relations to the data sources, we examine
the problem of deciding if the query is answerable over the interfaces; that
is, whether there exists a plan that returns all answers to the query, assuming
the source data satisfies the integrity constraints.
The first component of our analysis of answerability is a reduction to a
query containment problem with constraints. The second component is a set of
"schema simplification" theorems capturing limitations on how interfaces with
result bounds can be useful to obtain complete answers to queries. These
results also help to show decidability for the containment problem that
captures answerability, for many classes of constraints. The final component in
our analysis of answerability is a "linearization" method, showing that query
containment with certain guarded dependencies -- including those that emerge
from answerability problems -- can be reduced to query containment for a
well-behaved class of linear dependencies. Putting these components together,
we get a detailed picture of how to check answerability over result-bounded
services.
Comment: 45 pages, 2 tables, 43 references. Complete version with proofs of
the PODS'18 paper. The main text of this paper is almost identical to the
PODS'18 except that we have fixed some small mistakes. Relative to the
earlier arXiv version, many errors were corrected, and some terminology has
changed
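The access model in this abstract can be made concrete with a toy sketch. It is not the paper's formalism: the class `BoundedLookup` and the helpers `exists_match` and `is_complete` are hypothetical, and the sketch assumes that an interface with bound k returns every matching tuple when there are at most k of them and some k of them otherwise.

```python
class BoundedLookup:
    """Toy lookup interface that returns at most `bound` tuples per binding."""

    def __init__(self, tuples, bound):
        self._tuples = list(tuples)
        self.bound = bound

    def lookup(self, position, value):
        """Return up to `bound` tuples whose field at `position` equals `value`."""
        matches = [t for t in self._tuples if t[position] == value]
        # A real service might pick any `bound` matches; taking the first ones
        # here just keeps the sketch deterministic.
        return matches[: self.bound]


def exists_match(interface, position, value):
    """A Boolean existence check is answerable even when the bound is 1."""
    return len(interface.lookup(position, value)) > 0


def is_complete(interface, position, value):
    """Fewer results than the bound means, under the assumption above,
    that no matching tuple was cut off."""
    return len(interface.lookup(position, value)) < interface.bound


directory = [("ann", "sales"), ("bob", "sales"), ("cy", "hr")]
service = BoundedLookup(directory, bound=2)
```

With this data, `exists_match(service, 1, "sales")` holds, but `is_complete(service, 1, "sales")` does not: the bound was reached, so a plan cannot tell from this access alone whether more matching tuples exist.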
On the Complexity of Existential Positive Queries
We systematically investigate the complexity of model checking the
existential positive fragment of first-order logic. In particular, for a set of
existential positive sentences, we consider model checking where the sentence
is restricted to fall into the set; a natural question is then to classify
which sentence sets are tractable and which are intractable. With respect to
fixed-parameter tractability, we give a general theorem that reduces this
classification question to the corresponding question for primitive positive
logic, for a variety of representations of structures. This general theorem
allows us to deduce that an existential positive sentence set having bounded
arity is fixed-parameter tractable if and only if each sentence is equivalent
to one in bounded-variable logic. We then use the lens of classical complexity
to study these fixed-parameter tractable sentence sets. We show that such a set
can be NP-complete, and consider the length needed by a translation from
sentences in such a set to bounded-variable logic; we prove superpolynomial
lower bounds on this length using the theory of compilability, obtaining an
interesting type of formula size lower bound. Overall, the tools, concepts, and
results of this article set the stage for the future consideration of the
complexity of model checking on more expressive logics.