843 research outputs found
When Can We Answer Queries Using Result-Bounded Data Interfaces?
We consider answering queries on data available through access methods, that
provide lookup access to the tuples matching a given binding. Such interfaces
are common on the Web; further, they often have bounds on how many results they
can return, e.g., because of pagination or rate limits. We thus study
result-bounded methods, which may return only a limited number of tuples. We
study how to decide if a query is answerable using result-bounded methods,
i.e., how to compute a plan that returns all answers to the query using the
methods, assuming that the underlying data satisfies some integrity
constraints. We first show how to reduce answerability to a query containment
problem with constraints. Second, we show "schema simplification" theorems
describing when and how result bounded services can be used. Finally, we use
these theorems to give decidability and complexity results about answerability
for common constraint classes.Comment: 65 pages; journal version of the PODS'18 paper arXiv:1706.0793
When Can We Answer Queries Using Result-Bounded Data Interfaces?
We consider answering queries where the underlying data is available only
over limited interfaces which provide lookup access to the tuples matching a
given binding, but possibly restricting the number of output tuples returned.
Interfaces imposing such "result bounds" are common in accessing data via the
web. Given a query over a set of relations as well as some integrity
constraints that relate the queried relations to the data sources, we examine
the problem of deciding if the query is answerable over the interfaces; that
is, whether there exists a plan that returns all answers to the query, assuming
the source data satisfies the integrity constraints.
The first component of our analysis of answerability is a reduction to a
query containment problem with constraints. The second component is a set of
"schema simplification" theorems capturing limitations on how interfaces with
result bounds can be useful to obtain complete answers to queries. These
results also help to show decidability for the containment problem that
captures answerability, for many classes of constraints. The final component in
our analysis of answerability is a "linearization" method, showing that query
containment with certain guarded dependencies -- including those that emerge
from answerability problems -- can be reduced to query containment for a
well-behaved class of linear dependencies. Putting these components together,
we get a detailed picture of how to check answerability over result-bounded
services.Comment: 45 pages, 2 tables, 43 references. Complete version with proofs of
the PODS'18 paper. The main text of this paper is almost identical to the
PODS'18 except that we have fixed some small mistakes. Relative to the
earlier arXiv version, many errors were corrected, and some terminology has
change
Queries with Guarded Negation (full version)
A well-established and fundamental insight in database theory is that
negation (also known as complementation) tends to make queries difficult to
process and difficult to reason about. Many basic problems are decidable and
admit practical algorithms in the case of unions of conjunctive queries, but
become difficult or even undecidable when queries are allowed to contain
negation. Inspired by recent results in finite model theory, we consider a
restricted form of negation, guarded negation. We introduce a fragment of SQL,
called GN-SQL, as well as a fragment of Datalog with stratified negation,
called GN-Datalog, that allow only guarded negation, and we show that these
query languages are computationally well behaved, in terms of testing query
containment, query evaluation, open-world query answering, and boundedness.
GN-SQL and GN-Datalog subsume a number of well known query languages and
constraint languages, such as unions of conjunctive queries, monadic Datalog,
and frontier-guarded tgds. In addition, an analysis of standard benchmark
workloads shows that most usage of negation in SQL in practice is guarded
negation
Querying Schemas With Access Restrictions
We study verification of systems whose transitions consist of accesses to a
Web-based data-source. An access is a lookup on a relation within a relational
database, fixing values for a set of positions in the relation. For example, a
transition can represent access to a Web form, where the user is restricted to
filling in values for a particular set of fields. We look at verifying
properties of a schema describing the possible accesses of such a system. We
present a language where one can describe the properties of an access path, and
also specify additional restrictions on accesses that are enforced by the
schema. Our main property language, AccLTL, is based on a first-order extension
of linear-time temporal logic, interpreting access paths as sequences of
relational structures. We also present a lower-level automaton model,
Aautomata, which AccLTL specifications can compile into. We show that AccLTL
and A-automata can express static analysis problems related to "querying with
limited access patterns" that have been studied in the database literature in
the past, such as whether an access is relevant to answering a query, and
whether two queries are equivalent in the accessible data they can return. We
prove decidability and complexity results for several restrictions and variants
of AccLTL, and explain which properties of paths can be expressed in each
restriction.Comment: VLDB201
Semantic Optimization of Conjunctive Queries
This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypetreewidth bounded by a fixed constant k â„ 1; the associated fragment is denoted GHWk. A CQ is semantically in GHWk if it is equivalent to a CQ in GHWk. The problem of checking whether a CQ is semantically in GHWk has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in GHWk under the constraints, while not being semantically in GHWk without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in GHWk, for a fixed k â„ 1, under the constraints, or, in other words, is the query equivalent to one that belongs to GHWk over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs. In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHWk alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHWk, we discuss how it can be approximated via a CQ that falls in GHWk in an optimal way. Such approximations might help finding âquickâ answers to the input query when exact evaluation is intractable
Inductive Logic Programming in Databases: from Datalog to DL+log
In this paper we address an issue that has been brought to the attention of
the database community with the advent of the Semantic Web, i.e. the issue of
how ontologies (and semantics conveyed by them) can help solving typical
database problems, through a better understanding of KR aspects related to
databases. In particular, we investigate this issue from the ILP perspective by
considering two database problems, (i) the definition of views and (ii) the
definition of constraints, for a database whose schema is represented also by
means of an ontology. Both can be reformulated as ILP problems and can benefit
from the expressive and deductive power of the KR framework DL+log. We
illustrate the application scenarios by means of examples. Keywords: Inductive
Logic Programming, Relational Databases, Ontologies, Description Logics, Hybrid
Knowledge Representation and Reasoning Systems. Note: To appear in Theory and
Practice of Logic Programming (TPLP).Comment: 30 pages, 3 figures, 2 tables
Semantic Acyclicity Under Constraints
A conjunctive query (CQ) is semantically acyclic if it is equivalent to an acyclic one. Semantic acyclicity has been studied in the constraint-free case, and deciding whether a query enjoys this property is NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (tgds) that can express, e.g., inclusion dependencies, or equality-generating dependencies (egds) that capture, e.g., functional dependencies, a CQ may turn out to be semantically acyclic under the constraints while not semantically acyclic in general. This opens avenues to new query optimization techniques. In this paper we initiate and develop the theory of semantic acyclicity under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically acyclic under the constraints, or, in other words, is the query equivalent to an acyclic one over all those databases that satisfy the set of constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not sufficient condition for the decidability of semantic acyclicity. In particular, we show that semantic acyclicity is undecidable in presence of full tgds (i.e., Datalog rules). In view of this fact, we focus on the main classes of tgds for which CQ containment is decidable, and do not capture the class of full tgds, namely guarded, non-recursive and sticky tgds. For these classes we show that semantic acyclicity is decidable, and its complexity coincides with the complexity of CQ containment. In the case of egds, we show that semantic acyclicity is undecidable even over unary and binary predicates. When restricted to keys the problem becomes decidable (NP-complete) over such schemas. We finally consider the problem of evaluating a semantically acyclic query over a database that satisfies a set of constraints. For guarded tgds the evaluation problem is tractable. © Association Computing for Machiner
Global Numerical Constraints on Trees
We introduce a logical foundation to reason on tree structures with
constraints on the number of node occurrences. Related formalisms are limited
to express occurrence constraints on particular tree regions, as for instance
the children of a given node. By contrast, the logic introduced in the present
work can concisely express numerical bounds on any region, descendants or
ancestors for instance. We prove that the logic is decidable in single
exponential time even if the numerical constraints are in binary form. We also
illustrate the usage of the logic in the description of numerical constraints
on multi-directional path queries on XML documents. Furthermore, numerical
restrictions on regular languages (XML schemas) can also be concisely described
by the logic. This implies a characterization of decidable counting extensions
of XPath queries and XML schemas. Moreover, as the logic is closed under
negation, it can thus be used as an optimal reasoning framework for testing
emptiness, containment and equivalence
- âŠ