Beyond Well-designed SPARQL
SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive - query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching - query answering becomes coNP-complete. However, well-designed SPARQL captures far from all real-life queries - in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed.
In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries, which subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures about 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment's expressive power is strictly in between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence.
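The role of OPTIONAL described above can be illustrated with a minimal sketch that treats it as a left outer join over solution mappings. The function names and the toy data are illustrative assumptions, not taken from the paper, and the sketch ignores filters and nesting.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def optional(left, right):
    """Evaluate `left OPTIONAL right` over lists of solution mappings (dicts)."""
    result = []
    for m1 in left:
        extensions = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        # Keep the partial answer m1 when no compatible extension exists.
        result.extend(extensions if extensions else [m1])
    return result


# Hypothetical data: every person is returned, with an e-mail only if one is known.
people = [{"?p": "alice"}, {"?p": "bob"}]
emails = [{"?p": "alice", "?e": "alice@example.org"}]
answers = optional(people, emails)
```

Here `bob` is returned as a partial answer because no e-mail mapping extends it, which is the behaviour that distinguishes OPTIONAL from a plain join.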
CONSTRUCT Queries in SPARQL
SPARQL has become the most popular language for querying RDF, the standard data model for representing information on the Web. This query language has received a good deal of attention in the last few years: two versions of W3C standards have been issued, several SPARQL query engines have been deployed, and important theoretical foundations have been laid. However, many fundamental aspects of SPARQL queries are not yet fully understood. To this end, it is crucial to understand the correspondence between SPARQL and well-developed frameworks like relational algebra or first order logic. But one of the main obstacles on the way to such understanding is the fact that the well-studied fragments of SPARQL do not produce RDF as output.
In this paper we embark on the study of SPARQL CONSTRUCT queries, that is, queries which output RDF graphs. This class of queries has a rightful place in the standards and implementations but, in contrast to SELECT queries, it has not yet attracted substantial theoretical research. Under this framework we are able to establish a strong connection between SPARQL and well-known logical and database formalisms. In particular, the fragment which does not allow for blank nodes in output templates corresponds to first order queries, its well-designed sub-fragment corresponds to positive first order queries, and the general language can be restated as a data exchange setting. These correspondences allow us to conclude that the general language is not composable, but the aforementioned blank-free fragments are. Finally, we enrich SPARQL with a recursion operator and establish fundamental properties of this extension.
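As a rough illustration of how a CONSTRUCT query outputs an RDF graph, the sketch below instantiates a blank-free template under each solution mapping. The representation (strings starting with `?` as variables, triples as tuples) and the names `construct`, `template` and `hasContact` are assumptions made for this example only.

```python
def construct(template, mappings):
    """Instantiate each template triple under each mapping.

    Triples that still contain an unbound variable are dropped.
    """
    graph = set()
    for m in mappings:
        for triple in template:
            inst = tuple(m.get(t, t) if t.startswith("?") else t for t in triple)
            if not any(t.startswith("?") for t in inst):
                graph.add(inst)
    return graph


# Hypothetical template and mappings; the second mapping leaves ?e unbound.
template = [("?p", "hasContact", "?e"), ("?p", "type", "Person")]
mappings = [{"?p": "alice", "?e": "alice@example.org"}, {"?p": "bob"}]
graph = construct(template, mappings)
```

Because the output is a set of triples, it can be fed to another query as input, which is what makes questions of composability meaningful for this class.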
Two Variable Logic with Ultimately Periodic Counting
We consider the extension of FO² with quantifiers that state that the number of elements where a formula holds should belong to a given ultimately periodic set. We show that both satisfiability and finite satisfiability of the logic are decidable. We also show that the spectrum of any sentence is definable in Presburger arithmetic. In the process we present several refinements to the "biregular graph method". In this method, decidability issues concerning two-variable logics are reduced to questions about Presburger definability of integer vectors associated with partitioned graphs, where nodes in a partition satisfy certain constraints on their in- and out-degrees.
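To make the counting quantifiers concrete, here is a small sketch of an ultimately periodic set of natural numbers and of a quantifier check over a finite domain. The encoding (finite prefix, threshold, period, residues) is one common way to represent such sets and is an assumption of this example, as are all names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UltimatelyPeriodicSet:
    prefix: frozenset    # members below the threshold
    threshold: int       # from here on, membership is periodic
    period: int          # positive period
    residues: frozenset  # values of (n - threshold) % period that are members

    def __contains__(self, n):
        if n < 0:
            return False
        if n < self.threshold:
            return n in self.prefix
        return (n - self.threshold) % self.period in self.residues


def counting_quantifier(domain, holds, allowed):
    """True iff the number of elements satisfying `holds` lies in `allowed`."""
    return sum(1 for x in domain if holds(x)) in allowed


# The set {1} together with all even numbers from 4 upwards.
S = UltimatelyPeriodicSet(frozenset({1}), 4, 2, frozenset({0}))
# Four elements of range(10) are multiples of 3, and 4 is in S.
quantifier_holds = counting_quantifier(range(10), lambda x: x % 3 == 0, S)
```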
Stratified Negation in Limit Datalog Programs
There has recently been an increasing interest in declarative data analysis,
where analytic tasks are specified using a logical language, and their
implementation and optimisation are delegated to a general-purpose query
engine. Existing declarative languages for data analysis can be formalised as
variants of logic programming equipped with arithmetic function symbols and/or
aggregation, and are typically undecidable. In prior work, the language of
limit programs was proposed, which is sufficiently powerful to
capture many analysis tasks and has decidable entailment problem. Rules in this
language, however, do not allow for negation. In this paper, we study an
extension of limit programs with stratified negation-as-failure. We show that
the additional expressive power makes reasoning computationally more demanding,
and provide tight data complexity bounds. We also identify a fragment with
tractable data complexity and sufficient expressivity to capture many relevant
tasks. Comment: 14 pages; full version of a paper accepted at IJCAI-1
Foundations of Declarative Data Analysis Using Limit Datalog Programs
Motivated by applications in declarative data analysis, we study
Datalog_Z, an extension of positive Datalog with
arithmetic functions over integers. This language is known to be undecidable,
so we propose two fragments. In limit Datalog_Z,
predicates are axiomatised to keep minimal/maximal numeric values, allowing us
to show that fact entailment is coNExpTime-complete in combined, and
coNP-complete in data complexity. Moreover, an additional stability
requirement causes the complexity to drop to ExpTime and PTime, respectively.
Finally, we show that stable Datalog_Z can express many
useful data analysis tasks, and so our results provide a sound foundation for
the development of advanced information systems. Comment: 23 pages; full
version of a paper accepted at IJCAI-17; v2 fixes some typos and improves the
acknowledgment
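The idea of predicates that keep only minimal numeric values can be sketched with a naive fixpoint for a shortest-distance rule. This is an illustrative toy, not the paper's decision procedure: the rule, the function name and the graph are assumptions, and the loop is only guaranteed to terminate when the graph has no negative-weight cycles.

```python
def min_limit_fixpoint(edges, source):
    """Naive fixpoint for the rules
         dist(source, 0).
         dist(y, m + w) <- dist(x, m), edge(x, y, w).
    where dist keeps only the minimal value per node.
    """
    dist = {source: 0}
    changed = True
    while changed:
        changed = False
        for x, y, w in edges:
            if x in dist:
                candidate = dist[x] + w
                # A derived fact replaces the stored one only if it is smaller.
                if y not in dist or candidate < dist[y]:
                    dist[y] = candidate
                    changed = True
    return dist


# Hypothetical weighted graph with non-negative weights.
edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 5)]
distances = min_limit_fixpoint(edges, "a")
```

Storing one value per node instead of every derivable number is what keeps the set of facts finite in this toy setting.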
Classification of annotation semirings over containment of conjunctive queries
Funding: This work is supported under SOCIAM: The Theory and Practice of Social Machines, a project funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/1. This work was also supported by FET-Open Project FoX, grant agreement 233599; EPSRC grants EP/F028288/1, G049165 and J015377; and the Laboratory for Foundations of Computer Science.
We study the problem of query containment of conjunctive queries over annotated databases. Annotations are typically attached to tuples and represent metadata, such as probability, multiplicity, comments, or provenance. It is usually assumed that annotations are drawn from a commutative semiring. Such databases pose new challenges in query optimization, since many related fundamental tasks, such as query containment, have to be reconsidered in the presence of propagation of annotations. We axiomatize several classes of semirings for each of which containment of conjunctive queries is equivalent to existence of a particular type of homomorphism. For each of these types, we also specify all semirings for which existence of a corresponding homomorphism is a sufficient (or necessary) condition for the containment. We develop new decision procedures for containment for some semirings which are not in any of these classes. This generalizes and systematizes previous approaches.
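How annotations propagate through a conjunctive query can be sketched by evaluating one fixed query, Q(x) :- R(x, y), S(y), over relations annotated in a pluggable commutative semiring. The query, the relation contents and the `Semiring` helper are illustrative assumptions; joint use of tuples multiplies annotations and alternative derivations add them.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "zero one add mul")
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def evaluate(K, R, S):
    """Annotated answers to Q(x) :- R(x, y), S(y).

    R and S map tuples to annotations in the semiring K.
    """
    out = {}
    for (x, y), r_ann in R.items():
        s_ann = S.get((y,), K.zero)
        term = K.mul(r_ann, s_ann)               # joint use of two tuples
        out[x] = K.add(out.get(x, K.zero), term)  # alternative derivations
    return {x: a for x, a in out.items() if a != K.zero}


# Hypothetical bag-annotated relations (annotations are multiplicities).
R = {("a", 1): 2, ("a", 2): 1, ("b", 3): 1}
S = {(1,): 3, (2,): 1}
bag_answers = evaluate(COUNTING, R, S)

# The same data under set semantics: every present tuple is annotated True.
set_answers = evaluate(BOOLEAN, {t: True for t in R}, {t: True for t in S})
```

The two runs differ only in the semiring passed in, which is why containment has to be re-examined per semiring rather than once for all annotation types.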
The Bag Semantics of Ontology-Based Data Access
Ontology-based data access (OBDA) is a popular approach for integrating and
querying multiple data sources by means of a shared ontology. The ontology is
linked to the sources using mappings, which assign views over the data to
ontology predicates. Motivated by the need for OBDA systems supporting
database-style aggregate queries, we propose a bag semantics for OBDA, where
duplicate tuples in the views defined by the mappings are retained, as is the
case in standard databases. We show that bag semantics makes conjunctive query
answering in OBDA coNP-hard in data complexity. To regain tractability, we
consider a rather general class of queries and show its rewritability to a
generalisation of the relational calculus to bags.
On the Correspondence Between Monotonic Max-Sum GNNs and Datalog
Although there has been significant interest in applying machine learning
techniques to structured data, the expressivity (i.e., a description of what
can be learned) of such techniques is still poorly understood. In this paper,
we study data transformations based on graph neural networks (GNNs). First, we
note that the choice of how a dataset is encoded into a numeric form
processable by a GNN can obscure the characterisation of a model's
expressivity, and we argue that a canonical encoding provides an appropriate
basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover
a subclass of GNNs with max and sum aggregation functions. We show that, for
each such GNN, one can compute a Datalog program such that applying the GNN to
any dataset produces the same facts as a single round of application of the
program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded
number of feature vectors which can result in arbitrarily large feature values,
whereas rule application requires only a bounded number of constants. Hence,
our result shows that the unbounded summation of monotonic max-sum GNNs does
not increase their expressive power. Third, we sharpen our result to the
subclass of monotonic max GNNs, which use only the max aggregation function,
and identify a corresponding class of Datalog programs.
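A single message-passing layer with max or sum aggregation can be sketched as follows. This toy uses scalar features and assumes that monotonicity is obtained from non-negative weights and a non-decreasing activation (ReLU here); the exact conditions used in the paper may differ, and all names and numbers are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def layer(features, neighbours, w_self, w_agg, bias, aggregate):
    """One message-passing layer over scalar node features.

    `aggregate` is `max` or `sum`; w_self and w_agg are assumed non-negative.
    """
    out = {}
    for v, h in features.items():
        msgs = [features[u] for u in neighbours.get(v, [])]
        agg = aggregate(msgs) if msgs else 0.0
        out[v] = relu(w_self * h + w_agg * agg + bias)
    return out


# Hypothetical graph: a has neighbours b and c, b has neighbour c.
features = {"a": 1.0, "b": 2.0, "c": 3.0}
neighbours = {"a": ["b", "c"], "b": ["c"], "c": []}
with_sum = layer(features, neighbours, 1.0, 0.5, -1.0, sum)
with_max = layer(features, neighbours, 1.0, 0.5, -1.0, max)

# Raising an input feature never lowers any output under these assumptions.
raised = layer({**features, "c": 5.0}, neighbours, 1.0, 0.5, -1.0, sum)
```

With sum aggregation the value at a node grows with the number of neighbours, which is the unbounded behaviour the abstract contrasts with the bounded number of constants needed by rule application.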