Datalog rewriting for Guarded TGDs
We deal with the problem of fact entailment with respect to a database and a set of integrity constraints, focusing on the case of Guarded tuple-generating dependencies (GTGDs). The original approach to the problem in the literature is via forward reasoning or "chasing", where one completes the input database by adding fresh elements and facts. This completion process may be infinite, but in the case of GTGDs it is known that one can compute a point where the chase can be cut off without missing any base facts. Another approach is to build an automaton and check it for emptiness. Neither of these approaches scales to large input datasets. An alternative approach is to rewrite the constraints into Datalog: the Datalog rewriting can be generated in advance of any dataset and will produce the same base facts as the original constraints. It is known that Datalog rewritings always exist, but to our knowledge the approach has never been implemented. In this work we give an overview of effective algorithms for the Datalog rewriting of GTGDs. This presents work that will appear in VLDB 2022.
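A toy case shows the contrast between chasing and Datalog rewriting. The following is a minimal sketch under stated assumptions. It uses two hypothetical GTGDs and a rewriting worked out by hand, not one produced by the algorithms described above. It checks that the chase, which invents labelled nulls, and plain Datalog evaluation, which never does, agree on the base facts (facts that contain no nulls).

```python
"""Toy check: chase vs. a hand-written Datalog rewriting (hypothetical example).

GTGDs assumed for illustration (not taken from the paper):
    R(x, y) -> exists z. S(x, z)
    S(x, z) -> T(x)
Datalog rewriting, derived by hand for this toy case:
    R(x, y) -> T(x)
    S(x, z) -> T(x)
Facts are tuples (predicate, arg1, ...); labelled nulls are strings "_n0", "_n1", ...
"""
from itertools import count


def is_null(term):
    return isinstance(term, str) and term.startswith("_n")


def chase(db):
    """Restricted chase: fire a rule only if its head is not already satisfied."""
    facts = set(db)
    fresh = count()
    changed = True
    while changed:
        changed = False
        for pred, *args in list(facts):  # iterate over a snapshot
            if pred == "R":
                x = args[0]
                if not any(f[0] == "S" and f[1] == x for f in facts):
                    facts.add(("S", x, f"_n{next(fresh)}"))  # invent a fresh null
                    changed = True
            elif pred == "S" and ("T", args[0]) not in facts:
                facts.add(("T", args[0]))
                changed = True
    return facts


def datalog(db):
    """Naive fixpoint of the rewriting: no fresh elements are ever created."""
    facts = set(db)
    while True:
        new = {("T", f[1]) for f in facts if f[0] in ("R", "S")} - facts
        if not new:
            return facts
        facts |= new


def base_facts(facts):
    """Keep only the facts whose arguments are all constants."""
    return {f for f in facts if not any(is_null(t) for t in f[1:])}


db = {("R", "a", "b"), ("S", "c", "d")}
print(base_facts(chase(db)) == datalog(db))  # True
```

In this toy case the chase only needs one null and terminates. The point of a rewriting is that the Datalog side never has to invent elements, however the chase behaves.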
Functional Dependencies Unleashed for Scalable Data Exchange
We address the problem of efficiently evaluating target functional
dependencies (fds) in the Data Exchange (DE) process. Target fds naturally
occur in many DE scenarios, including the ones in Life Sciences in which
multiple source relations need to be structured under a constrained target
schema. However, despite their wide use, the evaluation of target fds is still a
bottleneck in the state-of-the-art DE engines. Systems relying on an all-SQL
approach typically do not support target fds unless additional information is
provided. Alternatively, DE engines that do include these dependencies
typically pay the price of a significant drop in performance and scalability.
In this paper, we present a novel chase-based algorithm that can efficiently
handle arbitrary fds on the target. Our approach essentially relies on
exploiting the interactions between source-to-target (s-t) tuple-generating
dependencies (tgds) and target fds. This allows us to tame the size of the
intermediate chase results through a careful ordering of chase steps
that interleaves fds and (chosen) tgds. As a direct consequence, we significantly
reduce the fd application scope, often a central cause of the dramatic
overhead induced by target fds. Moreover, reasoning about dependency interactions
further leads us to interesting parallelization opportunities, yielding
additional scalability gains. We provide a proof-of-concept implementation of
our chase-based algorithm and an experimental study that gauges its
scalability with respect to several parameters, including the size of
source instances and the number of dependencies in each tested scenario.
Finally, we empirically compare our algorithm with the latest DE engines and
show that it outperforms them.
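The abstract above concerns applying target fds during the chase. As a point of reference, here is a minimal sketch of the textbook fd chase step (an equality-generating step) on a single relation. It assumes tuples are Python tuples, an fd is a list of left-hand-side column indices plus one right-hand-side index, and labelled nulls are strings beginning with "_". The names `apply_fd` and `ChaseFailure` are hypothetical. This is not the paper's algorithm, which additionally orders and interleaves fd and tgd steps.

```python
"""Sketch of a standard fd (egd) chase step on one target relation (hypothetical names)."""


class ChaseFailure(Exception):
    """Raised when an fd would equate two distinct constants."""


def is_null(value):
    return isinstance(value, str) and value.startswith("_")


def apply_fd(relation, lhs, rhs):
    """Apply the fd lhs -> rhs until no two tuples violate it."""
    rows = [list(t) for t in relation]
    changed = True
    while changed:
        changed = False
        first_by_key = {}
        for row in rows:
            key = tuple(row[i] for i in lhs)
            other = first_by_key.setdefault(key, row)
            a, b = other[rhs], row[rhs]
            if a == b:
                continue
            if not is_null(a) and not is_null(b):
                raise ChaseFailure(f"fd violated: {a!r} vs {b!r}")
            old, new = (a, b) if is_null(a) else (b, a)
            # Replace the null everywhere so all its occurrences stay consistent.
            for r in rows:
                for i, v in enumerate(r):
                    if v == old:
                        r[i] = new
            changed = True
            break  # restart the scan after every merge
    return {tuple(r) for r in rows}


target = [("p1", "Alice", "_n1"), ("p1", "Alice", "Oxford"), ("p2", "Bob", "_n2")]
print(sorted(apply_fd(target, lhs=[0], rhs=2)))
# [('p1', 'Alice', 'Oxford'), ('p2', 'Bob', '_n2')]
```

Here the two `p1` tuples collapse into one once `_n1` is unified with `Oxford`. Two distinct constants under the same key would instead raise `ChaseFailure`. Every merge rescans the whole relation, which hints at why the scope of fd application matters for performance.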
Existential Rule Languages with Finite Chase: Complexity and Expressiveness
Finite chase, or alternatively chase termination, is an important condition
to ensure the decidability of existential rule languages. In the past few
years, a number of rule languages with finite chase have been studied. In this
work, we propose a novel approach for classifying the rule languages with
finite chase. Using this approach, we naturally define a family of decidable
rule languages that extend the existing languages with the finite chase
property. We then study the complexity of these languages. Although all of them
are tractable for data complexity, we show that their combined complexity can
be arbitrarily high. Furthermore, we prove that all the rule languages with
finite chase that extend the weakly acyclic language are of the same
expressiveness as the weakly acyclic one, while rule languages with higher
combined complexity are in general more succinct than those with lower combined
complexity. Comment: Extended version of a paper to appear at AAAI 201
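The weakly acyclic language mentioned above is defined by a syntactic test on a dependency graph over predicate positions, as introduced by Fagin et al. for data exchange. The following minimal sketch of that test makes some assumptions: an atom is a (predicate, tuple of variables) pair, a tgd is a (body, head) pair, and head variables absent from the body are existentially quantified. The function names are hypothetical. The sketch does not implement the classification proposed in the paper.

```python
"""Sketch: the standard weak-acyclicity test for a set of tgds (hypothetical names)."""
from collections import defaultdict


def dependency_graph(tgds):
    """Return (normal, special) edges between positions (predicate, index)."""
    normal, special = defaultdict(set), defaultdict(set)
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        head_vars = {v for _, args in head for v in args}
        for pred, args in body:
            for i, x in enumerate(args):
                if x not in head_vars:
                    continue  # only frontier variables create edges
                for h_pred, h_args in head:
                    for j, y in enumerate(h_args):
                        if y == x:
                            normal[(pred, i)].add((h_pred, j))
                        elif y not in body_vars:  # existential variable
                            special[(pred, i)].add((h_pred, j))
    return normal, special


def is_weakly_acyclic(tgds):
    normal, special = dependency_graph(tgds)
    edges = defaultdict(set)
    for graph in (normal, special):
        for src, dsts in graph.items():
            edges[src] |= dsts

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Weakly acyclic iff no cycle passes through a special edge src -> dst,
    # i.e. src is never reachable again from dst.
    return not any(src in reachable(dst)
                   for src, dsts in special.items() for dst in dsts)


# Person(x) -> exists y. HasParent(x, y), Person(y)
parent = [([("Person", ("x",))], [("HasParent", ("x", "y")), ("Person", ("y",))])]
# Emp(x) -> exists d. WorksIn(x, d)
works = [([("Emp", ("x",))], [("WorksIn", ("x", "d"))])]
print(is_weakly_acyclic(parent), is_weakly_acyclic(works))  # False True
```

The first rule set has a special self-loop on `(Person, 0)`, so its chase can keep inventing ancestors. The second has no cycle at all.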
Towards Universal Languages for Tractable Ontology Mediated Query Answering
An ontology language for ontology mediated query answering (OMQA-language) is
universal for a family of OMQA-languages if it is the most expressive one among
this family. In this paper, we focus on three families of tractable
OMQA-languages, namely first-order rewritable languages and languages whose
data complexity of query answering is in AC0 or in PTIME. On the negative
side, we prove that, in general, none of these families has a universal
language. On the positive side, we propose a novel property, locality,
that approximates first-order rewritability, and show that there
exists a language of disjunctive embedded dependencies that is universal for
the family of OMQA-languages with locality. All of these results apply to OMQA
with query languages such as conjunctive queries, unions of conjunctive queries
and acyclic conjunctive queries. Comment: 10 pages, 1 figure, the full version of a paper accepted for AAAI
2020. Some typos have been corrected.
The Impact of Active Domain Predicates on Guarded Existential Rules
It is realistic to assume that a database management system provides access to the active domain via built-in relations. Therefore, databases that include designated predicates that hold the active domain, which we call product databases, form a natural notion that deserves our attention. An important issue then is to look at the consequences of product databases for the expressiveness and complexity of central existential rule languages. We focus on guardedness-based existential rules, and we investigate the impact of product databases on their expressive power and complexity. We show that the queries expressed via (frontier-)guarded rules gain in expressiveness and, in fact, have the same expressive power as Datalog. On the other hand, there is no impact on the expressiveness of the queries specified via weakly-(frontier-)guarded rules, since they are powerful enough to explicitly compute the predicates needed to access the active domain. We also observe that there is no impact on the complexity of the query languages in question.
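A small sketch shows what materialising active-domain predicates from a database amounts to. It assumes, for illustration only, that a product database is modelled by a unary `Adom` relation plus its k-fold cross product, and that facts are tuples (predicate, arg1, ...). Each `R(x1, ..., xn) -> Adom(xi)` rule has a single body atom and is therefore guarded. The product rule, in contrast, has no body atom containing all its variables, which suggests why built-in product predicates can add power to guarded rules. Weak guardedness only constrains variables in affected positions, so the product rule would qualify when no labelled nulls can reach `Adom` (an assumption here). The function names are hypothetical.

```python
"""Sketch: materialising active-domain predicates (hypothetical names and modelling)."""
from itertools import product


def active_domain(db):
    # Same result as the rules R(x1, ..., xn) -> Adom(xi), one per predicate
    # and position; each has a single body atom, so each rule is guarded.
    return {term for fact in db for term in fact[1:]}


def adom_product(db, k):
    # Same result as Adom(x1), ..., Adom(xk) -> AdomK(x1, ..., xk). No body atom
    # contains all the variables, so for k >= 2 this rule is not guarded.
    dom = sorted(active_domain(db))
    return set(product(dom, repeat=k))


db = {("Edge", "a", "b"), ("Label", "c")}
print(sorted(active_domain(db)))   # ['a', 'b', 'c']
print(len(adom_product(db, 2)))    # 9
```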
The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Over the past years, there has been a resurgence of Datalog-based systems in
the database community as well as in industry. In this context, it has been
recognized that to handle the complex knowledge-based scenarios encountered
today, such as reasoning over large knowledge graphs, Datalog has to be
extended with features such as existential quantification. Yet, Datalog-based
reasoning in the presence of existential quantification is in general
undecidable. Many efforts have been made to define decidable fragments. Warded
Datalog+/- is a very promising one, as it captures PTIME complexity while
allowing ontological reasoning. Yet so far, no implementation of Warded
Datalog+/- had been available. In this paper, we present the Vadalog system, a
Datalog-based system for performing complex logic reasoning tasks, such as
those required in advanced knowledge graphs. The Vadalog system is Oxford's
contribution to the VADA research programme, a joint effort of the universities
of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the
main contribution of this paper, we present the first implementation of
Warded Datalog+/-: a high-performance Datalog+/- system that uses an aggressive
termination control strategy. We also provide a comprehensive experimental
evaluation. Comment: Extended version of VLDB paper
<https://doi.org/10.14778/3213880.3213888>
Worst-case Optimal Query Answering for Greedy Sets of Existential Rules and Their Subclasses
The need for an ontological layer on top of data, associated with advanced
reasoning mechanisms able to exploit the semantics encoded in ontologies, has
been acknowledged both in the database and knowledge representation
communities. We focus in this paper on the ontological query answering problem,
which consists of querying data while taking ontological knowledge into
account. More specifically, we establish complexities of the conjunctive query
entailment problem for classes of existential rules (also called
tuple-generating dependencies, Datalog+/- rules, or forall-exists rules). Our
contribution is twofold. First, we introduce the class of greedy
bounded-treewidth sets (gbts) of rules, which covers guarded rules and their
best-known generalizations. We provide a generic algorithm for query
entailment under gbts, which is worst-case optimal for combined complexity with
or without bounded predicate arity, as well as for data complexity and query
complexity. Second, we classify several gbts classes, whose complexity was
unknown, with respect to combined complexity (with both unbounded and bounded
predicate arity) and data complexity to obtain a comprehensive picture of the
complexity of existential rule fragments that are based on diverse guardedness
notions. Upper bounds are provided by showing that the proposed algorithm is
optimal for all of them.
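The gbts class is described above as covering guarded rules and their generalizations. Two of the underlying syntactic conditions are easy to check. A rule is guarded if some body atom contains every body variable. It is frontier-guarded if some body atom contains every frontier variable, meaning the variables shared by body and head. The following minimal sketch assumes atoms are (predicate, tuple of variables) pairs, rules are (body, head) pairs, and every term is a variable. It does not test gbts membership itself, and the function names are hypothetical.

```python
"""Sketch: syntactic guardedness tests for a single existential rule (hypothetical names)."""


def variables(atoms):
    return {v for _, args in atoms for v in args}


def is_guarded(body, _head):
    # Some body atom (the guard) contains every body variable.
    body_vars = variables(body)
    return any(set(args) >= body_vars for _, args in body)


def is_frontier_guarded(body, head):
    # Some body atom contains every frontier variable (shared by body and head).
    frontier = variables(body) & variables(head)
    return any(set(args) >= frontier for _, args in body)


# R(x, y), S(y, z) -> exists w. T(x, w)
rule = ([("R", ("x", "y")), ("S", ("y", "z"))], [("T", ("x", "w"))])
print(is_guarded(*rule), is_frontier_guarded(*rule))  # False True
```

Every guarded rule is also frontier-guarded, since the frontier is a subset of the body variables. The example shows that the converse fails.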
Expressive Languages for Querying the Semantic Web
The problem of querying RDF data is a central issue for the development of the Semantic Web. The query language SPARQL has become the standard language for querying RDF since its W3C standardization in 2008. However, the 2008 version of this language lacked some important functionalities: reasoning capabilities to deal with RDFS and OWL vocabularies, navigational capabilities to exploit the graph structure of RDF data, and a general form of recursion much needed to express some natural queries. To overcome these limitations, a new version of SPARQL, called SPARQL 1.1, was released in 2013, which includes entailment regimes for RDFS and OWL vocabularies, and a mechanism to express navigation patterns through regular expressions. Unfortunately, there are a number of useful navigation patterns that cannot be expressed in SPARQL 1.1, and the language lacks a general mechanism to express recursive queries. To the best of our knowledge, no efficient RDF query language that combines the above functionalities is known. It is the aim of this work to fill this gap. To this end, we focus on a core fragment of the OWL 2 QL profile of OWL 2 and show that every SPARQL query enriched with the above features can be naturally translated into a query expressed in a language that is based on an extension of Datalog, which allows for value invention and stratified negation. However, the query evaluation problem for this language is highly intractable, which is not surprising since it is expressive enough to encode some inherently hard queries. We identify a natural fragment of it, and we show it to be tractable and powerful enough to define SPARQL queries enhanced with the desired functionalities.
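A recursive Datalog program is what lets such a language express navigation over the RDF graph. As a minimal illustration, the SPARQL 1.1 property path `?x :knows+ ?y`, which SPARQL 1.1 can already express, can be read as two Datalog rules and evaluated semi-naively. This is only a sketch of recursion in plain Datalog. It is not the translation defined in the work above, which also involves value invention and stratified negation. The predicate names and the function `knows_plus` are hypothetical.

```python
"""Sketch: a recursive path query as Datalog, evaluated semi-naively (hypothetical example).

    KnowsPlus(x, y) :- Knows(x, y).
    KnowsPlus(x, z) :- KnowsPlus(x, y), Knows(y, z).
"""


def knows_plus(edges):
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    total = set(edges)   # facts derived so far (first rule)
    delta = set(edges)   # facts that were new in the last round
    while delta:
        # Join only the new facts with Knows: the semi-naive optimisation.
        new = {(x, z) for x, y in delta for z in succ.get(y, ())} - total
        total |= new
        delta = new
    return total


print(sorted(knows_plus({("a", "b"), ("b", "c")})))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Joining only the previous round's new facts avoids rederiving known pairs. Evaluation stops as soon as a round produces nothing new, which is guaranteed because no values are invented.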