189 research outputs found
Semiring Provenance for Lightweight Description Logics
We investigate semiring provenance--a successful framework originally defined
in the relational database setting--for description logics. In this context,
the ontology axioms are annotated with elements of a commutative semiring and
these annotations are propagated to the ontology consequences in a way that
reflects how they are derived. We define a provenance semantics for a language
that encompasses several lightweight description logics and show its
relationships with semantics that have been defined for ontologies annotated
with a specific kind of annotation (such as fuzzy degrees). We show that under
some restrictions on the semiring, the semantics satisfies desirable properties
(such as extending the semiring provenance defined for databases). We then
focus on the well-known why-provenance, which allows to compute the semiring
provenance for every additively and multiplicatively idempotent commutative
semiring, and for which we study the complexity of problems related to the
provenance of an axiom or a conjunctive query answer. Finally, we consider two
more restricted cases which correspond to the so-called positive Boolean
provenance and lineage in the database setting. For these cases, we exhibit
relationships with well-known notions related to explanations in description
logics and complete our complexity analysis. As a side contribution, we provide
conditions on an ELHI_bot ontology that guarantee tractable reasoning.Comment: Paper currently under review. 102 page
On the Limitations of Provenance for Queries With Difference
The annotation of the results of database transformations was shown to be
very effective for various applications. Until recently, most works in this
context focused on positive query languages. The provenance semirings is a
particular approach that was proven effective for these languages, and it was
shown that when propagating provenance with semirings, the expected equivalence
axioms of the corresponding query languages are satisfied. There have been
several attempts to extend the framework to account for relational algebra
queries with difference. We show here that these suggestions fail to satisfy
some expected equivalence axioms (that in particular hold for queries on
"standard" set and bag databases). Interestingly, we show that this is not a
pitfall of these particular attempts, but rather every such attempt is bound to
fail in satisfying these axioms, for some semirings. Finally, we show
particular semirings for which an extension for supporting difference is
(im)possible.Comment: TAPP 201
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information was studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested to capture
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance annotated databases
First-Order Provenance Games
We propose a new model of provenance, based on a game-theoretic approach to
query evaluation. First, we study games G in their own right, and ask how to
explain that a position x in G is won, lost, or drawn. The resulting notion of
game provenance is closely related to winning strategies, and excludes from
provenance all "bad moves", i.e., those which unnecessarily allow the opponent
to improve the outcome of a play. In this way, the value of a position is
determined by its game provenance. We then define provenance games by viewing
the evaluation of a first-order query as a game between two players who argue
whether a tuple is in the query answer. For RA+ queries, we show that game
provenance is equivalent to the most general semiring of provenance polynomials
N[X]. Variants of our game yield other known semirings. However, unlike
semiring provenance, game provenance also provides a "built-in" way to handle
negation and thus to answer why-not questions: In (provenance) games, the
reason why x is not won, is the same as why x is lost or drawn (the latter is
possible for games with draws). Since first-order provenance games are
draw-free, they yield a new provenance model that combines how- and why-not
provenance
Provenance for SPARQL queries
Determining trust of data available in the Semantic Web is fundamental for
applications and users, in particular for linked open data obtained from SPARQL
endpoints. There exist several proposals in the literature to annotate SPARQL
query results with values from abstract models, adapting the seminal works on
provenance for annotated relational databases. We provide an approach capable
of providing provenance information for a large and significant fragment of
SPARQL 1.1, including for the first time the major non-monotonic constructs
under multiset semantics. The approach is based on the translation of SPARQL
into relational queries over annotated relations with values of the most
general m-semiring, and in this way also refuting a claim in the literature
that the OPTIONAL construct of SPARQL cannot be captured appropriately with the
known abstract models.Comment: 22 pages, extended version of the ISWC 2012 paper including proof
Snapshot Semantics for Temporal Multiset Relations (Extended Version)
Snapshot semantics is widely used for evaluating queries over temporal data:
temporal relations are seen as sequences of snapshot relations, and queries are
evaluated at each snapshot. In this work, we demonstrate that current
approaches for snapshot semantics over interval-timestamped multiset relations
are subject to two bugs regarding snapshot aggregation and bag difference. We
introduce a novel temporal data model based on K-relations that overcomes these
bugs and prove it to correctly encode snapshot semantics. Furthermore, we
present an efficient implementation of our model as a database middleware and
demonstrate experimentally that our approach is competitive with native
implementations and significantly outperforms such implementations on queries
that involve aggregation.Comment: extended version of PVLDB pape
Provenance Circuits for Trees and Treelike Instances (Extended Version)
Query evaluation in monadic second-order logic (MSO) is tractable on trees
and treelike instances, even though it is hard for arbitrary instances. This
tractability result has been extended to several tasks related to query
evaluation, such as counting query results [3] or performing query evaluation
on probabilistic trees [10]. These are two examples of the more general problem
of computing augmented query output, that is referred to as provenance. This
article presents a provenance framework for trees and treelike instances, by
describing a linear-time construction of a circuit provenance representation
for MSO queries. We show how this provenance can be connected to the usual
definitions of semiring provenance on relational instances [20], even though we
compute it in an unusual way, using tree automata; we do so via intrinsic
definitions of provenance for general semirings, independent of the operational
details of query evaluation. We show applications of this provenance to capture
existing counting and probabilistic results on trees and treelike instances,
and give novel consequences for probability evaluation.Comment: 48 pages. Presented at ICALP'1
Faster query answering in probalistic databases using read-once functions
A boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking read-once property seem to require putting these expressions in disjunctive normal form. In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins and on tuple-independent probabilistic databases. We first show that given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can, without using the DNF of a result event expression, efficiently compute its co-occurrence graph. From this, the read-once form can already, if it exists, be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple to implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, that of co-table graph, which can be significantly smaller than the cooccurrence graph
- …