5,471 research outputs found
Provenance for Aggregate Queries
We study in this paper provenance information for queries with aggregation.
Provenance information was studied in the context of various query languages
that do not allow for aggregation, and recent work has suggested to capture
provenance by annotating the different database tuples with elements of a
commutative semiring and propagating the annotations through query evaluation.
We show that aggregate queries pose novel challenges rendering this approach
inapplicable. Consequently, we propose a new approach, where we annotate with
provenance information not just tuples but also the individual values within
tuples, using provenance to describe the values computation. We realize this
approach in a concrete construction, first for "simple" queries where the
aggregation operator is the last one applied, and then for arbitrary (positive)
relational algebra queries with aggregation; the latter queries are shown to be
more challenging in this context. Finally, we use aggregation to encode queries
with difference, and study the semantics obtained for such queries on
provenance annotated databases
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4)~We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently
On the Limitations of Provenance for Queries With Difference
The annotation of the results of database transformations was shown to be
very effective for various applications. Until recently, most works in this
context focused on positive query languages. The provenance semirings is a
particular approach that was proven effective for these languages, and it was
shown that when propagating provenance with semirings, the expected equivalence
axioms of the corresponding query languages are satisfied. There have been
several attempts to extend the framework to account for relational algebra
queries with difference. We show here that these suggestions fail to satisfy
some expected equivalence axioms (that in particular hold for queries on
"standard" set and bag databases). Interestingly, we show that this is not a
pitfall of these particular attempts, but rather every such attempt is bound to
fail in satisfying these axioms, for some semirings. Finally, we show
particular semirings for which an extension for supporting difference is
(im)possible.Comment: TAPP 201
Semantic Modeling of Analytic-based Relationships with Direct Qualification
Successfully modeling state and analytics-based semantic relationships of
documents enhances representation, importance, relevancy, provenience, and
priority of the document. These attributes are the core elements that form the
machine-based knowledge representation for documents. However, modeling
document relationships that can change over time can be inelegant, limited,
complex or overly burdensome for semantic technologies. In this paper, we
present Direct Qualification (DQ), an approach for modeling any semantically
referenced document, concept, or named graph with results from associated
applied analytics. The proposed approach supplements the traditional
subject-object relationships by providing a third leg to the relationship; the
qualification of how and why the relationship exists. To illustrate, we show a
prototype of an event-based system with a realistic use case for applying DQ to
relevancy analytics of PageRank and Hyperlink-Induced Topic Search (HITS).Comment: Proceedings of the 2015 IEEE 9th International Conference on Semantic
Computing (IEEE ICSC 2015
- …