261 research outputs found
Towards Inferring Queries from Simple and Partial Provenance Examples
The field of query-by-example aims at inferring queries from output examples
given by non-expert users, by finding the underlying logic that binds the
examples. However, for a very small set of examples, it is difficult to
correctly infer such logic. To bridge this gap, previous work suggested
attaching explanations to each output example, modeled as provenance, allowing
users to explain the reason behind their choice of example. In this paper, we
explore the problem of inferring queries from a few output examples and
intuitive explanations. We propose a two-step framework: (1) convert the
explanations into (partial) provenance and (2) infer a query that generates the
output examples using a novel algorithm that employs a graph-based approach.
This framework is suitable for non-experts as it does not require the
specification of the provenance in its entirety or an understanding of its
structure. We show promising initial experimental results for our approach.
How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs
Knowledge graphs (KGs) have increasingly become the backbone of many critical
knowledge-centric applications. Most large-scale KGs used in practice are
automatically constructed based on an ensemble of extraction techniques applied
over diverse data sources. Therefore, it is important to establish the
provenance of results for a query to determine how these were computed.
Provenance is shown to be useful for assigning confidence scores to the
results, for debugging the KG generation itself, and for providing answer
explanations. In many such applications, certain queries are registered as
standing queries since their answers are needed often. However, KGs keep
continuously changing due to reasons such as changes in the source data,
improvements to the extraction techniques, refinement/enrichment of
information, and so on. This brings us to the issue of efficiently maintaining
the provenance polynomials of complex graph pattern queries for dynamic and
large KGs instead of having to recompute them from scratch each time the KG is
updated. Addressing these issues, we present HUKA which uses provenance
polynomials for tracking the derivation of query results over knowledge graphs
by encoding the edges involved in generating the answer. More importantly, HUKA
also maintains these provenance polynomials in the face of updates (insertions
as well as deletions of facts) to the underlying KG. Experimental results over
large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL
query workloads reveal that HUKA can be almost 50 times faster than existing
systems for provenance computation on dynamic KGs.
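To make the idea of maintaining provenance polynomials under updates concrete, here is a minimal Python sketch. It is not HUKA's algorithm, which the abstract does not describe; it only illustrates the general technique for one fixed two-hop pattern, (?x p ?y) . (?y q ?z). The class name, the edge identifiers, and the sample facts are all hypothetical. Each answer's polynomial is a sum of monomials, one per derivation, and each monomial is the product of the edge identifiers used. An insertion adds only the derivations that involve the new edge, and a deletion drops every monomial that mentions the removed edge.

```python
from collections import Counter, defaultdict


class TwoHopProvenance:
    """Provenance polynomials for the pattern (?x p ?y) . (?y q ?z).

    Illustrative only: assumes p != q and unique edge identifiers.
    """

    def __init__(self, p, q):
        self.p, self.q = p, q
        self.edges = {}                     # edge id -> (subject, predicate, object)
        self.p_by_object = defaultdict(set)   # node y -> ids of p-edges ending at y
        self.q_by_subject = defaultdict(set)  # node y -> ids of q-edges starting at y
        self.poly = defaultdict(Counter)    # answer (x, z) -> {monomial: coefficient}

    def insert(self, eid, s, pred, o):
        """Add one fact and extend polynomials with derivations that use it."""
        self.edges[eid] = (s, pred, o)
        if pred == self.p:
            self.p_by_object[o].add(eid)
            for other in self.q_by_subject[o]:
                z = self.edges[other][2]
                self.poly[(s, z)][tuple(sorted((eid, other)))] += 1
        if pred == self.q:
            self.q_by_subject[s].add(eid)
            for other in self.p_by_object[s]:
                x = self.edges[other][0]
                self.poly[(x, o)][tuple(sorted((other, eid)))] += 1

    def delete(self, eid):
        """Remove one fact and drop every monomial that depends on it."""
        s, _, o = self.edges.pop(eid)
        self.p_by_object[o].discard(eid)
        self.q_by_subject[s].discard(eid)
        # A full scan keeps the sketch short; an edge-to-answer index would avoid it.
        for answer in list(self.poly):
            terms = self.poly[answer]
            for mono in [m for m in terms if eid in m]:
                del terms[mono]
            if not terms:
                del self.poly[answer]  # the answer is no longer derivable


def show(terms):
    """Render a polynomial such as e1*e2 + e3*e4."""
    return " + ".join(
        ("" if c == 1 else f"{c}*") + "*".join(m) for m, c in sorted(terms.items())
    )


kg = TwoHopProvenance("bornIn", "locatedIn")
kg.insert("e1", "alice", "bornIn", "paris")
kg.insert("e2", "paris", "locatedIn", "france")
kg.insert("e3", "alice", "bornIn", "lyon")
kg.insert("e4", "lyon", "locatedIn", "france")
before = show(kg.poly[("alice", "france")])  # two alternative derivations

kg.delete("e2")
after = show(kg.poly[("alice", "france")])   # the answer survives via e3*e4
```

In this toy run the answer (alice, france) stays correct after the deletion because one derivation remains, which is the "still correct" question in the title. Neither update recomputes the query from scratch.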
Language-integrated provenance in Haskell
Scientific progress increasingly depends on data management, particularly to
clean and curate data so that it can be systematically analyzed and reused. A
wealth of techniques for managing and curating data (and its provenance) have
been proposed, largely in the database community. In particular, a number of
influential papers have proposed collecting provenance information explaining
where a piece of data was copied from, or what other records were used to
derive it. Most of these techniques, however, exist only as research prototypes
and are not available in mainstream database systems. This means scientists
must either implement such techniques themselves or (all too often) go without.
This is essentially a code reuse problem: provenance techniques currently
cannot be implemented reusably, only as ad hoc, usually unmaintained extensions
to standard databases. An alternative, relatively unexplored approach is to
support such techniques at a higher abstraction level, using metaprogramming or
reflection techniques. Can advanced programming techniques make it easier to
transfer provenance research results into practice?
We build on a recent approach called language-integrated provenance, which
extends language-integrated query techniques with source-to-source query
translations that record provenance. In previous work, a proof of concept was
developed in a research programming language called Links, which supports
sophisticated Web and database programming. In this paper, we show how to adapt
this approach to work in Haskell, building on top of the Database-Supported
Haskell (DSH) library.
Even though it seemed clear in principle that Haskell's rich programming
features ought to be sufficient, implementing language-integrated provenance in
Haskell required overcoming a number of technical challenges due to
interactions between these capabilities. Our implementation serves as a proof
of concept showing how this combination of metaprogramming features can, for
the first time, make data provenance facilities available to programmers as a
library in a widely-used, general-purpose language.
In our work we were successful in implementing forms of provenance known as
where-provenance and lineage. We have tested our implementation using a simple
database and query set and established that the resulting queries are executed
correctly on the database. Our implementation is publicly available on GitHub.
Our work makes provenance tracking available to users of DSH at little cost.
Although Haskell is not widely used for scientific database development, our
work suggests which language features are necessary to support provenance as a
library. We also highlight how combining Haskell's advanced type programming
features can lead to unexpected complications, which may motivate further
research into type system expressiveness.
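The two forms of provenance named in the abstract above can be illustrated with a small sketch. It is written in Python rather than Haskell and does not use or imitate the DSH API. The table names, column names, and helper functions are hypothetical. Where-provenance records, for each output cell, the table, column, and row the value was copied from. Lineage records, for each output row, the set of input rows that contributed to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: object
    where: tuple  # (table, column, row index) the value was copied from


def annotate(table_name, rows):
    """Attach where-provenance to every cell and a singleton lineage to every row."""
    annotated = []
    for row_id, row in enumerate(rows):
        cells = {col: Cell(val, (table_name, col, row_id)) for col, val in row.items()}
        annotated.append((cells, frozenset({(table_name, row_id)})))
    return annotated


def join_project(left, right, key, columns):
    """Equi-join on `key`, keep `columns`, and propagate both annotations.

    Cells are copied unchanged, so their where-provenance carries over;
    the lineage of an output row is the union of its two input lineages.
    """
    result = []
    for lcells, llineage in left:
        for rcells, rlineage in right:
            if lcells[key].value == rcells[key].value:
                merged = {**rcells, **lcells}  # left side wins on shared columns
                result.append(({c: merged[c] for c in columns}, llineage | rlineage))
    return result


# Hypothetical sample data.
agencies = annotate("agencies", [{"name": "EdinTours", "phone": "412 1200"}])
tours = annotate("tours", [{"name": "EdinTours", "destination": "Edinburgh"}])

rows = join_project(agencies, tours, key="name", columns=["destination", "phone"])
row, lineage = rows[0]
```

Here `row["phone"].where` points back to the `phone` column of row 0 in `agencies`, while `lineage` contains row 0 of both input tables. A language-integrated implementation would instead generate rewritten queries that compute these annotations inside the database.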