Search CORE

261 research outputs found

Towards Inferring Queries from Simple and Partial Provenance Examples

Author: Gilad Amir
Moskovitch Yuval
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/08/2020
Field of study

The field of query-by-example aims at inferring queries from output examples given by non-expert users, by finding the underlying logic that binds the examples. However, for a very small set of examples, it is difficult to correctly infer such logic. To bridge this gap, previous work suggested attaching explanations to each output example, modeled as provenance, allowing users to explain the reason behind their choice of example. In this paper, we explore the problem of inferring queries from a few output examples and intuitive explanations. We propose a two step framework: (1) convert the explanations into (partial) provenance and (2) infer a query that generates the output examples using a novel algorithm that employs a graph based approach. This framework is suitable for non-experts as it does not require the specification of the provenance in its entirety or an understanding of its structure. We show promising initial experimental results of our approach

arXiv.org e-Print Archive

Crossref

An Annotation Management System for Relational Databases

Author: D BHAGWAT
G VIJAYVARGIYA
L CHITICARIU
W TAN
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

Crossref

How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs

Author: Agrawal P.
Arab B. S.
Babu S.
Ives Z. G.
Wishart D. S.
Wylot M.
Publication venue
Publication date: 29/07/2020
Field of study

Knowledge graphs (KGs) have increasingly become the backbone of many critical knowledge-centric applications. Most large-scale KGs used in practice are automatically constructed based on an ensemble of extraction techniques applied over diverse data sources. Therefore, it is important to establish the provenance of results for a query to determine how these were computed. Provenance is shown to be useful for assigning confidence scores to the results, for debugging the KG generation itself, and for providing answer explanations. In many such applications, certain queries are registered as standing queries since their answers are needed often. However, KGs keep continuously changing due to reasons such as changes in the source data, improvements to the extraction techniques, refinement/enrichment of information, and so on. This brings us to the issue of efficiently maintaining the provenance polynomials of complex graph pattern queries for dynamic and large KGs instead of having to recompute them from scratch each time the KG is updated. Addressing these issues, we present HUKA which uses provenance polynomials for tracking the derivation of query results over knowledge graphs by encoding the edges involved in generating the answer. More importantly, HUKA also maintains these provenance polynomials in the face of updates---insertions as well as deletions of facts---to the underlying KG. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveals that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs

arXiv.org e-Print Archive

Crossref

Language-integrated provenance in Haskell

Author: Cheney James
Stolarek Jan
Publication venue: 'Aspect-Oriented Software Association (AOSA)'
Publication date: 27/03/2018
Field of study

Scientific progress increasingly depends on data management, particularly to clean and curate data so that it can be systematically analyzed and reused. A wealth of techniques for managing and curating data (and its provenance) have been proposed, largely in the database community. In particular, a number of influential papers have proposed collecting provenance information explaining where a piece of data was copied from, or what other records were used to derive it. Most of these techniques, however, exist only as research prototypes and are not available in mainstream database systems. This means scientists must either implement such techniques themselves or (all too often) go without. This is essentially a code reuse problem: provenance techniques currently cannot be implemented reusably, only as ad hoc, usually unmaintained extensions to standard databases. An alternative, relatively unexplored approach is to support such techniques at a higher abstraction level, using metaprogramming or reflection techniques. Can advanced programming techniques make it easier to transfer provenance research results into practice? We build on a recent approach called language-integrated provenance, which extends language-integrated query techniques with source-to-source query translations that record provenance. In previous work, a proof of concept was developed in a research programming language called Links, which supports sophisticated Web and database programming. In this paper, we show how to adapt this approach to work in Haskell building on top of the Database-Supported Haskell (DSH) library. Even though it seemed clear in principle that Haskell's rich programming features ought to be sufficient, implementing language-integrated provenance in Haskell required overcoming a number of technical challenges due to interactions between these capabilities. Our implementation serves as a proof of concept showing how this combination of metaprogramming features can, for the first time, make data provenance facilities available to programmers as a library in a widely-used, general-purpose language. In our work we were successful in implementing forms of provenance known as where-provenance and lineage. We have tested our implementation using a simple database and query set and established that the resulting queries are executed correctly on the database. Our implementation is publicly available on GitHub. Our work makes provenance tracking available to users of DSH at little cost. Although Haskell is not widely used for scientific database development, our work suggests which languages features are necessary to support provenance as library. We also highlight how combining Haskell's advanced type programming features can lead to unexpected complications, which may motivate further research into type system expressiveness

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer