
7 research outputs found

Data Provenance Inference in Logic Programming: Reducing Effort of Instance-driven Debugging

Authors: Huq Mohammad Rezwanul; Mileo Alessandra; Wombacher Andreas
Publication venue: University of Twente, Centre for Telematica and Information Technology (CTIT)
Publication date: 01/01/2013

Data provenance allows scientists in different domains to validate their models and algorithms and to find anomalies and unexpected behaviors. In previous work, we described on-the-fly interpretation of (Python) scripts to build a workflow provenance graph automatically and then infer fine-grained provenance information based on the workflow provenance graph and the availability of data. To broaden the scope of our approach and demonstrate its viability, in this paper we extend it beyond procedural languages to purely declarative languages such as logic programming under the stable model semantics. For experiments and validation, we use the Answer Set Programming solver oClingo, which makes it possible to formulate and solve stream reasoning problems in a purely declarative fashion. We demonstrate how the benefits of provenance inference over explicit provenance still hold in a declarative setting, and we briefly discuss the potential impact for declarative programming, in particular for instance-driven debugging of the model in declarative problem solving.

University of Twente Research Information

From scripts towards provenance inference

Authors: Apers Peter M.G.; Huq Mohammad Rezwanul; van Beek Ludovicus P.H.; Wada Yoshihide; Wombacher Andreas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2012

Scientists require provenance information either to validate their model or to investigate the origin of an unexpected value. However, they do not maintain any provenance information, and even designing the processing workflow is rare in practice. Therefore, in this paper, we propose a solution that can build the workflow provenance graph by interpreting the scripts used for actual processing. Further, scientists can request fine-grained provenance information, facilitated by the inferred workflow provenance. We also provide a guideline to customize the workflow provenance graph based on user preferences. Our evaluation shows that the proposed approach is relevant and suitable for scientists to manage provenance.

Crossref

University of Twente Research Information

Probabilistic inference of fine-grained data provenance

Authors: Apers P.M.G.; Huq Mohammad Rezwanul; Wombacher A.
Publication venue: Springer
Publication date: 01/01/2012

Decision making, process control and e-science applications process stream data, mostly produced by sensors. To control and monitor these applications, reproducibility of results is a vital requirement. However, storing fine-grained provenance data requires a massive amount of storage space, especially for transformations with overlapping sliding windows. In this paper, we propose a probabilistic technique to infer fine-grained provenance which can also estimate the accuracy beforehand. Our evaluation shows that the probabilistic inference technique achieves the same level of accuracy as the other approaches, with minimal prior knowledge.

Crossref

University of Twente Research Information

Adaptive Inference of Fine-grained Data Provenance to Achieve High Accuracy at Lower Storage Costs

Authors: Apers Peter M.G.; Huq Mohammad Rezwanul; Wombacher Andreas
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

In stream data processing, data arrives continuously and is processed by decision making, process control and e-science applications. To control and monitor these applications, reproducibility of results is a vital requirement. However, storing fine-grained provenance data requires a massive amount of storage space, especially for transformations with overlapping sliding windows. In this paper, we propose techniques which can significantly reduce storage costs and can achieve high accuracy. Our evaluation shows that the adaptive inference technique can achieve almost 100% accurate provenance information for a given dataset at lower storage costs than the other techniques. Moreover, we present a guideline on the usage of the different provenance collection techniques described in this paper, based on the transformation operation and stream characteristics.

Crossref

University of Twente Research Information

An inference-based framework to manage data provenance in geoscience applications

Authors: Apers Peter M.G.; Huq Mohammad Rezwanul; Wombacher Andreas
Publication venue: IEEE Geoscience and Remote Sensing Society
Publication date: 01/01/2013

Data provenance allows scientists to validate their model as well as to investigate the origin of an unexpected value. Furthermore, it can be used as a replication recipe for output data products. However, capturing provenance requires enormous effort by scientists in terms of time and training. First, they need to design the workflow of the scientific model, i.e., workflow provenance, which requires both time and training. However, in practice, scientists may not document any workflow provenance before the model execution due to the lack of time and training. Second, they need to capture provenance while the model is running, i.e., fine-grained data provenance. Explicit documentation of fine-grained provenance is not feasible because of the massive storage consumption by provenance data in the applications, including those from the geoscience domain where data arrive and are processed continuously. In this paper, we propose an inference-based framework, which provides both workflow and fine-grained data provenance at a minimal cost in terms of time, training, and disk consumption. Our proposed framework is applicable to any given scientific model, and is capable of handling different model dynamics, such as variation in the processing time as well as in the arrival pattern of input data products. Our evaluation of the framework in a real use case with geospatial data shows that the proposed framework is relevant and suitable for scientists in the geoscientific domain.

Crossref

University of Twente Research Information

Inferring Fine-Grained Data Provenance in Stream Data Processing: Reduced Storage Cost, High Accuracy

Authors: Apers Peter M.G.; Huq Mohammad Rezwanul; Wombacher Andreas
Publication venue: Springer Verlag
Publication date: 01/01/2011

Fine-grained data provenance ensures reproducibility of results in decision making, process control and e-science applications. However, maintaining this provenance is challenging in stream data processing because of its massive storage consumption, especially with large overlapping sliding windows. In this paper, we propose an approach to infer fine-grained data provenance by using a temporal data model and coarse-grained data provenance of the processing. The approach has been evaluated on a real dataset, and the result shows that our proposed inference method provides provenance information as accurate as explicit fine-grained provenance at reduced storage consumption.

University of Twente Research Information

ProvenanceCurious: a tool to infer data provenance from scripts

Authors: Apers Peter M.G.; Huq Mohammad Rezwanul; Wombacher Andreas
Publication venue: ACM
Publication date: 01/01/2013

The increasing data volume and highly complex models used in different domains make it difficult to debug models in cases of anomalies. Data provenance provides scientists with sufficient information to investigate their models. In this paper, we propose a tool which can infer fine-grained data provenance based on a given script. The tool is demonstrated using a hydrological model. It has also been tested successfully on other scripts in different contexts.

CiteSeerX

Crossref

University of Twente Research Information