Search CORE

11 research outputs found

Causality and the semantics of provenance

Provenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs. Comment: Workshop submission

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Re-thinking Workflow Provenance against Data-Oriented Investigation Lifecycle

Author: Alper Pinar
Publication venue: No publisher name
Publication date: 06/05/2014

The University of Manchester - Institutional Repository

Distributed storage and querying techniques for a semantic web of scientific workflow provenance

Author: Navarro Jaime Alberto
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/08/2010

In scientific workflow environments, scientists depend on provenance, which records the history of an experiment. The Resource Description Framework (RDF) is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large numbers of RDF triples, single-machine provenance management becomes inadequate over time. In this thesis, we research how HBase capabilities can be leveraged for distributed storage and querying of provenance data represented in RDF. We architect the ProvBase system that incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. We conduct an experimental study to show the feasibility of our approach.

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance

Authors: Cao Yang; Jones Christopher; Jones Matthew B.; Ludäscher Bertram; McPhillips Timothy; Missier Paolo; Slaughter Peter; Thavasimani Priyaa; Vu Duc; Wang Qiwen; Zhang Qian
Publication venue: Edinburgh University Library
Publication date: 13/08/2018

We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables could be found hidden in filenames or folder structures, be recorded in log files, or be captured automatically using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.

University of Birmingham Research Portal

The University of Manchester - Institutional Repository

International Journal of Digital Curation

Data lineage model for Taverna workflows with lightweight annotation requirements

Authors: Belhajjame Khalid; Goble Carole; Missier Paolo; Roos Marco; Zhao Jun
Publication venue: Springer Verlag
Publication date: 01/01/2008

The provenance, or lineage, of a workflow data product can be reconstructed by keeping a complete trace of workflow execution. This lineage information, however, is likely to be both imprecise, because of the black-box nature of the services that compose the workflow, and noisy, because of the many trivial data transformations that obscure the intended purpose of the workflow. In this paper we argue that these shortcomings can be alleviated by introducing a small set of optional lightweight annotations to the workflow, in a principled way. We begin by presenting a baseline, annotation-free lineage model for the Taverna workflow system, and then show how the proposed annotations improve the results of fundamental lineage queries.

University of Birmingham Research Portal

Data lineage model for Taverna workflows with lightweight annotation requirements

Authors: A. Chapman; D. Hull; I. Altintas; J. Hidders; Khalid Belhajjame; O. Benjelloun; P. Buneman; S. Miles; W. Chiew Tan; Y.L. Simmhan
Publication date: 01/01/2008
Field of study: annotation requirement

CiteSeerX

Crossref

University of Birmingham Research Portal

The University of Manchester - Institutional Repository

Lancaster E-Prints

Data Lineage Model for Taverna Workflows with Lightweight Annotation Requirements

Author: Missier P
Publication venue: Springer Fachmedien Wiesbaden GmbH

Newcastle University E-Prints

Data quality evaluation through data quality rules and data provenance

Author: Zanzi Antonella
Publication venue: Italy

The application and exploitation of large amounts of data play an ever-increasing role in today’s research, government, and economy. Data understanding and decision making rely heavily on high-quality data; therefore, in many different contexts, it is important to assess the quality of a dataset in order to determine whether it is suitable for a specific purpose. Moreover, as the access to and the exchange of datasets have become easier and more frequent, and as scientists increasingly use the World Wide Web to share scientific data, there is a growing need to know the provenance of a dataset (i.e., information about the processes and data sources that led to its creation) in order to evaluate its trustworthiness. In this work, data quality rules and data provenance are used to evaluate the quality of datasets. Concerning the first topic, the applied solution consists of identifying types of data constraints that can be useful as data quality rules and developing a software tool to evaluate a dataset on the basis of a set of rules expressed in XML. We selected some of the data constraints and dependencies already considered in the data quality field, but we also used order dependencies and existence constraints as quality rules. In addition, we developed some algorithms to discover the types of dependencies used in the tool. To deal with the provenance of data, the Open Provenance Model (OPM) was adopted, an experimental query language for querying OPM graphs stored in a relational database was implemented, and an approach to designing OPM graphs was proposed.

InsubriaSPACE - Thesis PhD Repository