90 research outputs found

Provenance in scientific workflow systems

Author: Davidson Susan
Freire Juliana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Journal Article

The automated tracking and storage of provenance information promises to be a major advantage of scientific workflow systems. We discuss issues related to data and workflow provenance, and present techniques for focusing user attention on meaningful provenance through "user views," for managing the provenance of nested scientific data, and for using information about the evolution of a workflow specification to understand the difference in the provenance of similar data products.

Re-thinking Workflow Provenance against Data-Oriented Investigation Lifecycle

Author: Alper Pinar
Publication venue: No publisher name
Publication date: 06/05/2014

Permissioned Blockchain for Data Provenance in Scientific Data Management

Author: Fröschle Sibylle
Hahn Axel
Möller Julius
Publication venue: AIS Electronic Library (AISeL)
Publication date: 23/02/2021

Workflow Provenance: from Modeling to Reporting

Author: Ferdous Rayhan 1992-
Publication venue: 'University of Saskatchewan Library'
Publication date: 12/03/2019

Workflow provenance is a crucial part of a workflow system as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and so on. Integrating provenance into a workflow system, or modifying a workflow system to capture or analyze different provenance information, is burdensome and requires extensive development because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow-related information from logs. In this work, we propose a taxonomy of provenance questions and, guided by these questions, we created a workflow programming model, 'ProvMod', with a supporting run-time library to provide automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod is based on recommendations from prominent research and is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured heterogeneous JSON logs. The log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod usability by engaging users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and provenance questions taxonomy can be applied to discover meaningful insights.

eCommons@USASK

University of Saskatchewan Research Archive

An optimized workflow enactor for data-intensive grid applications

Author: Glatard Tristan
Montagnat Johan
Pennec Xavier
Publication venue: HAL CCSD
Publication date: 01/10/2005

I3S laboratory Research Report (I3S/RR-2005-32-FR), Sophia Antipolis, France

Data-intensive applications benefit from an intrinsic data parallelism that should be exploited on parallel systems to lower execution time. In recent years, data grids have been developed to handle, process, and analyze the tremendous amount of data produced in many scientific areas. Although very large, these grid infrastructures are under heavy use and efficiency is of utmost importance. This paper deals with the optimization of workflow managers used for deploying complex data-driven applications on grids. In that kind of application, we show how to better exploit data parallelism than is currently done in most existing workflow managers. We present the design of a prototype implementing our solution and we show that it provides a significant speed-up with respect to existing solutions by exemplifying results on a realistic medical imaging application.

INRIA a CCSD electronic archive server