
118 research outputs found

The lifecycle of provenance metadata and its associated challenges and opportunities

Author: Missier Paolo
Publication date: 01/01/2016

This chapter outlines some of the challenges and opportunities associated with adopting provenance principles and standards in a variety of disciplines, including data publication and reuse, and information sciences.

arXiv.org e-Print Archive

University of Birmingham Research Portal

Four level provenance support to achieve portable reproducibility of scientific workflows

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Crossref

SZTAKI Publication Repository

DataHub: Collaborative Data Science & Dataset Version Management at Scale

Author: Bhardwaj Anant
Bhattacherjee Souvik
Chavan Amit
Deshpande Amol
Elmore Aaron J.
Madden Samuel
Parameswaran Aditya G.
Publication date: 02/09/2014

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale. Comment: 7 pages

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Classification of Scientific Workflows Based on Reproducibility Analysis

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Crossref

SZTAKI Publication Repository

Reproducibility Analysis of Scientific Workflows

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Obuda University'
Publication date: 01/01/2017

SZTAKI Publication Repository

Minimal sufficient information about the scientific workflows to create reproducible experiment

Author: Bánáti Anna
Kacsuk Péter
Kozlovszky Miklós
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

SZTAKI Publication Repository

Scientific Workflow Repeatability through Cloud-Aware Provenance

Author: Hasham Khawar
McClatchey Richard
Munir Kamran
Shamdasani Jetendr
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/02/2015

The transformations, analyses and interpretations of data in scientific workflows are vital for the repeatability and reliability of scientific workflows. This provenance of scientific workflows has been effectively carried out in Grid-based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or propose new approaches to collect provenance information from the Cloud and to utilize it for workflow repeatability in the Cloud infrastructure. The dynamic nature of the Cloud in comparison to the Grid makes this difficult because, unlike in the Grid, resources are provisioned on demand. This paper presents a novel approach that can assist in mitigating this challenge. This approach can collect Cloud infrastructure information along with workflow provenance and can establish a mapping between them. This mapping is later used to re-provision resources on the Cloud. Repeatability of the workflow execution is achieved by: (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, and (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them. The evaluation of an initial prototype suggests that the proposed approach is feasible and can be investigated further. Comment: 6 pages; 5 figures; 3 tables in Proceedings of the Recomputability 2014 workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014). London December 201

arXiv.org e-Print Archive

Crossref

Distilling Structure in Scientific Workflows

Author: Chen Jiuqiang
Cohen-Boulakia Sarah
Froidevaux Christine
Goble Carole
Williams Alan
Publication venue: EMBnet.journal
Publication date: 01/01/2012

In this work, we have conducted a series of experiments to better understand the structure of scientific workflows. In particular, we have investigated techniques to understand why scientific workflows may or may not have a series-parallel structure.

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

The University of Manchester - Institutional Repository

HAL-Polytechnique

Reproducibility of scientific workflows execution using cloud-aware provenance (ReCAP)

Author: C Scheidegger
E Deelman
EHBM Gronenschild
G Juve
Ilkay Altintas
J Kim
Johannes Starlinger
K Munir
Kamran Munir
Khawar Hasham
R Sakellariou
T Glatard
W Stallings
Y Simmhan
YL Simmhan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2018

© 2018, Springer-Verlag GmbH Austria, part of Springer Nature. Provenance of scientific workflows has been considered a means to provide workflow reproducibility. However, the provenance approaches adopted so far are not applicable in the context of the Cloud because the provenance trace lacks the Cloud information. This paper presents a novel approach that collects Cloud-aware provenance and represents it as a graph. The reproducibility of workflow execution on the Cloud is determined by comparing the workflow provenance at three levels, i.e., workflow structure, execution infrastructure and workflow outputs. The experimental evaluation shows that the implemented approach can detect changes in the provenance traces and the outputs produced by the workflow.

Crossref

UWE Bristol Research Repository