226 research outputs found

    Achieving Reproducibility Incorporating Service Versioning into Provenance Model

    Reproducibility has long been a cornerstone of science. Underpinning reproducibility is provenance, which has the potential to provide scientists with a complete understanding of data generated in e-experiments, including the services that produced and consumed it. This paper explores the issues of service versioning in provenance to achieve reproducibility. Current provenance models do not directly support service versioning. Therefore, this paper introduces an enhancement of a provenance model that incorporates a service versioning mechanism, providing access to multiple versions of the same service so that researchers can compare one version to another and understand their effects on data processing. The enhanced provenance model tracks changes to the same service (its versions) over time and correlates versioned services with the results they generate.

    The exploitation of provenance and versioning in the reproduction of e-experiments

    PhD Thesis. Reproducibility has long been a cornerstone of science, and is now becoming a key research area for e-Science. This is because it provides a way to validate, and build on, previous results. Underpinning reproducibility in e-Science is provenance, which has the potential to provide scientists with a complete understanding of data generated in e-experiments, including the services that produced and consumed it. This thesis explores the issues in exploiting provenance for reproducibility. Based on this, a reproducibility framework is designed and implemented to allow past experiments to be reproduced. Seven aspects of reproducibility are considered: 1) experiments, 2) reproducibility, 3) provenance, 4) provenance models, 5) provenance and versioning, 6) automatic transformation of provenance to support reproduction, and 7) a reproducibility taxonomy. A key to reproducibility is the provenance model: a data model that structures information about an e-experiment. A review of existing provenance systems shows that the problem caused by services being updated has been neglected. This can have a severe impact on the ability to reproduce experiments, and it is therefore argued that the issue of service versioning must be addressed. Even after information on the provenance of an execution, and on the versioning of services, is captured, a method is still needed to transform this knowledge into a form that allows past experiments to be reproduced: that is another output of this thesis. The thesis focuses on the use of workflows as a means to represent the composition of experiments and to execute them. This work explores how workflows can be automatically generated to re-execute past experiments. To do this, a transformation algorithm is described that maps a past experiment's execution log data into a workflow format that can be read and processed by the workflow system. The thesis also introduces a Reproducibility Taxonomy that captures and structures the information required for reproducibility in the presence of versions and provenance. The author acknowledges their employer, the Universiti Malaysia Sarawak (UNIMAS), and the Ministry of Higher Education Malaysia as sponsor and supporter throughout the PhD study.

    LIFEDATA - a framework for traceable active learning projects

    Active Learning has become a popular method for iteratively improving data-intensive Artificial Intelligence models. However, projects that run an Active Learning loop often face a significant challenge in handling large volumes of volatile data. This paper introduces LIFEDATA, a Python-based framework designed to assist developers in implementing Active Learning projects with a focus on traceability. It supports seamless tracking of all artifacts, from data selection and labeling to model interpretation, thus promoting transparency throughout the entire model learning process and enhancing error debugging efficiency while ensuring experiment reproducibility. To showcase its applicability, we present two life science use cases. Moreover, the paper proposes an algorithm that combines query strategies to demonstrate LIFEDATA’s ability to reduce data labeling effort.

    Classification of Scientific Workflows Based on Reproducibility Analysis


    Reproducibility of scientific workflows execution using cloud-aware provenance (ReCAP)

    © 2018, Springer-Verlag GmbH Austria, part of Springer Nature. Provenance of scientific workflows has been considered a means to provide workflow reproducibility. However, the provenance approaches adopted so far are not applicable in the context of the Cloud because the provenance trace lacks Cloud information. This paper presents a novel approach that collects Cloud-aware provenance and represents it as a graph. The reproducibility of a workflow execution on the Cloud is determined by comparing the workflow provenance at three levels, i.e., workflow structure, execution infrastructure, and workflow outputs. The experimental evaluation shows that the implemented approach can detect changes in the provenance traces and in the outputs produced by the workflow.

    16th SC@RUG 2019 proceedings 2018-2019

