A provenance-based semantic approach to support understandability, reproducibility, and reuse of scientific experiments
Understandability and reproducibility of scientific results are vital in every field of science. Several reproducibility measures are being taken to make the data used in publications findable and accessible. However, scientists face many challenges from the beginning of an experiment to its end, in particular in data management. Coping with the explosive growth of heterogeneous research data and understanding how this data has been derived is one of the research problems faced in this context. Interlinking the data, the steps, and the results from the computational and non-computational processes of a scientific experiment is important for reproducibility. We introduce the notion of "end-to-end provenance management" of scientific experiments to help scientists understand and reproduce experimental results. The main contributions of this thesis are: (1) We propose a provenance model, "REPRODUCE-ME", to describe scientific experiments using semantic web technologies by extending existing standards. (2) We study computational reproducibility and the important aspects required to achieve it. (3) Taking into account the REPRODUCE-ME provenance model and the study on computational reproducibility, we introduce our tool, ProvBook, which is designed and developed to demonstrate computational reproducibility. It provides features to capture and store the provenance of Jupyter notebooks and helps scientists compare and track the results of different executions. (4) We provide a framework, CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), for end-to-end provenance management. This collaborative framework allows scientists to capture, manage, query, and visualize the complete path of a scientific experiment consisting of computational and non-computational steps in an interoperable way. We apply our contributions to a set of scientific experiments in microscopy research projects.
Grid-based semantic integration of heterogeneous data resources: Implementation on a HealthGrid
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. The semantic integration of geographically distributed and heterogeneous data
resources still remains a key challenge in Grid infrastructures. Today's
mainstream Grid technologies hold the promise to meet this challenge in a
systematic manner, making data applications more scalable and manageable. The
thesis conducts a thorough investigation of the problem, the state of the art, and
the related technologies, and proposes an Architecture for Semantic Integration of
Data Sources (ASIDS) addressing the semantic heterogeneity issue. It defines a
simple mechanism for the interoperability of heterogeneous data sources in order
to extract or discover information regardless of their different semantics. The
constituent technologies of this architecture include Globus Toolkit (GT4) and
OGSA-DAI (Open Grid Services Architecture Data Access and Integration)
alongside other web services technologies such as XML (Extensible Markup
Language). To show this, the ASIDS architecture was implemented and tested in a
realistic setting by building an exemplar application prototype on a HealthGrid
(pilot implementation).
The study followed an empirical research methodology and was informed by
extensive literature surveys and a critical analysis of the relevant technologies and
their synergies. The two literature reviews, together with the analysis of the
technology background, have provided a good overview of the current Grid and
HealthGrid landscape, produced some valuable taxonomies, explored new paths
by integrating technologies, and more importantly illuminated the problem and
guided the research process towards a promising solution. Yet the primary
contribution of this research is an approach that uses contemporary Grid
technologies for integrating heterogeneous data resources that have semantically
different data fields (attributes). It has been practically demonstrated (using a
prototype HealthGrid) that discovery in semantically integrated distributed data
sources can be feasible by using mainstream Grid technologies, which have been
shown to have some significant advantages over non-Grid based approaches
- …