The transformations, analyses and interpretations of data in scientific
workflows are vital for the repeatability and reliability of scientific
workflows. This provenance of scientific workflows has been effectively carried
out in Grid based scientific workflow systems. However, recent adoption of
Cloud-based scientific workflows present an opportunity to investigate the
suitability of existing approaches or propose new approaches to collect
provenance information from the Cloud and to utilize it for workflow
repeatability in the Cloud infrastructure. The dynamic nature of the Cloud in
comparison to the Grid makes it difficult because resources are provisioned
on-demand unlike the Grid. This paper presents a novel approach that can assist
in mitigating this challenge. This approach can collect Cloud infrastructure
information along with workflow provenance and can establish a mapping between
them. This mapping is later used to re-provision resources on the Cloud. The
repeatability of the workflow execution is performed by: (a) capturing the
Cloud infrastructure information (virtual machine configuration) along with the
workflow provenance, and (b) re-provisioning the similar resources on the Cloud
and re-executing the workflow on them. The evaluation of an initial prototype
suggests that the proposed approach is feasible and can be investigated
further.Comment: 6 pages; 5 figures; 3 tables in Proceedings of the Recomputability
2014 workshop of the 7th IEEE/ACM International Conference on Utility and
Cloud Computing (UCC 2014). London December 201