
    Dynamic execution of scientific workflows in cloud


    Scientific workflow execution reproducibility using cloud-aware provenance

    Scientific experiments and projects such as CMS and neuGRIDforYou (N4U) produce data on the order of petabytes every year. They adopt scientific workflows to analyse this large amount of data and extract meaningful information. These workflows are executed over distributed compute and storage resources provided by the Grid and, more recently, by the Cloud. The Cloud is becoming the playing field for scientists as it provides scalability and on-demand resource provisioning. Reproducing a workflow execution to verify results is vital for scientists, yet it has proven to be a challenge. According to one study (Belhajjame et al. 2012), around 80% of workflows cannot be reproduced, and 12% of these failures are due to a lack of information about the execution environment. The dynamic and on-demand provisioning capability of the Cloud makes this even more challenging. To overcome these challenges, this research investigates how to capture the execution provenance of a scientific workflow along with the resources used to execute it in a Cloud infrastructure. This information then enables a scientist to reproduce workflow-based scientific experiments on the Cloud infrastructure by re-provisioning similar resources on the Cloud.

    Provenance has been recognised as information that helps in debugging, verifying and reproducing a scientific workflow execution. The recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches, or to propose new ones, for collecting provenance information from the Cloud and utilizing it for workflow reproducibility on the Cloud. A literature analysis found that existing approaches for the Grid or the Cloud neither provide detailed resource information nor offer an automatic provenance capturing approach for the Cloud environment. To fill this gap, a provenance-based approach, ReCAP, is proposed in this thesis. In ReCAP, workflow execution reproducibility is achieved by (a) capturing the Cloud-aware provenance (CAP), (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them, and (c) comparing the provenance graph structure, including the Cloud resource information, and the outputs of the workflows. ReCAP captures the Cloud resource information and links it with the workflow provenance to generate Cloud-aware provenance, which consists of configuration parameters relating to the hardware and software describing a resource on the Cloud. Once captured, this information aids in re-provisioning the same execution infrastructure on the Cloud for workflow re-execution. Since resources on the Cloud can be used in a static or a dynamic (i.e. destroyed when a task finishes) manner, this presents a challenge for the devised provenance capturing approach. To deal with these scenarios, different capturing and mapping approaches are presented in this thesis. These mapping approaches work outside the virtual machine and collect resource information from the Cloud middleware, so they do not affect job performance. The impact of the collected Cloud resource information on the job as well as on the workflow execution has been evaluated through various experiments in this thesis.

    In ReCAP, workflow reproducibility is verified by comparing the provenance graph structure, the infrastructure details and the output produced by the workflows. To compare the provenance graphs, the captured provenance information, including infrastructure details, is translated into a graph model. The graphs of the original and the reproduced execution are then compared to analyse their similarity. Two comparison approaches are presented that produce qualitative as well as quantitative analyses of the graph structure. The ReCAP framework and its constituent components are evaluated using different scientific workflows, such as ReconAll and Montage, from the domains of neuroscience (i.e. N4U) and astronomy respectively. The results show that ReCAP is able to capture the Cloud-aware provenance and demonstrate workflow execution reproducibility by re-provisioning the same resources on the Cloud. The results also demonstrate that the provenance comparison approaches can determine the similarity between two given provenance graphs, and that the workflow output comparison approach is suitable for comparing the outputs of scientific workflows, especially deterministic ones.
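
    As a rough illustration of the quantitative side of such a comparison, the provenance of each execution can be flattened to a set of labelled edges (including the VM configuration a job ran on) and scored with a simple set-overlap measure. This is only a sketch; the node and edge labels below are hypothetical and are not ReCAP's actual graph model.

        # Minimal sketch: compare two provenance graphs as sets of labelled edges.
        from typing import Set, Tuple

        Edge = Tuple[str, str, str]   # (source, relation, target)

        def similarity(a: Set[Edge], b: Set[Edge]) -> float:
            """Jaccard overlap of edge sets: 1.0 means structurally identical."""
            return len(a & b) / len(a | b) if (a or b) else 1.0

        original: Set[Edge] = {
            ("wf:montage", "contains", "job:mProject_1"),
            ("job:mProject_1", "used", "file:input_1.fits"),
            ("job:mProject_1", "ran_on", "vm:2vcpu-4gb-ubuntu"),
            ("job:mProject_1", "generated", "file:proj_1.fits"),
        }
        reproduced: Set[Edge] = {
            ("wf:montage", "contains", "job:mProject_1"),
            ("job:mProject_1", "used", "file:input_1.fits"),
            ("job:mProject_1", "ran_on", "vm:2vcpu-4gb-ubuntu"),  # same flavour re-provisioned
            ("job:mProject_1", "generated", "file:proj_1.fits"),
        }

        print(f"structural similarity: {similarity(original, reproduced):.2f}")   # 1.00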

    Scientific Workflow Repeatability through Cloud-Aware Provenance

    The provenance of the transformations, analyses and interpretations of data is vital for the repeatability and reliability of scientific workflows. Such provenance has been captured effectively in Grid-based scientific workflow systems. The recent adoption of Cloud-based scientific workflows, however, presents an opportunity to investigate the suitability of existing approaches, or to propose new ones, for collecting provenance information from the Cloud and utilizing it for workflow repeatability in the Cloud infrastructure. The dynamic nature of the Cloud makes this difficult because, unlike the Grid, resources are provisioned on demand. This paper presents a novel approach that can assist in mitigating this challenge. The approach collects Cloud infrastructure information along with workflow provenance and establishes a mapping between them; this mapping is later used to re-provision resources on the Cloud. Repeatability of the workflow execution is achieved by (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, and (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them. The evaluation of an initial prototype suggests that the proposed approach is feasible and can be investigated further.
    Comment: 6 pages; 5 figures; 3 tables. In Proceedings of the Recomputability 2014 workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014), London, December 201
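
    The "re-provisioning similar resources" step can be pictured as matching the captured virtual machine configuration against the flavours the Cloud currently offers. The sketch below makes that concrete under assumed field names (vcpus, ram_mb) and flavour names; these are illustrative and not taken from the paper.

        # Sketch: pick the smallest available flavour that covers the captured VM config.
        from dataclasses import dataclass
        from typing import List, Optional

        @dataclass
        class Flavour:
            name: str
            vcpus: int
            ram_mb: int

        def match_flavour(captured_vcpus: int, captured_ram_mb: int,
                          available: List[Flavour]) -> Optional[Flavour]:
            """Return the smallest flavour meeting or exceeding the captured configuration."""
            candidates = [f for f in available
                          if f.vcpus >= captured_vcpus and f.ram_mb >= captured_ram_mb]
            return min(candidates, key=lambda f: (f.vcpus, f.ram_mb), default=None)

        flavours = [Flavour("m1.small", 1, 2048),
                    Flavour("m1.medium", 2, 4096),
                    Flavour("m1.large", 4, 8192)]
        print(match_flavour(2, 4096, flavours))   # -> m1.medium, comparable to the original run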

    Real-time and dynamic fault-tolerant scheduling for scientific workflows in clouds

    Cloud computing has become a popular technology for executing scientific workflows. However, with a large number of hosts and virtual machines (VMs) being deployed, cloud resource failures, such as the permanent failure of hosts (HPF), the transient failure of hosts (HTF) and the transient failure of VMs (VMTF), pose a service reliability problem. Fault tolerance for time-consuming scientific workflows is therefore essential in the cloud. However, existing fault-tolerant (FT) approaches consider only one or two of the above failure types and neglect the others, especially HTF. This paper proposes a Real-time and dynamic Fault-tolerant Scheduling (ReadyFS) algorithm for scientific workflow execution in a cloud, which guarantees deadline constraints and improves resource utilization even in the presence of any of these resource failures. Specifically, we first introduce two FT mechanisms, replication with delay execution (RDE) and checkpointing with delay execution (CDE), to cope with HPF and VMTF simultaneously. Additionally, rescheduling (ReSC) is devised to tackle HTF, which affects the resource availability of the entire cloud datacenter. Then, a resource adjustment (RA) strategy, comprising resource scaling-up (RS-Up) and resource scaling-down (RS-Down), is used to adjust resource demands and improve resource utilization dynamically. Finally, the ReadyFS algorithm schedules real-time scientific workflows by combining all of the above FT mechanisms with the RA strategy. We conduct a performance evaluation with real-world scientific workflows and compare ReadyFS with five vertical comparison algorithms and three horizontal comparison algorithms. Simulation results confirm that ReadyFS is indeed able to guarantee the fault tolerance of scientific workflow execution and improve cloud resource utilization.
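
    To make the checkpointing idea behind a mechanism such as CDE concrete, the toy sketch below persists a task's progress so that a transient failure only costs the work done since the last checkpoint. The file layout, failure simulation and retry loop are illustrative assumptions, not the ReadyFS algorithm.

        # Toy sketch of checkpoint-and-resume for tolerating transient VM failures.
        import json, os, random

        CKPT = "task_checkpoint.json"

        def load_progress() -> int:
            """Return the next step to execute, resuming from a checkpoint if present."""
            if os.path.exists(CKPT):
                with open(CKPT) as fh:
                    return json.load(fh)["next_step"]
            return 0

        def run_task(total_steps: int = 10, checkpoint_every: int = 2) -> None:
            step = load_progress()
            while step < total_steps:
                if random.random() < 0.1:            # simulated transient VM failure
                    raise RuntimeError("transient VM failure")
                step += 1                            # one unit of work done
                if step % checkpoint_every == 0:
                    with open(CKPT, "w") as fh:      # persist progress outside the task
                        json.dump({"next_step": step}, fh)
            print("task finished")

        # A scheduler would catch the failure, re-provision a VM and resume the task;
        # only the work since the last checkpoint is repeated.
        for attempt in range(10):
            try:
                run_task()
                break
            except RuntimeError:
                continue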

    Data Placement And Task Mapping Optimization For Big Data Workflows In The Cloud

    Data-centric workflows naturally process and analyze huge volumes of data. In this new era of Big Data there is a growing need to enable data-centric workflows to perform computations at a scale far exceeding a single workstation's capabilities, so such applications can benefit from distributed high performance computing (HPC) infrastructures like cluster, grid or cloud computing. Although data-centric workflows have been applied extensively to structure complex scientific data analysis processes, they fail to address big data challenges and to leverage the dynamic resource provisioning capability of the Cloud. The concept of “big data workflows” is proposed by our research group as the next generation of data-centric workflow technologies to address these limitations of existing workflow technologies.

    Executing big data workflows in the Cloud is a challenging problem, as workflow tasks and data must be partitioned, distributed and assigned to the cloud execution sites (multiple virtual machines). When such workflows run in a cloud distributed across several physical locations, the workflow execution time and the cloud resource utilization efficiency depend heavily on the initial placement and distribution of the workflow tasks and datasets across the virtual machines. Several workflow management systems have been developed to facilitate the use of workflows; however, the data and workflow task placement issue has not yet been sufficiently addressed.

    In this dissertation, I propose BDAP (Big Data Placement strategy) for data placement and TPS (Task Placement Strategy) for task placement, which improve workflow performance by minimizing data movement across the virtual machines during workflow execution, and CATS (Cultural Algorithm Task Scheduling) for workflow scheduling, which improves workflow performance by minimizing workflow execution cost. Concretely, I 1) formalize the data and task placement problems in workflows, 2) propose a data placement algorithm that considers both the initial input datasets and the intermediate datasets produced during a workflow run, 3) propose a task placement algorithm that places workflow tasks before the workflow runs, 4) propose a workflow scheduling strategy that minimizes workflow execution cost once a deadline is provided by the user, and 5) perform extensive experiments in a distributed environment to validate that the proposed strategies provide an effective data and task placement solution, distributing big datasets and tasks onto appropriate virtual machines in the Cloud within a reasonable time.
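
    As a toy view of the placement intuition (not the BDAP or TPS algorithms themselves), the sketch below assigns each dataset to the virtual machine hosting the most of its consuming tasks, so that fewer reads cross VM boundaries. Task, dataset and VM names are hypothetical, and capacity constraints and intermediate data are ignored.

        # Greedy sketch: place each dataset where most of its consumers run.
        from collections import Counter
        from typing import Dict, List

        # Hypothetical inputs: where each task will run, and which tasks read each dataset.
        task_to_vm: Dict[str, str] = {"t1": "vm1", "t2": "vm1", "t3": "vm2"}
        dataset_consumers: Dict[str, List[str]] = {"d1": ["t1", "t2"], "d2": ["t2", "t3"]}

        def place_datasets() -> Dict[str, str]:
            """Put each dataset on the VM hosting most of its consumers (ties break arbitrarily)."""
            placement = {}
            for ds, tasks in dataset_consumers.items():
                votes = Counter(task_to_vm[t] for t in tasks)
                placement[ds] = votes.most_common(1)[0][0]
            return placement

        print(place_datasets())   # e.g. {'d1': 'vm1', 'd2': 'vm1'}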