4 research outputs found

    Scientific Workflow Repeatability through Cloud-Aware Provenance

    Full text link
    The transformations, analyses and interpretations of data in scientific workflows are vital for the repeatability and reliability of scientific workflows. This provenance of scientific workflows has been effectively carried out in Grid based scientific workflow systems. However, recent adoption of Cloud-based scientific workflows present an opportunity to investigate the suitability of existing approaches or propose new approaches to collect provenance information from the Cloud and to utilize it for workflow repeatability in the Cloud infrastructure. The dynamic nature of the Cloud in comparison to the Grid makes it difficult because resources are provisioned on-demand unlike the Grid. This paper presents a novel approach that can assist in mitigating this challenge. This approach can collect Cloud infrastructure information along with workflow provenance and can establish a mapping between them. This mapping is later used to re-provision resources on the Cloud. The repeatability of the workflow execution is performed by: (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, and (b) re-provisioning the similar resources on the Cloud and re-executing the workflow on them. The evaluation of an initial prototype suggests that the proposed approach is feasible and can be investigated further.Comment: 6 pages; 5 figures; 3 tables in Proceedings of the Recomputability 2014 workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014). London December 201

    NeuroProv - A visualisation system to enhance the utility of provenance Data for neuroimaging analysis

    Get PDF
    E-Science platforms such as myGRID and NeuGRID for Users are growing at an amazing rate. One of the key barriers to their widespread use in practice is the lack of provenance data to support the reasoning and verification of experimental or analysis results. Clinical researchers use workflows to orchestrate the data present in e-science platforms in order to facilitate processing. Even though most systems capture provenance data and store it, systems rarely make use of it, thus limiting the exploitation of the true potential of such provenance. This thesis investigates mechanisms to visualise provenance data for neuroimaging analysis and to provide means to exploit the true potential of provenance data. In order to achieve this, a visualisation system has been implemented based on the use-cases that have been designed following requirements elicited for neuroimaging analysis. In this research a technique has been used to address the requirements of provenance visualisation for neuroimaging analysis. The prototype system has been tested against the provenance generated by NeuGRID for Users (N4U) as a proof of concept for our research. Different workflows have been visualised to study the efficacy of the proposed solution. Furthermore, evaluation metrics have been defined to determine whether the proposed solution is suitable for the purpose of the research conducted. The results show that the proposed visualisation system enhances the utility of provenance data for neuroimaging analysis and therefore the proposed research can be used to provide value to provenance data for neuroimaging analyses

    HyDRA Hybrid workflow Design Recommender Architecture

    Get PDF
    Workflows are a way to describe a series of computations on raw e-Science data. These data may be MRI brain scans, data from a high energy physics detector or metric data from an earth observation project. In order to derive meaningful knowledge from the data, it must be processed and analysed. Workflows have emerged as the principle mechanism for describing and enacting complex e-Science analyses on distributed infrastructures such as grids. Scientific users face a number of challenges when designing workflows. These challenges include selecting appropriate components for their tasks, spec- ifying dependencies between them and selecting appropriate parameter values. These tasks become especially challenging as workflows become increasingly large. For example, the CIVET workflow consists of up to 108 components. Building the workflow by hand and specifying all the links can become quite cumbersome for scientific users.Traditionally, recommender systems have been employed to assist users in such time-consuming and tedious tasks. One of the techniques used by recommender systems has been to predict what the user is attempting to do using a variety of techniques. These techniques include using workflow se- mantics on the one hand and historical usage patterns on the other. Semantics-based systems attempt to infer a user’s intentions based on the available semantics. Pattern-based systems attempt to extract usage patterns from previously-constructed workflows and match those patterns to the workflow un- der construction. The use of historical patterns adds dynamism to the suggestions as the system can learn and adapt with “experience”. However, in cases where there are no previous patterns to draw upon, pattern-based systems fail to perform. Semantics-based systems, on the other hand infer from static information, so they always have something to draw upon. However, that information first has to be encoded into the semantic repository for the system to draw upon it, which is a time-consuming and tedious task in it self. Moreover, semantics-based systems do not learn and adapt with experience. Both approaches have distinct, but complementary features and drawbacks. By combining the two approaches, the drawbacks of each approach can be addressed.This thesis presents HyDRA, a novel hybrid framework that combines frequent usage patterns and workflow semantics to generate suggestions. The functions performed by the framework include; a) extracting frequent functional usage patterns; b) identifying the semantics of unknown components; and c) generating accurate and meaningful suggestions. Challenges to mining frequent patterns in- clude ensuring that meaningful and useful patterns are extracted. For this purpose only patterns that occur above a minimum frequency threshold are mined. Moreover, instead of just groups of specific components, the pattern mining algorithm takes into account workflow component semantics. This allows the system to identify different types of components that perform a single composite function. One of the challenges in maintaining a semantic repository is to keep the repository up-to-date. This involves identifying new items and inferring their semantics. In this regard, a minor contribution of this research is a semantic inference engine that is responsible for function b). This engine also uses pre-defined workflow component semantics to infer new semantic properties and generate more accurate suggestions. The overall suggestion generation algorithm is also presented.HyDRA has been evaluated using workflows from the Laboratory of Neuro Imaging (LONI) repos- itory. These workflows have been chosen for their structural and functional characteristics that help� to evaluate the framework in different scenarios. The system is also compared with another existing pattern-based system to show a clear improvement in the accuracy of the suggestions generated
    corecore