90 research outputs found

    Provenance in scientific workflow systems

    Get PDF
    Journal ArticleThe automated tracking and storage of provenance information promises to be a major advantage of scientific workflow systems. We discuss issues related to data and workflow provenance, and present techniques for focusing user attention on meaningful provenance through "user views," for managing the provenance of nested scientific data, and for using information about the evolution of a workflow specification to understand the difference in the provenance of similar data products

    Re-thinking Workflow Provenance against Data-Oriented Investigation Lifecycle

    Get PDF

    Workflow Provenance: from Modeling to Reporting

    Get PDF
    Workflow provenance is a crucial part of a workflow system as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and so on. Integrating provenance into a workflow system or modifying a workflow system to capture or analyze different provenance information is burdensome, requiring extensive development because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow related information from logs. In this work, we propose a taxonomy of provenance questions and guided by these questions, we created a workflow programming model 'ProvMod' with a supporting run-time library to provide automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod is based on recommendations from prominent research and is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured heterogeneous JSON logs. The log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod usability by engaging users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and provenance questions taxonomy can be applied to discover meaningful insights

    An optimized workflow enactor for data-intensive grid applications

    Get PDF
    I3S laboratory Research Report (I3S/RR-2005-32-FR), Sophia Antipolis, FranceData-intensive applications benefit from an intrinsic data parallelism that should be exploited on parallel systems to lower execution time. In the last years, data grids have been developed to handle, process, and analyze the tremendous amount of data produced in many scientific areas. Although very large, these grid infrastructures are under heavy use and efficiency is of utmost importance. This paper deals with the optimization of workflow managers used for deploying complex data-driven applications on grids. In that kind of application, we show how to better exploit data parallelism than currently done in most existing workflow managers. We present the design of a prototype implementing our solution and we show that it provides a significant speed-up w.r.t existing solutions by exemplifying results on a realistic medical imaging application
    • …
    corecore