
    Perspectives on automated composition of workflows in the life sciences [version 1; peer review: 2 approved]

    Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
    Stian Soiland-Reyes was supported by the BioExcel-2 Centre of Excellence, funded by the European Commission Horizon 2020 programme under European Commission contract H2020-INFRAEDI-02-2018 823830. Carole Goble was supported by EOSC-Life, funded by the European Commission Horizon 2020 programme under grant agreement H2020-INFRAEOSC-2018-2 824087. We gratefully acknowledge the financial support from the Lorentz Center, ELIXIR, and the Leiden University Medical Center (LUMC) that made the workshop possible. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Peer reviewed. Article signed by 33 authors: Anna-Lena Lamprecht, Magnus Palmblad, Jon Ison, Veit Schwämmle, Mohammad Sadnan Al Manir, Ilkay Altintas, Christopher J. O. Baker, Ammar Ben Hadj Amor, Salvador Capella-Gutierrez, Paulos Charonyktakis, Michael R. Crusoe, Yolanda Gil, Carole Goble, Timothy J. Griffin, Paul Groth, Hans Ienasescu, Pratik Jagtap, Matúš Kalaš, Vedran Kasalica, Alireza Khanteymoori, Tobias Kuhn, Hailiang Mei, Hervé Ménager, Steffen Möller, Robin A. Richardson, Vincent Robert, Stian Soiland-Reyes, Robert Stevens, Szoke Szaniszlo, Suzan Verberne, Aswin Verhoeven, Katherine Wolstencroft. Postprint (published version).
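    As a purely illustrative aside, the six-stage life cycle summarized above can be written down as a small data structure. The Python sketch below uses the stage names from the abstract; the transition annotations are assumptions added for illustration, since the abstract only says that diverse tools and methods, usually incorporating domain knowledge, facilitate the transitions.

    from enum import Enum

    class LifecycleStage(Enum):
        """Six stages of the scientific workflow life cycle (from the abstract)."""
        SCIENTIFIC_QUESTION = 1    # scientific question or hypothesis
        CONCEPTUAL_WORKFLOW = 2
        ABSTRACT_WORKFLOW = 3
        CONCRETE_WORKFLOW = 4
        PRODUCTION_WORKFLOW = 5
        SCIENTIFIC_RESULTS = 6

    # Hypothetical examples of what could support each transition (assumptions).
    TRANSITIONS = {
        (LifecycleStage.SCIENTIFIC_QUESTION, LifecycleStage.CONCEPTUAL_WORKFLOW): "domain knowledge",
        (LifecycleStage.CONCEPTUAL_WORKFLOW, LifecycleStage.ABSTRACT_WORKFLOW): "semantic domain models",
        (LifecycleStage.ABSTRACT_WORKFLOW, LifecycleStage.CONCRETE_WORKFLOW): "automated composition over tool registries",
        (LifecycleStage.CONCRETE_WORKFLOW, LifecycleStage.PRODUCTION_WORKFLOW): "benchmarking with reference data",
        (LifecycleStage.PRODUCTION_WORKFLOW, LifecycleStage.SCIENTIFIC_RESULTS): "large-scale deployment and execution",
    }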

    Scientific Workflow Integration For Services Computing

    In recent years, significant scientific advances have increasingly been achieved through complex scientific processes. With the exponential growth of computing technologies and scientific data, a scientific workflow may comprise a large number of heterogeneous scientific services and applications, provided by different organizations. These services, applications, and their associated data are usually distributed across heterogeneous computing environments. The integration and management of such scientific workflows are pushing the limits of current workflow technology. This dissertation presents an integrated solution to composing, scheduling, executing and developing scientific workflows and scientific workflow management systems. To provide a foundation for workflow composition, scheduling, execution and management, we propose the first reference architecture for scientific workflow management systems. The reference architecture not only provides a high-level organization of subsystems and their interactions in a workflow system, but also provides a basis for comparison between different systems and guidance for the architectural design of an SWFMS in a specific scientific domain. To integrate heterogeneous services and applications and enable their composition into workflows, we propose a task template model which not only provides an appropriate abstraction of heterogeneous services and applications, but also encapsulates the composition and mapping of shims and functional task components within a task interface. Our proposed task specification language (TSL) not only integrates heterogeneous services and applications into uniform workflow tasks, but also provides a solution to both the TYPE-I and TYPE-II shimming problems in composing scientific workflows. To schedule scientific workflows in emerging services computing environments, we propose two workflow scheduling algorithms, the SHEFT algorithm and the SCPOR algorithm, which prioritize tasks in a workflow, map tasks onto suitable resources and order the execution of tasks on the assigned resources, so that the workflow makespan is minimized. Our extensive experiments have shown that our proposed algorithms not only outperform other algorithms for large-scale, data-intensive and compute-intensive workflows, but also allow the assigned resources to change elastically with the size of the workflows. To execute workflows in distributed computing environments, we propose a task run model to model the run-time behaviors of tasks. The proposed task run description language (TRDL) enables the execution of task instances constructed from heterogeneous services and applications. We also develop an SOA-based task management subsystem to manage all task templates, task instances and task runs for the invocation and execution of various heterogeneous task components. Finally, our SOA-based workflow management system, the VIEW system, and a VIEW-based workflow application system, the FiberFlow system, validate our architectures, models, languages, and algorithms.
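    The scheduling contribution can be made concrete with a generic list-scheduling sketch. The Python below is not the SHEFT or SCPOR algorithm itself, only a simplified HEFT-style illustration of the idea the abstract describes: rank tasks, then map each task onto the resource that minimizes its finish time; the data model is an assumption made for the example.

    def schedule(tasks, deps, cost, resources):
        """Simplified HEFT-style list scheduling (illustration only).
        tasks: list of task ids; deps: {task: set of predecessor tasks};
        cost: {(task, resource): execution time}; resources: list of resource ids."""
        rank = {}  # upward rank: average cost plus longest path to an exit task

        def upward_rank(t):
            if t not in rank:
                successors = [s for s in tasks if t in deps.get(s, set())]
                avg = sum(cost[(t, r)] for r in resources) / len(resources)
                rank[t] = avg + max((upward_rank(s) for s in successors), default=0.0)
            return rank[t]

        finish = {}                          # task -> finish time
        free = {r: 0.0 for r in resources}   # resource -> time it becomes available
        for t in sorted(tasks, key=upward_rank, reverse=True):  # highest rank first
            ready = max((finish[p] for p in deps.get(t, set())), default=0.0)
            # Choose the resource that minimizes this task's earliest finish time.
            best = min(resources, key=lambda r: max(ready, free[r]) + cost[(t, r)])
            finish[t] = max(ready, free[best]) + cost[(t, best)]
            free[best] = finish[t]
        return finish                        # the makespan is max(finish.values())

    # Example: two tasks on two resources, where t2 depends on t1.
    print(schedule(["t1", "t2"], {"t2": {"t1"}},
                   {("t1", "r1"): 3, ("t1", "r2"): 4, ("t2", "r1"): 2, ("t2", "r2"): 1},
                   ["r1", "r2"]))   # -> {'t1': 3, 't2': 4}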

    APE in the wild: automated exploration of proteomics workflows in the bio.tools registry

    The bio.tools registry is a main catalogue of computational tools in the life sciences. More than 17 000 tools have been registered by the international bioinformatics community. The bio.tools metadata schema includes semantic annotations of tool functions, that is, formal descriptions of tools' data types, formats, and operations with terms from the EDAM bioinformatics ontology. Such annotations enable the automated composition of tools into multistep pipelines or workflows. In this Technical Note, we revisit a previous case study on the automated composition of proteomics workflows. We use the same four workflow scenarios but instead of using a small set of tools with carefully handcrafted annotations, we explore workflows directly on bio.tools. We use the Automated Pipeline Explorer (APE), a reimplementation and extension of the workflow composition method previously used. Moving "into the wild" opens up an unprecedented wealth of tools and a huge number of alternative workflows. Automated composition tools can be used to explore this space of possibilities systematically. Inevitably, the mixed quality of semantic annotations in bio.tools leads to unintended or erroneous tool combinations. However, our results also show that additional control mechanisms (tool filters, configuration options, and workflow constraints) can effectively guide the exploration toward smaller sets of more meaningful workflows.
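    To illustrate how such semantic annotations enable composition, the toy Python sketch below chains tools whose declared output type matches the next tool's input type. This is not the composition method implemented in APE; the tool names and EDAM-style type labels are invented stand-ins for bio.tools entries and EDAM terms.

    # Hypothetical tools annotated with input and output data types.
    TOOLS = {
        "PeakPicker":   {"in": {"Mass spectrum"},            "out": {"Peak list"}},
        "SearchEngine": {"in": {"Peak list"},                 "out": {"Peptide identification"}},
        "Quantifier":   {"in": {"Peptide identification"},    "out": {"Protein quantification"}},
    }

    def compose(source_type, target_type, max_len=4):
        """Enumerate tool chains that transform source_type into target_type."""
        chains = []

        def extend(current_type, chain):
            if current_type == target_type:
                chains.append(list(chain))
                return
            if len(chain) >= max_len:
                return
            for name, ann in TOOLS.items():
                if current_type in ann["in"] and name not in chain:
                    for out in ann["out"]:
                        extend(out, chain + [name])

        extend(source_type, [])
        return chains

    print(compose("Mass spectrum", "Protein quantification"))
    # -> [['PeakPicker', 'SearchEngine', 'Quantifier']]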

    VPH-HF: A software framework for the execution of complex subject-specific physiology modelling workflows

    Computational medicine increasingly requires complex orchestrations of multiple modelling and simulation codes, written in different programming languages and with different computational requirements, which, once validated, need to be run many times on large cohorts of patients. The aim of this paper is to present a new open-source software framework, the VPH Hypermodelling Framework (VPH-HF). The VPH-HF overcomes the limitations of most workflow execution environments by supporting both Taverna and Muscle2; the addition of Muscle2 support makes it possible to execute very complex orchestrations that include strongly coupled models. The overhead that the VPH-HF imposes in exchange for this is small, and tends to be flat regardless of the complexity and the computational cost of the hypermodel being executed. We recommend the use of the VPH-HF to orchestrate any hypermodel with an execution time of 200 s or higher, which would confine the VPH-HF overhead to less than 10%. The VPH-HF also provides an automatic caching system over the execution of every hypomodel, which may provide considerable speed-up when the orchestration is run repeatedly over large numbers of patients or within stochastic frameworks, and the input sets are properly binned. The caching system also makes it easy to form the large input-set/output-set databases required to develop reduced-order models, and the framework offers the possibility to dynamically replace single models in the orchestration with reduced-order versions built from cached results, an essential feature when the orchestration of multiple models produces a combinatorial explosion of the computational cost.
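    The 200 s recommendation together with the stated 10% bound implies a roughly constant framework overhead on the order of 20 s per run. The caching idea, executing a component model only when its (binned) inputs have not been seen before, can be sketched in Python as follows; this is a minimal illustration, not the VPH-HF implementation, and the binning policy is an assumption.

    cache = {}

    def bin_inputs(params, width=0.05):
        """Quantize each numeric input to a bin of the given width (assumed policy)."""
        return tuple(round(v / width) for v in params)

    def run_model(model, params):
        key = (model.__name__, bin_inputs(params))
        if key not in cache:               # execute only on a cache miss
            cache[key] = model(params)
        return cache[key]

    def expensive_hypomodel(params):       # placeholder for a real simulation code
        return sum(p ** 2 for p in params)

    run_model(expensive_hypomodel, (0.101, 2.0))   # executes the model
    run_model(expensive_hypomodel, (0.102, 2.0))   # cache hit: falls into the same bins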

    Automatic Generation of Optimized Process Models from Declarative Specifications

    Process models are often generic, i.e., they describe similar cases or contexts. For instance, a process model for commissioning can cover vehicles with either an automatic or a manual transmission, by executing alternative tasks. A generic process model is not optimal compared to one tailored to a specific context. Given a declarative specification of the constraints and a specific context, we study how to automatically generate a good process model and propose a novel approach. We focus on the restricted case in which no task is repeated, as in commissioning and elsewhere, e.g., in manufacturing. Our approach uses a probabilistic search to find a good process model according to quality criteria. It can handle complex real-world specifications containing several hundred constraints and more than one hundred tasks. The process models generated with our scheme are superior (nearly twice as fast) to ones designed by hand by professional modelers.
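    As a toy illustration of the general idea (not the authors' actual approach), the Python below randomly samples candidate task orderings, keeps only those satisfying declarative precedence constraints, and scores them with a simple quality criterion; the task names, durations, and constraints are invented.

    import random

    TASKS = {"mount_seats": 3, "install_transmission": 5, "wire_harness": 2, "road_test": 4}

    # Declarative precedence constraints, as (before, after) pairs (invented example).
    CONSTRAINTS = [("install_transmission", "road_test"), ("wire_harness", "road_test")]

    def satisfies(order):
        pos = {task: i for i, task in enumerate(order)}
        return all(pos[a] < pos[b] for a, b in CONSTRAINTS)

    def total_completion_time(order):
        t = total = 0
        for task in order:            # sum of completion times as a crude quality score
            t += TASKS[task]
            total += t
        return total

    def search(samples=1000, seed=0):
        rng, best = random.Random(seed), None
        for _ in range(samples):
            order = list(TASKS)
            rng.shuffle(order)
            if satisfies(order) and (best is None or
                                     total_completion_time(order) < total_completion_time(best)):
                best = order
        return best

    print(search())   # e.g. ['wire_harness', 'mount_seats', 'install_transmission', 'road_test']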