    Workflow Provenance: from Modeling to Reporting

    Workflow provenance is a crucial part of a workflow system, as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and more. Integrating provenance into a workflow system, or modifying a workflow system to capture or analyze different provenance information, is burdensome and requires extensive development, because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow-related information from logs. In this work, we propose a taxonomy of provenance questions and, guided by these questions, create a workflow programming model, 'ProvMod', with a supporting run-time library that provides automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod are based on recommendations from prominent research, and the model is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured, heterogeneous JSON logs; the log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod's usability with end users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and the provenance question taxonomy can be applied to discover meaningful insights.
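
    The abstract describes ProvMod's mechanism only at a high level; as a purely illustrative sketch of the general technique (not the actual ProvMod API - every name below is hypothetical), the following Python snippet wraps workflow tasks in a decorator that emits one semi-structured JSON provenance record per invocation:

```python
import functools
import json
import time
import uuid

# In a real system these records would stream to Neo4j or an ELK pipeline;
# here they are collected in memory for simplicity.
PROV_LOG = []

def provenance_task(func):
    """Hypothetical decorator: record inputs, outcome, and timing of each
    task invocation as a semi-structured JSON document."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {
            "task_id": str(uuid.uuid4()),
            "task_name": func.__name__,
            "inputs": [repr(a) for a in args],
            "started_at": time.time(),
        }
        try:
            result = func(*args, **kwargs)
            record["status"] = "ok"
            record["output"] = repr(result)
            return result
        except Exception as exc:
            # Error tracking: capture the failure, then re-raise it.
            record["status"] = "error"
            record["error"] = str(exc)
            raise
        finally:
            record["ended_at"] = time.time()
            PROV_LOG.append(json.dumps(record))
    return wrapper

@provenance_task
def normalize(values):
    total = sum(values)
    return [v / total for v in values]

normalize([1, 2, 3])
print(PROV_LOG[0])  # one JSON provenance record per task invocation
```

    Records in this shape are easy to bulk-load into a document store, or to map onto nodes and relationships in a graph database such as Neo4j for lineage queries.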

    Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence

    The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs additionally require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows on HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for HPC/DA/AI convergence. Based on this study, the paper identifies the challenges a new workflow platform must meet to manage complex workflows. Finally, it proposes a development approach for such a workflow platform that addresses these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. The proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.

    This work has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union's Horizon 2020 research and innovation programme and Spain, Germany, France, Italy, Poland, Switzerland and Norway. In Spain, it has received complementary funding from MCIN/AEI/10.13039/501100011033, Spain and the European Union NextGenerationEU/PRTR (contracts PCI2021-121957, PCI2021-121931, PCI2021-121944, and PCI2021-121927). In Germany, it has received complementary funding from the German Federal Ministry of Education and Research (contracts 16HPC016K, 6GPC016K, 16HPC017 and 16HPC018). In France, it has received financial support from Caisse des dépôts et consignations (CDC) under the action PIA ADEIP (project Calculateurs). In Italy, it has been preliminarily approved for complementary funding by Ministero dello Sviluppo Economico (MiSE) (ref. project prop. 2659). In Norway, it has received complementary funding from the Norwegian Research Council under project number 323825. In Switzerland, it has been preliminarily approved for complementary funding by the State Secretariat for Education, Research, and Innovation (SERI). In Poland, it is partially supported by the National Centre for Research and Development under decision DWM/EuroHPCJU/4/2021. The authors also acknowledge financial support from MCIN/AEI/10.13039/501100011033, Spain through the "Severo Ochoa Programme for Centres of Excellence in R&D" under Grant CEX2018-000797-S, the Spanish Government (contract PID2019-107255 GB), and the Generalitat de Catalunya, Spain (contract 2017-SGR-01414). Anna Queralt is a Serra Húnter Fellow. With funding from the Spanish government through the 'Severo Ochoa Centre of Excellence' accreditation (CEX2018-000797-S).
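
    As a rough, self-contained illustration of the HPC/DA/AI convergence the paper targets (assuming nothing from the eFlows4HPC software stack), the toy Python workflow below fans out simulation tasks in parallel, reduces their outputs in an analytics step, and feeds the resulting features to a training step; a real platform would express this in a task-based programming model running on HPC resources:

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(seed: int) -> list[float]:
    """HPC-style task: a toy numerical simulation."""
    x = seed / 10.0
    return [x * i for i in range(1000)]

def analyze(series: list[float]) -> float:
    """Data-analytics task: reduce simulation output to a single feature."""
    return sum(series) / len(series)

def train(features: list[float]) -> float:
    """AI task: fit a trivial one-parameter 'model' (the mean feature)."""
    return sum(features) / len(features)

if __name__ == "__main__":
    # Fan out independent simulations, then converge into analytics and training.
    with ProcessPoolExecutor() as pool:
        sims = list(pool.map(simulate, range(8)))
        features = list(pool.map(analyze, sims))
    print(f"trained model parameter: {train(features):.3f}")
```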

    Reference Exascale Architecture (Extended Version)

    While political commitments for building exascale systems have been made, turning these systems into platforms for a wide range of exascale applications faces several technical, organisational, and skills-related challenges. The key technical challenges are related to the availability of data. While the first exascale machines are likely to be built within a single site, the input data is in many cases impossible to store within a single site. Alongside handling extremely large amounts of data, an exascale system has to process data from different sources, support accelerated computing, handle a high volume of requests per day, minimize the size of data flows, and remain extensible as both the data and the number of parallel requests continue to grow. These technical challenges are addressed by the general reference exascale architecture. It is divided into three main blocks: a virtualization layer, a distributed virtual file system, and a manager of computing resources. Its main property is modularity, which is achieved by containerization at two levels: 1) application containers - containerization of scientific workflows, and 2) micro-infrastructure - containerization of the extreme-scale, service-oriented data infrastructure. The paper also presents an instantiation of the reference architecture - the architecture of the PROCESS project (PROviding Computing solutions for ExaScale ChallengeS) - and discusses its relation to the reference exascale architecture. The PROCESS architecture has been used as an exascale platform within various exascale pilot applications. The paper also presents performance modelling of the exascale platform, together with its validation.
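
    The two-level containerization described above can be sketched with the Docker SDK for Python; this is only an illustration of the idea (it assumes a local Docker daemon, and the image names are placeholders, not artifacts of the PROCESS project):

```python
import docker  # Docker SDK for Python; assumes a local Docker daemon

def launch_micro_infrastructure(client, data_services):
    """Level 2: containerize the data-service infrastructure itself."""
    return [
        client.containers.run(image, detach=True, name=name)
        for name, image in data_services.items()
    ]

def run_workflow_container(client, image, command):
    """Level 1: run a scientific workflow as an application container."""
    return client.containers.run(image, command, detach=True)

if __name__ == "__main__":
    client = docker.from_env()
    # Placeholder image standing in for e.g. a distributed virtual file system.
    launch_micro_infrastructure(client, {"dvfs": "example/virtual-file-system:latest"})
    job = run_workflow_container(client, "example/workflow:latest", "run-pipeline")
    print(job.id)
```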

    Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud

    With the advent of cloud computing, organizations are nowadays able to react rapidly to changing demands for computational resources. Not only can individual applications be hosted on virtual cloud infrastructures, but also complete business processes. This allows the realization of so-called elastic processes, i.e., processes which are carried out using elastic cloud resources. Despite the manifold benefits of elastic processes, there is still a lack of solutions supporting them. In this paper, we identify the state of the art of elastic Business Process Management with a focus on infrastructural challenges. We conceptualize an architecture for an elastic Business Process Management System and discuss existing work on scheduling, resource allocation, monitoring, decentralized coordination, and state management for elastic processes. Furthermore, we present two representative elastic Business Process Management Systems which are intended to counter these challenges. Based on our findings, we identify open issues and outline possible research directions for the realization of elastic processes and elastic Business Process Management.

    Comment: Please cite as: S. Schulte, C. Janiesch, S. Venugopal, I. Weber, and P. Hoenisch (2015). Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud. Future Generation Computer Systems, Volume NN, Number N, NN-NN., http://dx.doi.org/10.1016/j.future.2014.09.00
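
    To make the notion of an elastic process concrete, here is a minimal, hypothetical utilisation-threshold controller in Python (not one of the systems surveyed): it acquires or releases cloud VMs depending on how many process instances are queued. Real elastic BPMS schedulers additionally weigh deadlines, pricing models, and VM start-up latency.

```python
from dataclasses import dataclass

@dataclass
class ResourcePool:
    """Toy model of leased cloud VMs executing process steps."""
    vms: int = 1
    capacity_per_vm: int = 10  # process instances one VM can serve

def scaling_decision(pool: ResourcePool, queued: int,
                     scale_out_at: float = 0.8, scale_in_at: float = 0.3) -> int:
    """Return the change in VM count: +1 = acquire, -1 = release, 0 = hold."""
    utilisation = queued / (pool.vms * pool.capacity_per_vm)
    if utilisation > scale_out_at:
        return 1
    if utilisation < scale_in_at and pool.vms > 1:
        return -1
    return 0

pool = ResourcePool(vms=2)
for queued in (5, 18, 25, 4):
    pool.vms += scaling_decision(pool, queued)
    print(f"queued={queued:3d} -> vms={pool.vms}")
```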

    AVENTIS - An architecture for event data analysis

    Time-stamped event data is being generated at an exponential rate from various sources (sensor networks, e-markets, etc.), stored in event logs, and made available to researchers. Despite this data deluge and the evolution of a plethora of tools and technologies, the science behind exploratory analysis and knowledge discovery lags behind. There are several reasons for this. In conducting event data analysis, researchers typically detect a pattern or trend in the data through the computation of time-series measures and apply the computed measures to several mathematical models to glean information from the data. This is a complex and time-consuming process covering a range of activities, from data capture (from a broad array of data sources) to the interpretation and dissemination of experimental results, forming a pipeline of activities. Further, data analysis is conducted by domain users, who are typically not IT experts, whereas data processing tools and applications are largely developed by application developers. End users not only lack the critical skills to build a structured analysis pipeline, but are also perplexed by the number of different ways available to derive the necessary information. Consequently, this thesis proposes AVENTIS (Architecture for eVENT Data analysIS), a novel framework to guide the design of analytic solutions that facilitate time-series analysis of event data, tailored to the needs of domain users. The framework comprises three components: a knowledge base, a model-driven analytic methodology, and an accompanying software architecture that provides the necessary technical and operational requirements. Specifically, the research contribution lies, first, in the framework's ability to let analysis requirements be expressed at a level of abstraction consistent with the domain users and to readily make available the information sought without the users having to build the analysis process themselves. Second, the framework provides an abstract design space in which domain experts can build conceptual models of their experiments as sequences of structured tasks in a technology-neutral manner and transparently translate these abstract process models into executable implementations. To evaluate the AVENTIS framework, a prototype based on AVENTIS is implemented and tested with case studies taken from the financial research domain.
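
    As a small, generic example of the time-series measures such a pipeline computes (not AVENTIS itself), the following snippet uses pandas to derive per-minute event intensity and a rolling average from a toy time-stamped event log, e.g. trades from an e-market feed:

```python
import pandas as pd

# Toy time-stamped event log.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 09:30:05", "2024-01-01 09:30:40",
        "2024-01-01 09:31:10", "2024-01-01 09:33:02",
        "2024-01-01 09:33:45", "2024-01-01 09:34:20",
    ]),
    "price": [101.0, 101.2, 100.9, 101.5, 101.4, 101.8],
}).set_index("timestamp")

# Two typical measures in an event-analysis pipeline:
per_minute = events["price"].resample("1min").count()  # event intensity
rolling_avg = (events["price"].resample("1min").mean()
               .rolling(3, min_periods=1).mean())      # smoothed price

print(per_minute)
print(rolling_avg)
```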

    Distributed Software Development Tools for Distributed Scientific Applications

    This chapter provides a new methodology and two tools for user-driven, Wikinomics-oriented development of scientific applications. A service-oriented architecture is used for such applications, in which the entire research-supporting computing or simulation process is broken down into a set of loosely coupled stages in the form of interoperating, replaceable Web services that can be distributed over different clouds. Any piece of code and any application component deployed on a system can be reused and transformed into a service. The combination of service-oriented and cloud computing will indeed begin to change the way research-supporting computing is developed; the facilities for this are considered in this chapter.
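
    A minimal sketch of the claim that any piece of code can be transformed into a service, assuming Flask as the web framework (the function and endpoint are illustrative, not the chapter's tools):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def moving_average(values, window=3):
    """An existing piece of research code to be exposed as a service."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

@app.route("/moving-average", methods=["POST"])
def moving_average_service():
    # Each such stage can be deployed independently, e.g. on different clouds.
    payload = request.get_json()
    result = moving_average(payload["values"], payload.get("window", 3))
    return jsonify({"result": result})

if __name__ == "__main__":
    app.run(port=5000)
```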

    A novel workflow management system for handling dynamic process adaptation and compliance

    Modern enterprise organisations rely on dynamic processes. Generally, these processes cannot be modelled once and executed repeatedly without change. Enterprise processes may evolve unpredictably according to situations that cannot always be prescribed. However, no mechanism exists to ensure that an updated process does not violate any compliance requirements. Typical workflow processes may follow a process definition and execute several thousand instances using a workflow engine without any changes. This is suitable for routine business processes. However, when business processes need flexibility, adaptive features are needed. Because updating processes may violate compliance requirements, automatic compliance checking is necessary. The research work presented in this thesis investigates the shortcomings of current workflow technology in defining, managing, and ensuring the specification and execution of business processes that are dynamic in nature, combined with policy standards, throughout the process lifecycle. The findings from the literature review and the system requirements are used to design the proposed system architecture. Since a two-tier reference process model is not a sufficient basis for the reference model of an adaptive and compliance-aware workflow management system, a three-tier process model is proposed. The major components of the architecture are process models, business rules, and plugin modules. The architecture supports user adaptation with structural checks and dynamic adaptation with data-driven checks. A research prototype - the Adaptive and Compliance Workflow Management System (ACWfMS) - was developed based on the proposed system architecture to implement the core services of the system for testing and evaluation purposes. The ACWfMS enables the development of a workflow management tool to create or update process models. It automatically validates compliance requirements and, in the case of violations, presents visual feedback to the user. In addition, the architecture facilitates process migration to manage specific instances with modified definitions. A case study based on the postgraduate research process domain is discussed.
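
    To illustrate the kind of automatic compliance verification described, here is a small hypothetical sketch in Python (not the ACWfMS implementation): compliance rules are predicates over a process model, every process update is re-checked against them, and violations are reported back to the user.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessModel:
    name: str
    tasks: list[str] = field(default_factory=list)

# Compliance rules as predicates; in practice these would be derived
# from policy standards rather than hard-coded.
def requires_supervisor_approval(model):
    ok = "supervisor_approval" in model.tasks
    return ok, "every process must retain the supervisor approval step"

def review_before_submit(model):
    try:
        ok = model.tasks.index("review") < model.tasks.index("submit")
    except ValueError:
        ok = False
    return ok, "'review' must occur before 'submit'"

RULES = [requires_supervisor_approval, review_before_submit]

def check_compliance(model):
    """Run on every process update; returns the list of violations."""
    return [msg for rule in RULES for ok, msg in [rule(model)] if not ok]

updated = ProcessModel("thesis_submission", ["draft", "submit", "review"])
for violation in check_compliance(updated):
    print("violation:", violation)
```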