2,346 research outputs found

    A MULTI-FUNCTIONAL PROVENANCE ARCHITECTURE: CHALLENGES AND SOLUTIONS

    Get PDF
    In service-oriented environments, services are put together in the form of a workflow with the aim of distributed problem solving. Capturing the execution details of the services' transformations is a significant advantage of using workflows. These execution details, referred to as provenance information, are usually traced automatically and stored in provenance stores. Provenance data contains the data recorded by a workflow engine during a workflow execution. It identifies what data is passed between services, which services are involved, and how results are eventually generated for particular sets of input values. Provenance information is of great importance and has found its way through areas in computer science such as: Bioinformatics, database, social, sensor networks, etc. Current exploitation and application of provenance data is very limited as provenance systems started being developed for specific applications. Thus, applying learning and knowledge discovery methods to provenance data can provide rich and useful information on workflows and services. Therefore, in this work, the challenges with workflows and services are studied to discover the possibilities and benefits of providing solutions by using provenance data. A multifunctional architecture is presented which addresses the workflow and service issues by exploiting provenance data. These challenges include workflow composition, abstract workflow selection, refinement, evaluation, and graph model extraction. The specific contribution of the proposed architecture is its novelty in providing a basis for taking advantage of the previous execution details of services and workflows along with artificial intelligence and knowledge management techniques to resolve the major challenges regarding workflows. The presented architecture is application-independent and could be deployed in any area. The requirements for such an architecture along with its building components are discussed. Furthermore, the responsibility of the components, related works and the implementation details of the architecture along with each component are presented

    Adaptive Process Management in Cyber-Physical Domains

    Get PDF
    The increasing application of process-oriented approaches in new challenging cyber-physical domains beyond business computing (e.g., personalized healthcare, emergency management, factories of the future, home automation, etc.) has led to reconsider the level of flexibility and support required to manage complex processes in such domains. A cyber-physical domain is characterized by the presence of a cyber-physical system coordinating heterogeneous ICT components (PCs, smartphones, sensors, actuators) and involving real world entities (humans, machines, agents, robots, etc.) that perform complex tasks in the “physical” real world to achieve a common goal. The physical world, however, is not entirely predictable, and processes enacted in cyber-physical domains must be robust to unexpected conditions and adaptable to unanticipated exceptions. This demands a more flexible approach in process design and enactment, recognizing that in real-world environments it is not adequate to assume that all possible recovery activities can be predefined for dealing with the exceptions that can ensue. In this chapter, we tackle the above issue and we propose a general approach, a concrete framework and a process management system implementation, called SmartPM, for automatically adapting processes enacted in cyber-physical domains in case of unanticipated exceptions and exogenous events. The adaptation mechanism provided by SmartPM is based on declarative task specifications, execution monitoring for detecting failures and context changes at run-time, and automated planning techniques to self-repair the running process, without requiring to predefine any specific adaptation policy or exception handler at design-time

    Scalable And Secure Provenance Querying For Scientific Workflows And Its Application In Autism Study

    Get PDF
    In the era of big data, scientific workflows have become essential to automate scientific experiments and guarantee repeatability. As both data and workflow increase in their scale, requirements for having a data lineage management system commensurate with the complexity of the workflow also become necessary, calling for new scalable storage, query, and analytics infrastructure. This system that manages and preserves the derivation history and morphosis of data, known as provenance system, is essential for maintaining quality and trustworthiness of data products and ensuring reproducibility of scientific discoveries. With a flurry of research and increased adoption of scientific workflows in processing sensitive data, i.e., health and medication domain, securing information flow and instrumenting access privileges in the system have become a fundamental precursor to deploying large-scale scientific workflows. That has become more important now since today team of scientists around the world can collaborate on experiments using globally distributed sensitive data sources. Hence, it has become imperative to augment scientific workflow systems as well as the underlying provenance management systems with data security protocols. Provenance systems, void of data security protocol, are susceptible to vulnerability. In this dissertation research, we delineate how scientific workflows can improve therapeutic practices in autism spectrum disorders. The data-intensive computation inherent in these workflows and sensitive nature of the data, necessitate support for scalable, parallel and robust provenance queries and secured view of data. With that in perspective, we propose OPQLPigOPQL^{Pig}, a parallel, robust, reliable and scalable provenance query language and introduce the concept of access privilege inheritance in the provenance systems. We characterize desirable properties of role-based access control protocol in scientific workflows and demonstrate how the qualities are integrated into the workflow provenance systems as well. Finally, we describe how these concepts fit within the DATAVIEW workflow management system

    Distributed Web Service Coordination for Collaboration Applications and Biological Workflows

    Get PDF
    In this dissertation work, we have investigated the main research thrust of decentralized coordination of workflows over web services. To address distributed workflow coordination, first we have developed “Web Coordination Bonds” as a capable set of dependency modeling primitives that enable each web service to manage its own dependencies. Web bond primitives are as powerful as extended Petri nets and have sufficient modeling and expressive capabilities to model workflow dependencies. We have designed and prototyped our “Web Service Coordination Management Middleware” (WSCMM) system that enhances current web services infrastructure to accommodate web bond enabled web services. Finally, based on core concepts of web coordination bonds and WSCMM, we have developed the “BondFlow” system that allows easy configuration distributed coordination of workflows. The footprint of the BonFlow runtime is 24KB and the additional third party software packages, SOAP client and XML parser, account for 115KB

    Knowledge Components and Methods for Policy Propagation in Data Flows

    Get PDF
    Data-oriented systems and applications are at the centre of current developments of the World Wide Web (WWW). On the Web of Data (WoD), information sources can be accessed and processed for many purposes. Users need to be aware of any licences or terms of use, which are associated with the data sources they want to use. Conversely, publishers need support in assigning the appropriate policies alongside the data they distribute. In this work, we tackle the problem of policy propagation in data flows - an expression that refers to the way data is consumed, manipulated and produced within processes. We pose the question of what kind of components are required, and how they can be acquired, managed, and deployed, to support users on deciding what policies propagate to the output of a data-intensive system from the ones associated with its input. We observe three scenarios: applications of the Semantic Web, workflow reuse in Open Science, and the exploitation of urban data in City Data Hubs. Starting from the analysis of Semantic Web applications, we propose a data-centric approach to semantically describe processes as data flows: the Datanode ontology, which comprises a hierarchy of the possible relations between data objects. By means of Policy Propagation Rules, it is possible to link data flow steps and policies derivable from semantic descriptions of data licences. We show how these components can be designed, how they can be effectively managed, and how to reason efficiently with them. In a second phase, the developed components are verified using a Smart City Data Hub as a case study, where we developed an end-to-end solution for policy propagation. Finally, we evaluate our approach and report on a user study aimed at assessing both the quality and the value of the proposed solution

    Trusted resource allocation in volunteer edge-cloud computing for scientific applications

    Get PDF
    Data-intensive science applications in fields such as e.g., bioinformatics, health sciences, and material discovery are becoming increasingly dynamic and demanding with resource requirements. Researchers using these applications which are based on advanced scientific workflows frequently require a diverse set of resources that are often not available within private servers or a single Cloud Service Provider (CSP). For example, a user working with Precision Medicine applications would prefer only those CSPs who follow guidelines from HIPAA (Health Insurance Portability and Accountability Act) for implementing their data services and might want services from other CSPs for economic viability. With the generation of more and more data these workflows often require deployment and dynamic scaling of multi-cloud resources in an efficient and high-performance manner (e.g., quick setup, reduced computation time, and increased application throughput). At the same time, users seek to minimize the costs of configuring the related multi-cloud resources. While performance and cost are among the key factors to decide upon CSP resource selection, the scientific workflows often process proprietary/confidential data that introduces additional constraints of security postures. Thus, users have to make an informed decision on the selection of resources that are most suited for their applications while trading off between the key factors of resource selection which are performance, agility, cost, and security (PACS). Furthermore, even with the most efficient resource allocation across multi-cloud, the cost to solution might not be economical for all users which have led to the development of new paradigms of computing such as volunteer computing where users utilize volunteered cyber resources to meet their computing requirements. For economical and readily available resources, it is essential that such volunteered resources can integrate well with cloud resources for providing the most efficient computing infrastructure for users. In this dissertation, individual stages such as user requirement collection, user's resource preferences, resource brokering and task scheduling, in lifecycle of resource brokering for users are tackled. For collection of user requirements, a novel approach through an iterative design interface is proposed. In addition, fuzzy interference-based approach is proposed to capture users' biases and expertise for guiding their resource selection for their applications. The results showed improvement in performance i.e. time to execute in 98 percent of the studied applications. The data collected on user's requirements and preferences is later used by optimizer engine and machine learning algorithms for resource brokering. For resource brokering, a new integer linear programming based solution (OnTimeURB) is proposed which creates multi-cloud template solutions for resource allocation while also optimizing performance, agility, cost, and security. The solution was further improved by the addition of a machine learning model based on naive bayes classifier which captures the true QoS of cloud resources for guiding template solution creation. The proposed solution was able to improve the time to execute for as much as 96 percent of the largest applications. As discussed above, to fulfill necessity of economical computing resources, a new paradigm of computing viz-a-viz Volunteer Edge Computing (VEC) is proposed which reduces cost and improves performance and security by creating edge clusters comprising of volunteered computing resources close to users. The initial results have shown improved time of execution for application workflows against state-of-the-art solutions while utilizing only the most secure VEC resources. Consequently, we have utilized reinforcement learning based solutions to characterize volunteered resources for their availability and flexibility towards implementation of security policies. The characterization of volunteered resources facilitates efficient allocation of resources and scheduling of workflows tasks which improves performance and throughput of workflow executions. VEC architecture is further validated with state-of-the-art bioinformatics workflows and manufacturing workflows.Includes bibliographical references

    Composing Multiple Variability Artifacts to Assemble Coherent Workflows

    Get PDF
    International audienceThe development of scientific workflows is evolving towards the system- atic use of service oriented architectures, enabling the composition of dedicated and highly parameterized software services into processing pipelines. Building consistent workflows then becomes a cumbersome and error-prone activity as users cannot man- age such large scale variability. This paper presents a rigorous and tooled approach in which techniques from Software Product Line (SPL) engineering are reused and ex- tended to manage variability in service and workflow descriptions. Composition can be facilitated while ensuring consistency. Services are organized in a rich catalog which is organized as a SPL and structured according to the common and variable concerns captured for all services. By relying on sound merging techniques on the feature mod- els that make up the catalog, reasoning about the compatibility between connected services is made possible. Moreover, an entire workflow is then seen as a multiple SPL (i.e., a composition of several SPLs). When services are configured within, the prop- agation of variability choices is then automated with appropriate techniques and the user is assisted in obtaining a consistent workflow. The approach proposed is com- pletely supported by a combination of dedicated tools and languages. Illustrations and experimental validations are provided using medical imaging pipelines, which are rep- resentative of current scientific workflows in many domains

    Model-driven development of data intensive applications over cloud resources

    Get PDF
    The proliferation of sensors over the last years has generated large amounts of raw data, forming data streams that need to be processed. In many cases, cloud resources are used for such processing, exploiting their flexibility, but these sensor streaming applications often need to support operational and control actions that have real-time and low-latency requirements that go beyond the cost effective and flexible solutions supported by existing cloud frameworks, such as Apache Kafka, Apache Spark Streaming, or Map-Reduce Streams. In this paper, we describe a model-driven and stepwise refinement methodological approach for streaming applications executed over clouds. The central role is assigned to a set of Petri Net models for specifying functional and non-functional requirements. They support model reuse, and a way to combine formal analysis, simulation, and approximate computation of minimal and maximal boundaries of non-functional requirements when the problem is either mathematically or computationally intractable. We show how our proposal can assist developers in their design and implementation decisions from a performance perspective. Our methodology allows to conduct performance analysis: The methodology is intended for all the engineering process stages, and we can (i) analyse how it can be mapped onto cloud resources, and (ii) obtain key performance indicators, including throughput or economic cost, so that developers are assisted in their development tasks and in their decision taking. In order to illustrate our approach, we make use of the pipelined wavefront array
    corecore