2,719 research outputs found
Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud
With the advent of cloud computing, organizations are nowadays able to react
rapidly to changing demands for computational resources. Not only individual
applications can be hosted on virtual cloud infrastructures, but also complete
business processes. This allows the realization of so-called elastic processes,
i.e., processes which are carried out using elastic cloud resources. Despite
the manifold benefits of elastic processes, there is still a lack of solutions
supporting them.
In this paper, we identify the state of the art of elastic Business Process
Management with a focus on infrastructural challenges. We conceptualize an
architecture for an elastic Business Process Management System and discuss
existing work on scheduling, resource allocation, monitoring, decentralized
coordination, and state management for elastic processes. Furthermore, we
present two representative elastic Business Process Management Systems which
are intended to counter these challenges. Based on our findings, we identify
open issues and outline possible research directions for the realization of
elastic processes and elastic Business Process Management.
Comment: Please cite as: S. Schulte, C. Janiesch, S. Venugopal, I. Weber, and
P. Hoenisch (2015). Elastic Business Process Management: State of the Art and
Open Challenges for BPM in the Cloud. Future Generation Computer Systems,
Volume NN, Number N, NN-NN., http://dx.doi.org/10.1016/j.future.2014.09.00
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
DFlow: Efficient Dataflow-based Invocation Workflow Execution for Function-as-a-Service
Serverless computing is becoming increasingly popular due to its ease of
use and fine-grained billing. These features make it appealing for stateful
applications and serverless workflows. However, current serverless workflow
systems use a controlflow-based invocation pattern to invoke functions. In
this pattern, function invocation depends on the execution state of
functions: a function can only begin executing once all its precursor functions
have completed. As a result, this pattern can lead to longer
end-to-end execution time. We design and implement DFlow, a novel
dataflow-based serverless workflow system that achieves high performance for
serverless workflows. DFlow introduces a distributed scheduler (DScheduler) that
uses a dataflow-based invocation pattern to invoke functions. In this
pattern, function invocation depends on the data dependencies between
functions, so a function can start executing even while its precursor functions
are still running. DFlow further features a distributed store (DStore) that
uses fine-grained optimization techniques to eliminate function
interaction, thereby enabling efficient data exchange. With the support of
DScheduler and DStore, DFlow achieves an average improvement in 99%-ile
latency of 60% over CFlow, 40% over FaaSFlow, 25% over FaaSFlowRedis, and 40%
over KNIX. It also improves network bandwidth utilization by
2x-4x over CFlow and by 1.5x-3x over FaaSFlow, FaaSFlowRedis, and KNIX.
Finally, DFlow reduces cold startup latency, achieving an
average improvement of 5.6x over CFlow and 1.1x over FaaSFlow.
Comment: 22 pages, 13 figures
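The difference between the two invocation patterns described in the DFlow abstract can be illustrated with a small timing model. This is only a sketch under stated assumptions, not DFlow's scheduler: the `Func` record, the `emit_after` field (the time at which a function's output becomes available), the `makespan` helper, and all durations are hypothetical. The model also ignores data transfer cost and cold starts.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    name: str
    duration: float      # total run time of the function (made-up units)
    emit_after: float    # time after start at which its output data is available
    deps: list = field(default_factory=list)  # names of precursor functions


def makespan(funcs, dataflow):
    """End-to-end time of a workflow whose functions are listed in topological order."""
    ready, finish = {}, {}
    for f in funcs:
        # Control flow waits for precursors to finish; dataflow waits only for their data.
        gate = ready if dataflow else finish
        begin = max((gate[d] for d in f.deps), default=0.0)
        ready[f.name] = begin + f.emit_after
        finish[f.name] = begin + f.duration
    return max(finish.values())


# Hypothetical diamond-shaped workflow: A feeds B and C, which both feed D.
WORKFLOW = [
    Func("A", duration=4.0, emit_after=1.0),
    Func("B", duration=3.0, emit_after=3.0, deps=["A"]),
    Func("C", duration=2.0, emit_after=2.0, deps=["A"]),
    Func("D", duration=1.0, emit_after=1.0, deps=["B", "C"]),
]

controlflow_time = makespan(WORKFLOW, dataflow=False)
dataflow_time = makespan(WORKFLOW, dataflow=True)
print(controlflow_time, dataflow_time)
```

In this toy workflow the controlflow pattern takes 8.0 time units and the dataflow pattern 5.0, because B and C start as soon as A has produced its output instead of waiting for A to complete. The size of the gap depends entirely on the assumed durations.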
A Model for Scientific Workflows with Parallel and Distributed Computing
In the last decade we witnessed an immense evolution of computing infrastructures in terms of processing, storage and communication. On the one hand, developments in hardware architectures have made it possible to run multiple virtual machines on a single physical machine. On the other hand, the increase of the available network communication bandwidth has enabled the widespread use of distributed computing infrastructures, for example based on clusters, grids and clouds. These factors enabled different scientific communities to aim for the development and implementation of complex scientific applications possibly involving large amounts of data. However, due to their structural complexity, these applications require decomposition models to allow multiple tasks to run in parallel and distributed environments.
The scientific workflow concept arises naturally as a way to model applications composed of multiple activities. In fact, in the past decades many initiatives have been
undertaken to model application development using the workflow paradigm, both in
the business and in scientific domains. However, despite such intensive efforts, current
scientific workflow systems and tools still have limitations, which pose difficulties to the
development of emerging large-scale, distributed and dynamic applications.
This dissertation proposes the AWARD model for scientific workflows with parallel
and distributed computing. AWARD is an acronym for Autonomic Workflow Activities
Reconfigurable and Dynamic.
The AWARD model has the following main characteristics.
It is based on a decentralized execution control model where multiple autonomic
workflow activities interact by exchanging tokens through input and output ports. The
activities can be executed separately in diverse computing environments, such as in a
single computer or on multiple virtual machines running on distributed infrastructures,
such as clusters and clouds.
It provides basic workflow patterns for parallel and distributed application decomposition and other useful patterns supporting feedback loops and load balancing. The model is suitable for expressing applications based on a finite or infinite number of iterations, thus allowing the modeling of long-running workflows, which are typical in scientific experimentation.
A distinctive contribution of the AWARD model is its support for dynamic reconfiguration
of long-running workflows. A dynamic reconfiguration can modify the
structure of the workflow, for example by introducing new activities or modifying the connections
between activity input and output ports. The activity behavior can also be modified,
for example by dynamically replacing the activity algorithm.
In addition to the proposal of a new workflow model, this dissertation presents the
implementation of a fully functional software architecture that supports the AWARD
model. The implemented prototype was used to validate and refine the model across
multiple workflow scenarios whose practical usefulness was clearly demonstrated through experimental results, which also showed the advantages of the major characteristics and contributions of the AWARD model. The implemented prototype was also used to develop application cases, such as a workflow to support the implementation of the MapReduce model and a workflow to support a text mining application developed by an external user.
The extensive experimental work confirmed the adequacy of the AWARD model and
its implementation for developing applications that exploit parallelism and distribution
using the scientific workflow paradigm.
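The execution style described in the AWARD abstract (independent activities that exchange tokens through input and output ports, with the activity algorithm replaceable at run time) can be sketched as follows. This is a minimal illustration and not the AWARD implementation or its API: the `Activity` class, the `STOP` sentinel, `connect`, and `run_demo` are hypothetical names, and in-process threads and queues stand in for activities running on distributed machines.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel token that shuts an activity down


class Activity(threading.Thread):
    """An activity that runs on its own, with no central engine driving it."""

    def __init__(self, algorithm):
        super().__init__(daemon=True)
        self.algorithm = algorithm   # can be replaced while the workflow runs
        self.inp = queue.Queue()     # input port
        self.out = None              # output port: queue of the next activity

    def connect(self, target_queue):
        self.out = target_queue

    def run(self):
        while True:
            token = self.inp.get()
            if token is STOP:
                self.out.put(STOP)   # propagate shutdown downstream
                return
            # Read self.algorithm on every token so a replacement takes effect.
            self.out.put(self.algorithm(token))


def run_demo():
    double = Activity(lambda x: x * 2)
    shift = Activity(lambda x: x + 1)
    sink = queue.Queue()
    double.connect(shift.inp)
    shift.connect(sink)
    double.start()
    shift.start()

    for token in (1, 2, 3):
        double.inp.put(token)
    first = [sink.get(timeout=5) for _ in range(3)]

    # Dynamic reconfiguration: swap the algorithm of a running activity.
    shift.algorithm = lambda x: x - 1
    for token in (1, 2, 3):
        double.inp.put(token)
    second = [sink.get(timeout=5) for _ in range(3)]

    double.inp.put(STOP)
    double.join()
    shift.join()
    return first, second


first, second = run_demo()
print(first, second)
```

The demo drains the first batch from the sink before swapping the algorithm, so the reconfiguration happens while `shift` is idle and the result is deterministic. Reconfiguring an activity that is in the middle of processing a token would need additional coordination that this sketch does not attempt.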
Evaluating the benefits of key-value databases for scientific applications
The convergence of Big Data applications with High-Performance Computing requires new methodologies to store, manage and process large amounts of information. Traditional storage solutions are unable to scale, which results in complex coding strategies. For example, the brain atlas of the Human Brain Project faces the challenge of processing large amounts of high-resolution brain images. Given the computing needs, we study the effects of replacing a traditional storage system with a distributed Key-Value database on a cell segmentation application. The original code uses HDF5 files on GPFS through an intricate interface, imposing synchronizations. In contrast, by using Apache Cassandra or ScyllaDB through Hecuba, the application code is greatly simplified. Thanks to the Key-Value data model, the number of synchronizations is reduced and the time dedicated to I/O scales when increasing the number of nodes.
This project/research has received funding from the European Union's Horizon
2020 Framework Programme for Research and Innovation under the Specific
Grant Agreement No. 720270 (Human Brain Project SGA1) and the Specific
Grant Agreement No. 785907 (Human Brain Project SGA2). This work has also
been supported by the Spanish Government (SEV2015-0493), by the Spanish
Ministry of Science and Innovation (contract TIN2015-65316-P), and by Generalitat
de Catalunya (contract 2017-SGR-1414).
Postprint (author's final draft)