Search CORE

4,070 research outputs found

A Taxonomy of Workflow Management Systems for Grid Computing

Author: Buyya Rajkumar
Yu Jia
Publication venue
Publication date: 01/01/2005
Field of study

With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

GRIDCC: Real-time workflow system

Author: Akram A
Colling D
Dickens L
Guo L
Kyberd P
Martyniak J
McGough A
Powell R S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

The Grid is a concept which allows the sharing of resources between distributed communities, allowing each to progress towards potentially different goals. As adoption of the Grid increases so are the activities that people wish to conduct through it. The GRIDCC project is a European Union funded project addressing the issues of integrating instruments into the Grid. This increases the requirement of workflows and Quality of Service upon these workflows as many of these instruments have real-time requirements. In this paper we present the workflow management service within the GRIDCC project which is tasked with optimising the workflows and ensuring that they meet the pre-defined QoS requirements specified upon them

Brunel University Research Archive

Workflow Partitioning and Deployment on the Cloud using Orchestra

Author: Barker Adam
Dearle Alan
Jaradat Ward
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/09/2014
Field of study

Orchestrating service-oriented workflows is typically based on a design model that routes both data and control through a single point - the centralised workflow engine. This causes scalability problems that include the unnecessary consumption of the network bandwidth, high latency in transmitting data between the services, and performance bottlenecks. These problems are highly prominent when orchestrating workflows that are composed from services dispersed across distant geographical locations. This paper presents a novel workflow partitioning approach, which attempts to improve the scalability of orchestrating large-scale workflows. It permits the workflow computation to be moved towards the services providing the data in order to garner optimal performance results. This is achieved by decomposing the workflow into smaller sub workflows for parallel execution, and determining the most appropriate network locations to which these sub workflows are transmitted and subsequently executed. This paper demonstrates the efficiency of our approach using a set of experimental workflows that are orchestrated over Amazon EC2 and across several geographic network regions.Comment: To appear in Proceedings of the IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC 2014

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud

Author: Hoenisch Philipp
Janiesch Christian
Schulte Stefan
Venugopal Srikumar
Weber Ingo
Publication venue: 'Elsevier BV'
Publication date: 22/09/2014
Field of study

With the advent of cloud computing, organizations are nowadays able to react rapidly to changing demands for computational resources. Not only individual applications can be hosted on virtual cloud infrastructures, but also complete business processes. This allows the realization of so-called elastic processes, i.e., processes which are carried out using elastic cloud resources. Despite the manifold benefits of elastic processes, there is still a lack of solutions supporting them. In this paper, we identify the state of the art of elastic Business Process Management with a focus on infrastructural challenges. We conceptualize an architecture for an elastic Business Process Management System and discuss existing work on scheduling, resource allocation, monitoring, decentralized coordination, and state management for elastic processes. Furthermore, we present two representative elastic Business Process Management Systems which are intended to counter these challenges. Based on our findings, we identify open issues and outline possible research directions for the realization of elastic processes and elastic Business Process Management.Comment: Please cite as: S. Schulte, C. Janiesch, S. Venugopal, I. Weber, and P. Hoenisch (2015). Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud. Future Generation Computer Systems, Volume NN, Number N, NN-NN., http://dx.doi.org/10.1016/j.future.2014.09.00

arXiv.org e-Print Archive

CiteSeerX

Multi-objective scheduling of Scientific Workflows in multisite clouds

Author: Liu Ji
Mattoso Marta
Oliveira Daniel de
Pacitti Esther
Valduriez Patrick
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

Clouds appear as appropriate infrastructures for executing Scientific Workflows (SWfs). A cloud is typically made of several sites (or data centers), each with its own resources and data. Thus, it becomes important to be able to execute some SWfs at more than one cloud site because of the geographical distribution of data or available resources among different cloud sites. Therefore, a major problem is how to execute a SWf in a multisite cloud, while reducing execution time and monetary costs. In this paper, we propose a general solution based on multi-objective scheduling in order to execute SWfs in a multisite cloud. The solution consists of a multi-objective cost model including execution time and monetary costs, a Single Site Virtual Machine (VM) Provisioning approach (SSVP) and ActGreedy, a multisite scheduling approach. We present an experimental evaluation, based on the execution of the SciEvol SWf in Microsoft Azure cloud. The results reveal that our scheduling approach significantly outperforms two adapted baseline algorithms (which we propose by adapting two existing algorithms) and the scheduling time is reasonable compared with genetic and brute-force algorithms. The results also show that our cost model is accurate and that SSVP can generate better VM provisioning plans compared with an existing approach.Work partially funded by EU H2020 Programme and MCTI/RNP-Brazil (HPC4E grant agreement number 689772), CNPq, FAPERJ, and INRIA (MUSIC project), Microsoft (ZcloudFlow project) and performed in the context of the Computational Biology Institute (www.ibc-montpellier.fr). We would like to thank Kary Ocaña for her help in modeling and executing the SciEvol SWf.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

INRIA a CCSD electronic archive server

HAL-Rennes 1

High-Performance Cloud Computing: A View of Scientific Applications

Author: Buyya Rajkumar
Pandey Suraj
Vecchiola Christian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Scientific computing often requires the availability of a massive number of computers for performing large scale experiments. Traditionally, these needs have been addressed by using high-performance computing solutions and installed facilities such as clusters and super computers, which are difficult to setup, maintain, and operate. Cloud computing provides scientists with a completely new model of utilizing the computing infrastructure. Compute resources, storage resources, as well as applications, can be dynamically provisioned (and integrated within the existing infrastructure) on a pay per use basis. These resources can be released when they are no more needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensure the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers to users the desired QoS. Its flexible and service based infrastructure supports multiple programming paradigms that make Aneka address a variety of different scenarios: from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of fMRI brain imaging workflow.Comment: 13 pages, 9 figures, conference pape

arXiv.org e-Print Archive

CiteSeerX

Crossref

High-Throughput Computing on High-Performance Platforms: A Case Study

Author: Angius Alessio
De Kaushik
Jha Shantenu
Klimentov Alexei
Oleynik Danila
Oral Sarp H.
Panitkin Sergey
Turilli Matteo
Wells Jack C.
Publication venue
Publication date: 27/10/2017
Field of study

The computing systems used by LHC experiments has historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resource. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan---a DOE leadership facility in conjunction with traditional distributed high- throughput computing to reach sustained production scales of approximately 52M core-hours a years. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner

arXiv.org e-Print Archive

Crossref