A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid
One of the first motivations for using grids comes from applications that manage large data sets, for example in High Energy Physics or the Life Sciences. To improve the global throughput of software environments, replicas are usually placed at carefully selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled, which is not always the case in existing approaches. This paper presents an algorithm that combines data management and scheduling using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).
On Designing Multicore-aware Simulators for Biological Systems
The stochastic simulation of biological systems is an increasingly popular
technique in bioinformatics. It is often an enlightening technique, but it may
prove computationally expensive. We discuss the main
opportunities to speed it up on multi-core platforms, which pose new challenges
for parallelisation techniques. These opportunities are developed in two
general families of solutions involving both the single simulation and a bulk
of independent simulations (either replicas or derived from a parameter sweep).
The proposed solutions are tested by parallelising the CWC simulator
(Calculus of Wrapped Compartments) with the FastFlow programming framework,
which enables fast development and efficient execution on multi-cores.
Comment: 19 pages + cover page
DIET: new developments and recent results
Among existing grid middleware approaches, one simple, powerful, and flexible approach consists of using servers available in different administrative domains through the classic client-server or Remote Procedure Call (RPC) paradigm. Network Enabled Servers (NES) implement this model, also called GridRPC. Clients submit computation requests to a scheduler whose goal is to find a server available on the grid. The aim of this paper is to give an overview of an NES middleware developed in the GRAAL team called DIET and to describe recent developments. DIET (Distributed Interactive Engineering Toolbox) is a hierarchical set of components used for the development of applications based on computational servers on the grid.
Scalable, Data-intensive Network Computation
To enable groups of collaborating researchers at different locations to effectively share large datasets and investigate their spontaneous hypotheses on the fly, we are interested in developing a distributed system that can be easily leveraged by a variety of data-intensive applications. The system is composed of (i) a number of best-effort logistical depots to enable large-scale data sharing and in-network data processing, (ii) a set of end-to-end tools to effectively aggregate, manage and schedule a large number of network computations with attendant data movements, and (iii) a Distributed Hash Table (DHT) on top of the generic depot services for scalable data management.
The logistical depot is extended by following the end-to-end principles and is modeled with a closed queuing network model. Its performance characteristics are studied by solving the steady state distributions of the model using local balance equations. The modeling results confirm that the wide area network is the performance bottleneck and running concurrent jobs can increase resource utilization and system throughput.
As a novel contribution, techniques to effectively support resource-demanding data-intensive applications using the fine-grained depot services are developed. These techniques include instruction-level scheduling of operations, dynamic co-scheduling of computation and replication, and adaptive workload control. Experiments in volume visualization have proved the effectiveness of these techniques. Due to the unique characteristics of data-intensive applications and our co-scheduling algorithm, a DHT is implemented on top of the basic storage and computation services. It demonstrates the potential of the Logistical Networking infrastructure to serve as a service creation platform
A Survey of Pipelined Workflow Scheduling: Models and Algorithms
A large class of applications need to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches
3rd EGEE User Forum
We have organized this book in a sequence of chapters, each associated with an application or technical theme and introduced by an overview of the contents and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application-flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum