1,213 research outputs found

    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building increasingly complex applications to manage and process large data sets and to execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures

    Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid

    One of the first motivations for using grids comes from applications that manage large data sets, for example in High Energy Physics or the Life Sciences. To improve the global throughput of software environments, replicas are usually placed at carefully selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled, which is not always the case in existing approaches. This paper presents an algorithm that combines data management and scheduling using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).

    On Designing Multicore-aware Simulators for Biological Systems

    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, but it can be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions, involving both the single simulation and a bulk of independent simulations (either replicas or derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments), carried out by way of the FastFlow programming framework, which makes possible fast development and efficient execution on multi-cores. Comment: 19 pages + cover page
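The "bulk of independent simulations" case can be illustrated with a minimal Python sketch. It is not the CWC simulator and does not use FastFlow: the birth-death model, the rate values, and the function names `gillespie_birth_death` and `run_replicas` are all assumptions made for illustration. Each replica owns a private random generator, so replicas share no state and can be farmed out to workers.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def gillespie_birth_death(seed, birth=10.0, death=1.0, x0=0, t_end=5.0):
    """One stochastic trajectory of a toy birth-death process (Gillespie SSA).

    Hypothetical model: molecules are created at constant rate `birth`
    and each molecule decays at rate `death`.
    """
    rng = random.Random(seed)  # private generator keeps replicas independent
    t, x = 0.0, x0
    while True:
        birth_rate, death_rate = birth, death * x  # reaction propensities
        total = birth_rate + death_rate
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_end:
            return x  # population at the end of the simulated interval
        # Choose the reaction that fires, proportionally to its propensity.
        if rng.random() * total < birth_rate:
            x += 1
        else:
            x -= 1


def run_replicas(n_replicas, workers=4):
    """Run independent replicas, one seed per replica, on a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in seed order, whatever the completion order.
        return list(pool.map(gillespie_birth_death, range(n_replicas)))
```

A thread pool is used here only to keep the sketch short. In CPython a process pool (or a native framework, as in the paper) would be needed for the replicas to actually run on several cores.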

    DIET : new developments and recent results

    Among existing grid middleware approaches, one simple, powerful, and flexible approach consists of using servers available in different administrative domains through the classic client-server or Remote Procedure Call (RPC) paradigm. Network Enabled Servers (NES) implement this model, also called GridRPC. Clients submit computation requests to a scheduler whose goal is to find a server available on the grid. The aim of this paper is to give an overview of an NES middleware developed in the GRAAL team, called DIET, and to describe recent developments. DIET (Distributed Interactive Engineering Toolbox) is a hierarchical set of components used for the development of applications based on computational servers on the grid.

    Scalable, Data-intensive Network Computation

    To enable groups of collaborating researchers at different locations to effectively share large datasets and investigate their spontaneous hypotheses on the fly, we are interested in developing a distributed system that can be easily leveraged by a variety of data-intensive applications. The system is composed of (i) a number of best-effort logistical depots to enable large-scale data sharing and in-network data processing, (ii) a set of end-to-end tools to effectively aggregate, manage and schedule a large number of network computations with attendant data movements, and (iii) a Distributed Hash Table (DHT) on top of the generic depot services for scalable data management. The logistical depot is extended by following the end-to-end principles and is modeled with a closed queuing network model. Its performance characteristics are studied by solving the steady-state distributions of the model using local balance equations. The modeling results confirm that the wide area network is the performance bottleneck and that running concurrent jobs can increase resource utilization and system throughput. As a novel contribution, techniques to effectively support resource-demanding data-intensive applications using the fine-grained depot services are developed. These techniques include instruction-level scheduling of operations, dynamic co-scheduling of computation and replication, and adaptive workload control. Experiments in volume visualization have proved the effectiveness of these techniques. Due to the unique characteristics of data-intensive applications and our co-scheduling algorithm, a DHT is implemented on top of the basic storage and computation services. It demonstrates the potential of the Logistical Networking infrastructure to serve as a service creation platform.
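The qualitative result stated above (a bottleneck station caps throughput, and more concurrent jobs raise utilization) can be reproduced on a toy closed queuing network. The sketch below uses exact Mean Value Analysis rather than the local balance equations used in the work itself, and the three stations and their service demands are hypothetical values chosen for illustration, not figures from the source.

```python
def mva(demands, n_jobs):
    """Exact Mean Value Analysis for a closed network of single-server stations.

    demands[k] is the total service demand of one job at station k (seconds).
    Returns (system throughput in jobs/second, mean queue length per station).
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero jobs
    throughput = 0.0
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the network with n - 1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / sum(residence)
        queue = [throughput * r for r in residence]  # Little's law per station
    return throughput, queue


# Hypothetical demands: CPU, disk, and a slow wide area network link.
DEMANDS = [0.2, 0.3, 1.0]

if __name__ == "__main__":
    for jobs in (1, 2, 4, 8):
        x, _ = mva(DEMANDS, jobs)
        wan_utilization = x * DEMANDS[2]  # utilization law: U = X * D
        print(f"{jobs} jobs: throughput={x:.3f}/s, WAN utilization={wan_utilization:.2f}")
```

With these assumed demands, throughput rises as jobs are added but can never exceed 1 / max(demands), the rate the slowest station can sustain.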

    A Survey of Pipelined Workflow Scheduling: Models and Algorithms

    A large class of applications need to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions coming from the Forum on that topic. The first chapter gathers all the plenary session keynote addresses, and it is followed by a sequence of chapters covering the application-flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.