
    A Preemption-Based Meta-Scheduling System for Distributed Computing

    This research aims at designing and building a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to users, delivering high system throughput, and accommodating the maximum number of applications in the system. The author argues that these are the most important objectives for scheduling in recent distributed computing systems, especially Grid computing environments. To achieve them, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for an application using an execution model that simulates the application's execution behavior. Since unchecked application-level scheduling can seriously degrade system performance, the framework developed in this research arbitrates between the application-level schedules of different applications, providing fair system usage and balancing the interests of the applications. In this sense, the framework is not a classical scheduling system but a meta-scheduling system that interacts with the application-level schedulers.
    Due to the large system dynamics of Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well-defined scheduling policies to preempt and migrate executing applications. To give users the capability to make their applications preemptible, this research also developed a user-level checkpointing library called SRS (Stop-Restart Software). SRS differs from many user-level checkpointing libraries in that it allows reconfiguration of applications between migrations: the processor configuration and/or the data distribution can change across a restart. The experimental results in this dissertation demonstrate the utility of the meta-scheduling framework for distributed computing systems. Lastly, the framework was put to practical use in a Grid computing system called GrADSolve, a flexible system that allows application library writers to upload applications with different capabilities. GrADSolve is also unique in maintaining traces of application executions and using those traces for subsequent executions of the same application.
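The arbitration-plus-preemption idea can be sketched in a few lines. This is a hypothetical illustration, not the dissertation's actual policy: the `MetaScheduler` and `Job` names and the "preempt the largest running job if that admits the newcomer" rule are invented here for concreteness; an SRS-style library would supply the actual checkpoint/restart step.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs_wanted: int

class MetaScheduler:
    """Toy arbiter over a fixed processor pool (illustrative, not the paper's code)."""

    def __init__(self, total_procs):
        self.free_procs = total_procs
        self.running = []
        self.waiting = []

    def _start(self, job):
        self.free_procs -= job.procs_wanted
        self.running.append(job)

    def _preempt(self, job):
        # Checkpoint and stop the job; an SRS-style library would save its
        # state so it can restart later, possibly with a new configuration.
        self.running.remove(job)
        self.free_procs += job.procs_wanted
        self.waiting.append(job)

    def submit(self, job):
        if job.procs_wanted <= self.free_procs:
            self._start(job)
            return "started"
        # Illustrative preemption policy: stop the largest running job if
        # doing so frees enough processors for the new application.
        victim = max(self.running, key=lambda j: j.procs_wanted, default=None)
        if victim and self.free_procs + victim.procs_wanted >= job.procs_wanted:
            self._preempt(victim)
            self._start(job)
            return f"started (preempted {victim.name})"
        self.waiting.append(job)
        return "queued"
```

A real meta-scheduler would base the preemption decision on the arbitrated application-level schedules and system load rather than on job size alone.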

    Enforcing consistency during the adaptation of a parallel component

    As Grid architectures provide execution environments that are distributed, parallel, and dynamic, applications need to be not only parallel and distributed but also able to adapt themselves to their execution environment. This article presents a model for designing self-adaptable parallel components that can be assembled to build applications for the Grid. The model includes the definition of a consistency criterion for the dynamic adaptation of SPMD components. We propose a solution to implement this criterion; it has been evaluated on both synthetic and real codes to exhibit the behavior of the proposed strategies.
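One plausible reading of such a consistency criterion is that all SPMD processes must perform the adaptation at the same program point, and no process can rewind past a point it has already executed. The sketch below is an assumption-laden illustration, not the article's algorithm: candidate adaptation points and per-process positions are modeled as plain lists.

```python
def choose_adaptation_point(candidate_points, positions):
    """Pick a consistent adaptation point for an SPMD component (illustrative).

    candidate_points: ordered list of point identifiers shared by all processes.
    positions: for each process, the index of the next point it will reach.
    """
    # No process can go backwards, so the chosen point must lie at or after
    # the furthest-advanced process; the earliest such point minimizes how
    # long the other processes must keep running before adapting.
    earliest_common = max(positions)
    if earliest_common >= len(candidate_points):
        return None  # no common point remains before execution ends
    return candidate_points[earliest_common]
```

In a real implementation the processes would agree on this point via a collective operation rather than a centralized function.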

    Desarrollo de una extensión de MPI para C++ (Development of an MPI extension for C++)

    The main objective of the software described here is to allow a programmer to write code that looks sequential yet runs on a cluster of machines, taking advantage of distributed computing in an easy way. Through a library, we offer a distributed version of the standard C++ vector container and a set of algorithms to work with it, exploiting distributed-memory parallelism. This main objective has a set of key parts: allowing the user to instantiate a vector that will be stored in the memory of more than one process or machine, and to choose how the data are distributed to the processes according to concrete patterns; providing a way to read binary data from a file into the vector, distributed as specified; providing a set of algorithms, with the same interface as their STL versions, that make it possible to compute with the data in the vectors; and providing a way to access and operate on the vector in the usual C++ manner, including iterators, operators, and other C++ conventions, explained at length in section 2.1. The project also has secondary objectives: taking advantage of distributed computing to increase performance when using the distributed algorithms, and implementing a load-balancing mechanism that makes the execution of those algorithms more efficient on a set of machines with different characteristics.
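Distributing a vector across processes typically starts from a block-distribution rule mapping each global index to an owner. The helper below is a minimal sketch of one such rule (near-equal contiguous blocks, with the remainder spread over the first ranks); the function names are invented and the library described above may use different patterns.

```python
def local_range(rank, n, p):
    """Half-open [lo, hi) range of global indices stored on process `rank`
    when n elements are split into p near-equal contiguous blocks."""
    base, extra = divmod(n, p)
    # the first `extra` ranks hold (base + 1) elements, the rest hold `base`
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def block_owner(i, n, p):
    """Rank that owns global element i under the same block distribution."""
    base, extra = divmod(n, p)
    boundary = extra * (base + 1)  # global index where the larger blocks end
    if i < boundary:
        return i // (base + 1)
    return extra + (i - boundary) // base
```

With such a mapping, an iterator or `operator[]` on the distributed container can decide locally whether an element is resident or must be fetched from its owner.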

    GrADSolve—a grid-based RPC system for parallel computing with application-level scheduling

    Although some existing Remote Procedure Call (RPC) systems provide support for remote invocation of parallel applications, they lack powerful scheduling methodologies for the dynamic selection of resources for parallel execution. Some RPC systems support parallel execution of software routines only with simple modes of parallelism; others statically choose the resource configuration for parallel execution even before the parallel routines are invoked remotely by the end user. These policies prevent the existing systems from being used to remotely solve computationally intensive parallel applications over dynamic computational Grid environments. In this paper, we discuss an RPC system called GrADSolve that supports the execution of parallel applications over Grid resources. In GrADSolve, the resources used for executing a parallel application are chosen dynamically, based on the load characteristics of the resources and the characteristics of the application; application-level scheduling is employed to take both application and resource properties into account. GrADSolve also stages the user's data to the end resources based on the data distribution used by the end application. Finally, GrADSolve allows users to store execution traces for problem solving and to use those traces for subsequent solutions. Experiments are presented showing that GrADSolve's data staging mechanisms can significantly reduce the overhead associated with data movement in current RPC systems. Results are also presented demonstrating the usefulness of the execution traces maintained by GrADSolve for problem solving.
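The trace-reuse idea can be illustrated with a small cache keyed by the problem being solved: record which resource configuration solved it and how long that took, and prefer the best recorded configuration when the same problem recurs, skipping a fresh scheduling pass. This is a hedged sketch with invented names (`TraceStore`, `record`, `lookup`), not GrADSolve's actual trace format.

```python
class TraceStore:
    """Toy execution-trace cache (illustrative only)."""

    def __init__(self):
        self._traces = {}  # (problem, size) -> (resources, best runtime)

    def record(self, problem, size, resources, runtime):
        # Keep only the fastest known configuration for each problem instance.
        key = (problem, size)
        best = self._traces.get(key)
        if best is None or runtime < best[1]:
            self._traces[key] = (resources, runtime)

    def lookup(self, problem, size):
        # Return the recorded resource configuration, or None if this
        # problem instance has never been solved (a scheduler would then
        # fall back to a fresh application-level scheduling pass).
        entry = self._traces.get((problem, size))
        return entry[0] if entry else None
```

A production system would also have to invalidate traces when resource load changes, since the recorded configuration may no longer be the best one.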