
A framework for adaptive collective communications for heterogeneous hierarchical computing systems

Author: Mounié Grégory
Steffenel Luiz Angelo
Publication venue: Elsevier BV
Publication date: 01/01/2008

Collective communication operations are widely used in MPI applications and play an important role in their performance. However, the network heterogeneity inherent to grid environments represents a great challenge for developing efficient high-performance computing applications. In this work, we propose a generic framework based on communication models and adaptive techniques for handling collective communication patterns on grid platforms. To this end, we address the hierarchical organization of the grid, selecting the most efficient communication algorithm at each network level. Our framework also adapts to grid load dynamics, since it considers transient network characteristics when dividing the nodes into clusters. Our experiments with the broadcast operation on a real grid setup indicate that an adaptive framework allows significant performance improvements for MPI collective communications.

INRIA a CCSD electronic archive server
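The clustering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that pairwise latencies have already been measured and that two nodes belong to the same cluster when they are linked by a chain of low-latency links (single-linkage grouping with a fixed threshold). The function name, the latency dictionary, and the threshold are illustrative assumptions, not the framework's actual clustering method.

```python
from itertools import combinations


def cluster_by_latency(nodes, latency, threshold):
    """Group nodes whose chains of pairwise latencies stay below `threshold`.

    Hypothetical helper: `latency` maps frozenset({a, b}) to a measured
    latency (any unit); the threshold is an assumed tuning parameter.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of nodes connected by a "fast" link.
    for a, b in combinations(nodes, 2):
        if latency[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


if __name__ == "__main__":
    nodes = ["a1", "a2", "b1", "b2"]
    lat = {frozenset(p): 10.0 for p in combinations(nodes, 2)}  # slow WAN links
    lat[frozenset(("a1", "a2"))] = 0.1  # fast LAN link inside site a
    lat[frozenset(("b1", "b2"))] = 0.2  # fast LAN link inside site b
    print(cluster_by_latency(nodes, lat, threshold=1.0))  # prints [['a1', 'a2'], ['b1', 'b2']]
```

Because the measurements are transient, such a grouping would be recomputed from fresh latencies rather than fixed once.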

Scheduling Independent Tasks on Multi-cores with GPU Accelerators

Author: Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: Springer Science and Business Media LLC
Publication date: 26/08/2013

Best Paper. International audience. More and more computers use hybrid architectures combining multi-core processors and hardware accelerators such as GPUs (Graphics Processing Units). In this paper, we present a new method for efficiently scheduling parallel applications on $m$ CPUs and $k$ GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the makespan. The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm that achieves an approximation ratio of $\frac{4}{3} + \frac{1}{3k}$. We first detail and analyze the method, which is based on a dual approximation scheme and uses dynamic programming to balance the load evenly between the heterogeneous resources. Finally, we run simulations based on realistic benchmarks and compare the solution obtained by a relaxed version of this method with the one provided by a classical greedy algorithm and with lower bounds on the optimal makespan.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
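To make the problem setting concrete, here is a minimal sketch of a greedy baseline of the kind the abstract compares against: each task has a CPU time and a GPU time, and tasks are placed one by one on whichever resource would complete them earliest. This is not the dual approximation algorithm of the paper; the task representation, the longest-first ordering, and the function name are assumptions made for illustration.

```python
import heapq


def greedy_cpu_gpu(tasks, m, k):
    """Earliest-completion greedy for independent tasks on m CPUs and k GPUs.

    `tasks` is a list of (cpu_time, gpu_time) pairs (hypothetical format).
    Returns (makespan, assignment) where assignment lists (task, kind, unit).
    """
    assert m >= 1 and k >= 1, "sketch assumes at least one CPU and one GPU"
    # Min-heaps of (load, unit id); a sorted list is already a valid heap.
    cpu = [(0.0, i) for i in range(m)]
    gpu = [(0.0, i) for i in range(k)]
    assignment = []
    # Consider long tasks first, ordered by their best-case processing time.
    order = sorted(range(len(tasks)), key=lambda j: -min(tasks[j]))
    for j in order:
        p_cpu, p_gpu = tasks[j]
        cpu_load, c = cpu[0]
        gpu_load, g = gpu[0]
        # Put the task where it would complete first (ties go to the CPU).
        if cpu_load + p_cpu <= gpu_load + p_gpu:
            heapq.heapreplace(cpu, (cpu_load + p_cpu, c))
            assignment.append((j, "cpu", c))
        else:
            heapq.heapreplace(gpu, (gpu_load + p_gpu, g))
            assignment.append((j, "gpu", g))
    makespan = max(load for load, _ in cpu + gpu)
    return makespan, assignment
```

For example, `greedy_cpu_gpu([(4, 1), (4, 1), (2, 2)], m=1, k=1)` places the two GPU-friendly tasks on the GPU and returns a makespan of 2.0. Such greedy rules carry no guarantee like $\frac{4}{3} + \frac{1}{3k}$, which is what motivates the dual approximation approach.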

Performance Characterisation of Intra-Cluster Collective Communications

Author: Barchet-Estefanel Luiz Angelo
Mounié Grégory
Publication venue: IEEE Computer Society
Publication date: 01/01/2004

International audience. Although recent works try to improve collective communication in grid systems by separating intra- and inter-cluster communication, the optimisation of communications focuses only on inter-cluster communication. We believe, instead, that the overall performance of the application may be improved if the performance of intra-cluster collective communications is known in advance. Hence, it is important to have an accurate model of intra-cluster collective communications, which provides the evidence needed to tune and predict their performance correctly. In this paper, we present our experience in modelling such communication strategies. We describe and compare different implementation strategies with their communication models, evaluating the models' accuracy and describing the practical challenges that arise when modelling collective communications.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
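The kind of prediction such models enable can be sketched with the simple Hockney model, where sending a message of m bytes costs α + βm. This model is an assumption swapped in for illustration (the paper's own models may differ), and the three formulas below are the textbook costs for a flat (linear) broadcast, a binomial-tree broadcast, and a segmented chain (pipeline) broadcast on P processes. Picking the cheapest prediction is the basic idea behind tuning a collective operation from its model.

```python
def t_flat(P, m, alpha, beta):
    """Root sends the whole message to each of the P - 1 other processes."""
    return (P - 1) * (alpha + beta * m)


def t_binomial(P, m, alpha, beta):
    """ceil(log2 P) rounds, each forwarding the whole message."""
    rounds = (P - 1).bit_length()  # equals ceil(log2 P) for P >= 1
    return rounds * (alpha + beta * m)


def t_chain(P, m, alpha, beta, segments):
    """Pipeline of `segments` equal pieces along a chain of P processes."""
    if P <= 1:
        return 0.0
    return (P - 2 + segments) * (alpha + beta * m / segments)


def best_broadcast(P, m, alpha, beta, segments=8):
    """Name of the strategy with the lowest predicted time."""
    predictions = {
        "flat": t_flat(P, m, alpha, beta),
        "binomial": t_binomial(P, m, alpha, beta),
        "chain": t_chain(P, m, alpha, beta, segments),
    }
    return min(predictions, key=predictions.get)
```

With latency-dominated parameters (α = 1, β = 0) on 8 processes, the binomial tree wins; for a large message with bandwidth-dominated parameters (α = 0, β = 1, m = 1000), the 8-segment chain wins (1750 versus 3000 for the binomial tree). How closely such predictions match measurements is exactly the accuracy question the abstract raises.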

Scheduling with Storage Constraints

Author: Dutot Pierre-Francois
Mounié Grégory
Saule Erik
Publication venue: HAL CCSD
Publication date: 01/04/2008

International audience. In this paper, we study scheduling problems in systems where many users compete to perform their respective jobs on shared parallel resources. Each user has specific needs or wishes for computing his/her jobs, expressed as a function to optimize (among maximum completion time, sum of completion times, and sum of weighted completion times). Such problems have mainly been studied through game theory. In this work, we focus on solving the problem by simultaneously optimizing each user's objective function independently, using classical combinatorial optimization techniques. Some results have already been proposed for two users on a single computing resource. However, no generic combinatorial method is known for many objectives. The analysis proposed in this paper concerns an arbitrary fixed number of users and is not restricted to a single resource. We first derive inapproximability bounds; then we analyze several greedy heuristics whose approximation ratios are close to these bounds. However, these ratios remain high, since they are linear in the number of users. We provide a deeper analysis showing that a slightly modified version of the algorithm is a constant approximation of a Pareto-optimal solution.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
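Since the analysis targets Pareto-optimal solutions, a small sketch of Pareto dominance may help: each candidate schedule is summarized by a vector holding one objective value per user (for instance, user 1's maximum completion time and user 2's sum of completion times), all to be minimized. The vectors below are made-up examples, and the filter is the generic definition, not the paper's algorithm.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the vectors that no other vector dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (user 1 objective, user 2 objective) pairs for five schedules.
    schedules = [(3, 5), (4, 4), (5, 3), (4, 6), (6, 6)]
    print(pareto_front(schedules))  # prints [(3, 5), (4, 4), (5, 3)]
```

None of the three remaining schedules is better for every user at once, which is why a multi-user analysis compares against a Pareto-optimal solution rather than a single optimum.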

Scheduling independent tasks on multi-cores with GPU accelerators

Author: Bleuse Raphaël
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: Wiley
Publication date: 07/09/2014

International audience. More and more computers use hybrid architectures combining multi-core processors and hardware accelerators such as GPUs (Graphics Processing Units). In this paper, we present a new method for efficiently scheduling parallel applications on $m$ CPUs and $k$ GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm that achieves an approximation ratio of $\frac{4}{3} + \frac{1}{3k}$. We first detail and analyze the method, which is based on a dual approximation scheme and uses dynamic programming to balance the load evenly between the heterogeneous resources. Then, we present a faster approximation algorithm for a special case of the previous problem, in which all tasks are accelerated when assigned to a GPU, with a performance guarantee of $\frac{3}{2}$ for any number of GPUs. We run simulations based on realistic benchmarks and compare the solutions obtained by a relaxed version of the generic method with those provided by a classical scheduling algorithm (HEFT). Finally, we present an implementation of the 4/3-approximation and its relaxed version in the scheduler of the xKaapi runtime system, applied to a classical linear algebra kernel.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
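The dual approximation scheme named in this abstract can be outlined generically: for a guessed makespan λ, a procedure either builds a schedule of length at most ρλ or proves that no schedule of length λ exists, and a binary search finds the smallest accepted guess. The sketch below is an illustration under stated assumptions: identical machines, integer processing times, and a deliberately simple test with ρ = 2. It is not the paper's dynamic-programming procedure for CPUs and GPUs, and the function names are hypothetical.

```python
def dual_step(times, m, lam):
    """Return a schedule (list of job lists) with makespan <= 2 * lam,
    or None when no schedule of makespan <= lam can exist."""
    if any(p > lam for p in times) or sum(times) > m * lam:
        return None  # rejection certifies lam < OPT
    machines = [[] for _ in range(m)]
    i, load = 0, 0
    for j, p in enumerate(times):
        # Close the current machine once it has reached lam.
        if load >= lam and i < m - 1:
            i, load = i + 1, 0
        machines[i].append(j)
        load += p  # machines are closed at lam and p <= lam, so loads stay below 2 * lam
    return machines


def dual_approximation(times, m):
    """Binary search for the smallest integer guess accepted by dual_step."""
    lo, hi = 1, sum(times)
    best = dual_step(times, m, hi)  # hi = total work is always accepted
    while lo < hi:
        mid = (lo + hi) // 2
        schedule = dual_step(times, m, mid)
        if schedule is None:
            lo = mid + 1
        else:
            hi, best = mid, schedule
    return hi, best
```

On `dual_approximation([3, 3, 2, 2, 2], 2)` the search settles on λ = 6 with machines `[[0, 1], [2, 3, 4]]`. With this weak test, the search simply returns the classical lower bound max(max p_j, ⌈Σ p_j / m⌉) for positive integer times; stronger tests, such as a dynamic program, are what make the search worthwhile and the ratio tighter.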

A batch scheduler with high level components

Author: Capit Nicolas
Da Costa Georges
Georgiou Yiannis
Huard Guillaume
Martin Cyrille
Mounié Grégory
Neyron Pierre
Richard Olivier
Publication date: 01/01/2005

In this article, we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based on an original design that emphasizes low software complexity by using high-level tools. The global architecture is built upon the scripting language Perl and the relational database engine MySQL. The goal of the OAR project is to prove that it is possible today to build a complex resource management system with such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers, such as priority scheduling (by queues), reservations, backfilling, and some global computing support. Despite the use of high-level tools, our experiments show that our system's performance is close to that of other systems. Furthermore, OAR is currently used to manage 700 nodes (a metropolitan grid) and has shown good efficiency and robustness.

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
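Backfilling, one of the features listed for OAR, can be sketched in its common EASY variant: queued jobs may jump ahead of a blocked head job as long as they do not delay that head job's reserved start (its "shadow time"). This Python sketch is only an illustration under simplifying assumptions (a single pool of identical nodes, user-declared walltimes, a hypothetical `Job` type); it is not OAR's implementation, which the abstract describes as built on Perl and MySQL, and OAR's own backfilling policy may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int       # nodes requested
    walltime: float  # user-declared maximum duration


def easy_backfill(now, free_nodes, running, queue):
    """Return the jobs to start at time `now` under EASY backfilling.

    `running` holds (end_time, nodes) pairs; `queue` is in priority order.
    """
    started, queue, running = [], list(queue), list(running)
    # 1. Start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        started.append(job)
        free_nodes -= job.nodes
        running.append((now + job.walltime, job.nodes))
    if not queue:
        return started
    # 2. Reserve the head job: find when enough nodes will be released.
    head = queue[0]
    avail, shadow, extra = free_nodes, math.inf, 0
    for end, n in sorted(running):
        avail += n
        if avail >= head.nodes:
            shadow, extra = end, avail - head.nodes
            break
    # 3. Backfill later jobs that cannot delay the reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_before_shadow = now + job.walltime <= shadow
        if ends_before_shadow or job.nodes <= extra:
            started.append(job)
            free_nodes -= job.nodes
            if not ends_before_shadow:
                extra -= job.nodes  # it still holds these nodes at the shadow time
    return started
```

On a 10-node cluster where a running job holds 6 nodes until t = 10, a blocked 8-node head job is reserved for t = 10; a short 2-node job and a long 2-node job can both backfill (the latter on the 2 spare nodes), while a 3-node job cannot.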

A study of scheduling problems with preemptions on multi-core computers with GPU accelerators

Author: Błażewicz Jacek
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: Elsevier BV
Publication date: 01/12/2015

International audience. For many years, scheduling problems have been concerned either with parallel processor systems or with dedicated processors (job-shop type systems). With the development of new computing architectures, this partition is no longer so obvious. Multi-core (processor) computers equipped with GPU co-processors require new scheduling strategies. This paper is devoted to a characterization of this new type of scheduling problem. After a thorough introduction of the new model of a computing system, an extension of the classical notation of scheduling problems is proposed. Special attention is paid to preemptions, since this feature of the new architecture differs most from the classical model. Several scheduling algorithms, both new ones and refinements of classical approaches, are presented. Possible extensions of the model are also discussed.

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
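As a reference point for the preemptive problems discussed here, the classical result for identical processors is McNaughton's wrap-around rule: with preemption allowed, the optimal makespan on m processors is max(max_j p_j, Σ_j p_j / m), and filling processors one after another up to that bound, splitting a job across two processors when needed, achieves it. The sketch below implements only this classical identical-processor rule, using exact arithmetic via `fractions`; it is not one of the CPU/GPU algorithms of the paper.

```python
from fractions import Fraction


def mcnaughton(times, m):
    """Optimal preemptive schedule on m identical processors.

    Returns (cmax, pieces) where pieces[i] lists (job, start, end) on processor i.
    """
    cmax = max(Fraction(max(times)), Fraction(sum(times), m))
    pieces = [[] for _ in range(m)]
    machine, t = 0, Fraction(0)
    for j, p in enumerate(times):
        remaining = Fraction(p)
        while remaining > 0:
            # Run as much of job j as fits before cmax on the current processor.
            chunk = min(remaining, cmax - t)
            pieces[machine].append((j, t, t + chunk))
            t += chunk
            remaining -= chunk
            if t == cmax:  # processor full: wrap around to the next one
                machine, t = machine + 1, Fraction(0)
    return cmax, pieces
```

On five jobs of lengths 3, 3, 2, 2, 2 and three processors, the bound is 4 and job 1 is split: it runs on processor 0 during [3, 4] and on processor 1 during [0, 2]. The two pieces never overlap because p_j ≤ C_max.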

An Approximation algorithm for scheduling Trees of Malleable Tasks

Author: Lepère Renaud
Mounié Grégory
Trystram Denis
Publication venue: Elsevier BV
Publication date: 01/01/2002

This work presents an approximation algorithm for scheduling the tasks of a parallel application. These tasks are considered as malleable tasks (MT for short), which means that they can be executed on several processors. This model has recently received a lot of attention, mainly due to its practical use for implementing actual parallel applications. Most of the work developed within this model deals with independent MT, for which good approximation algorithms have been designed. This work is devoted to the case where MT are linked by precedence relations. We present a $(1+\epsilon)$-approximation algorithm (for any fixed $\epsilon$) for the specific structure of a tree. This preliminary result should open the way for further investigations concerning arbitrary precedence graphs of M

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot
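For malleable tasks, each task's running time depends on the number of processors it is allotted. Once an allotment is fixed, any schedule of a tree of such tasks on m processors is at least as long as both the critical path and the total work divided by m; this lower bound is commonly used when analyzing such algorithms. The sketch below computes it for a small hypothetical tree; the data layout and names are assumptions, and it is not the paper's $(1+\epsilon)$-approximation.

```python
def tree_bounds(children, exec_time, allot, root, m):
    """Critical path, total work, and the lower bound max(CP, W / m).

    children:  task -> list of successor tasks in the tree
    exec_time: task -> list where exec_time[t][q - 1] is the time on q processors
    allot:     task -> number of processors allotted (hypothetical input)
    """
    def duration(t):
        return exec_time[t][allot[t] - 1]

    def critical_path(t):
        # Longest chain of durations from t down to a leaf.
        return duration(t) + max(
            (critical_path(c) for c in children.get(t, [])), default=0
        )

    work = sum(allot[t] * duration(t) for t in exec_time)
    cp = critical_path(root)
    return cp, work, max(cp, work / m)


if __name__ == "__main__":
    children = {"r": ["a", "b"]}
    exec_time = {"r": [4, 2], "a": [6, 3.5], "b": [2, 1.5]}
    allot = {"r": 2, "a": 2, "b": 1}
    print(tree_bounds(children, exec_time, allot, "r", m=2))  # prints (5.5, 13.0, 6.5)
```

Giving a task more processors usually shortens the critical path but increases the total work; balancing these two terms is the core difficulty in scheduling malleable tasks.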

How to botch the validation of your scheduling algorithm (Comment rater la validation de votre algorithme d'ordonnancement)

Author: Cordeiro Daniel
Mounié Grégory
Perarnau Swann
Trystram Denis
Vincent Jean-Marc
Wagner Frédéric
Publication venue: HAL CCSD
Publication date: 01/01/2009

National audience. Imagine that you have just developed a new scheduling algorithm: congratulations! To obtain qualitative information about your algorithm and to compare it with others, you have decided, like many before you, to run simulations. Quite classically, your simulations use random data sets (here, directed acyclic graphs).

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
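The abstract notes that simulations classically rely on random directed acyclic graphs. One widely used generator, shown here as an illustration rather than as the authors' recommendation, fixes a topological order 0, 1, ..., n - 1 and keeps each forward edge i → j with probability p. The longest path (in number of tasks) is a quick statistic for inspecting what such graphs look like. The function names and parameters are hypothetical.

```python
import random


def random_dag(n, p, seed=None):
    """Keep each edge i -> j (i < j) independently with probability p.

    Edges always go forward in the order 0..n-1, so the graph is acyclic.
    """
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def longest_path_length(n, edges):
    """Number of tasks on the longest path (edges must satisfy i < j)."""
    depth = [1] * n
    # Sorting by source processes every edge into i before edges out of i.
    for i, j in sorted(edges):
        depth[j] = max(depth[j], depth[i] + 1)
    return max(depth, default=0)


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        edges = random_dag(50, p, seed=42)
        print(p, len(edges), longest_path_length(50, edges))
```

Printing edge counts and path lengths for a few values of p is a cheap sanity check on whether the generated instances are varied enough before drawing conclusions from a simulation campaign.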