    A framework for adaptive collective communications for heterogeneous hierarchical computing systems

    Collective communication operations are widely used in MPI applications and play an important role in their performance. However, the network heterogeneity inherent to grid environments represents a major challenge for developing efficient high-performance computing applications. In this work we propose a generic framework based on communication models and adaptive techniques for dealing with collective communication patterns on grid platforms. Toward this goal, we address the hierarchical organization of the grid, selecting the most efficient communication algorithm at each network level. Our framework also adapts to grid load dynamics, since it considers transient network characteristics when dividing the nodes into clusters. Our experiments with the broadcast operation on a real grid setup indicate that an adaptive framework allows significant performance improvements for MPI collective communications.
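
    The idea of dividing nodes into clusters from measured network characteristics can be illustrated with a minimal sketch. It assumes a symmetric matrix of pairwise latencies and a cut-off value; the function name `cluster_by_latency`, the threshold and the sample latencies are hypothetical and are not taken from the framework itself.

```python
def cluster_by_latency(latency, threshold):
    """Group nodes whose pairwise latency is at most `threshold`.

    `latency` is a symmetric n x n matrix (hypothetical measurements).
    Nodes linked through a chain of low-latency pairs end up together.
    """
    n = len(latency)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if latency[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two sites: nodes 0-1 and nodes 2-3 are close, cross-site links are slow.
sample_latency_ms = [
    [0.0, 0.1, 10.0, 10.0],
    [0.1, 0.0, 10.0, 10.0],
    [10.0, 10.0, 0.0, 0.1],
    [10.0, 10.0, 0.1, 0.0],
]
clusters = cluster_by_latency(sample_latency_ms, threshold=1.0)
```

    Re-running such a grouping on fresh measurements is one simple way to follow transient network conditions; a broadcast algorithm could then be chosen separately inside each cluster and between clusters.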

    Scheduling Independent Tasks on Multi-cores with GPU Accelerators

    Best Paper. More and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). In this paper, we present a new method for efficiently scheduling parallel applications with $m$ CPUs and $k$ GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the makespan. The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm which achieves an approximation ratio of $\frac{4}{3} + \frac{1}{3k}$. We first detail and analyze the method, based on a dual approximation scheme, that uses a dynamic programming scheme to balance the load evenly between the heterogeneous resources. Finally, we run some simulations based on realistic benchmarks and compare the solution obtained by a relaxed version of this method to the one provided by a classical greedy algorithm and to lower bounds on the value of the optimal makespan.
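
    To make the combination of dual approximation and dynamic programming concrete, here is a minimal sketch under simplifying assumptions. It is not the $\frac{4}{3} + \frac{1}{3k}$ algorithm of the paper: it is a simpler variant whose guarantee is only a factor of 2, and it assumes positive integer processing times. For a guess `lam`, a knapsack-style dynamic program minimizes the CPU workload subject to a GPU workload of at most `k * lam`; the guess is rejected only if no schedule of length `lam` can exist. All function names are hypothetical.

```python
import heapq


def split_tasks(tasks, m, k, lam):
    """Return (cpu_ids, gpu_ids) for guess lam, or None if lam is infeasible.

    tasks[i] = (cpu_time, gpu_time), positive integers (assumption).
    """
    cap = k * lam
    inf = float("inf")
    dp = [0] * (cap + 1)  # dp[g]: min CPU work so far with GPU work <= g
    on_gpu = []           # on_gpu[i][g]: task i is on a GPU in that optimum
    for p_cpu, p_gpu in tasks:
        new, pick = [inf] * (cap + 1), [False] * (cap + 1)
        for g in range(cap + 1):
            if p_cpu <= lam and dp[g] + p_cpu < new[g]:
                new[g] = dp[g] + p_cpu
            if p_gpu <= lam and g >= p_gpu and dp[g - p_gpu] < new[g]:
                new[g], pick[g] = dp[g - p_gpu], True
        dp = new
        on_gpu.append(pick)
    if dp[cap] > m * lam:  # CPU work cannot fit: no schedule of length lam
        return None
    cpu_ids, gpu_ids, g = [], [], cap
    for i in range(len(tasks) - 1, -1, -1):  # backtrack the choices
        if on_gpu[i][g]:
            gpu_ids.append(i)
            g -= tasks[i][1]
        else:
            cpu_ids.append(i)
    return cpu_ids, gpu_ids


def list_schedule(times, machines):
    """Greedy list scheduling on identical machines; returns the makespan."""
    loads = [0] * machines
    for t in times:
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)


def dual_approx_schedule(tasks, m, k):
    """Binary search on the guess; returns (accepted guess, makespan)."""
    lo, hi = 1, sum(min(c, g) for c, g in tasks)  # hi is always feasible
    best = split_tasks(tasks, m, k, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        split = split_tasks(tasks, m, k, mid)
        if split is None:
            lo = mid + 1
        else:
            hi, best = mid, split
    cpu_ids, gpu_ids = best
    makespan = max(list_schedule([tasks[i][0] for i in cpu_ids], m),
                   list_schedule([tasks[i][1] for i in gpu_ids], k))
    return hi, makespan


# Hypothetical instance: (cpu_time, gpu_time) per task, 2 CPUs and 1 GPU.
example_tasks = [(4, 1), (3, 3), (2, 5), (6, 2)]
guess, makespan = dual_approx_schedule(example_tasks, m=2, k=1)
```

    In this sketch every accepted guess yields a makespan of at most `2 * lam`, because each side carries at most `lam` of average load per machine and no assigned task is longer than `lam`. The tighter ratio stated in the abstract requires the more careful construction described in the paper.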

    Performance Characterisation of Intra-Cluster Collective Communications

    Although recent works try to improve collective communication in grid systems by separating intra- and inter-cluster communication, the optimisation of communications focuses only on inter-cluster communications. We believe, instead, that the overall performance of the application may be improved if the performance of intra-cluster collective communications is known in advance. Hence, it is important to have an accurate model of the intra-cluster collective communications, which provides the necessary evidence to tune and to predict their performance correctly. In this paper we present our experience in modelling such communication strategies. We describe and compare different implementation strategies with their communication models, evaluating the models' accuracy and describing the practical challenges that can be found when modelling collective communications.

    Scheduling with Storage Constraints

    In this paper, we study scheduling problems in systems where many users compete to perform their respective jobs on shared parallel resources. Each user has specific needs or wishes for computing his/her jobs, expressed as a function to optimize (among maximum completion time, sum of completion times and sum of weighted completion times). Such problems have been mainly studied through Game Theory. In this work, we focus on solving the problem by optimizing simultaneously each user's objective function independently using classical combinatorial optimization techniques. Some results have already been proposed for two users on a single computing resource. However, no generic combinatorial method is known for many objectives. The analysis proposed in this paper concerns an arbitrarily fixed number of users and is not restricted to a single resource. We first derive inapproximability bounds; then we analyze several greedy heuristics whose approximation ratios are close to these bounds. However, they remain high since they are linear in the number of users. We provide a deeper analysis which shows that a slightly modified version of the algorithm is a constant approximation of a Pareto-optimal solution.
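
    The three per-user objectives named above can be made concrete with a small sketch. It assumes a single machine and a fixed execution order, which is a simplification of the multi-resource setting of the paper; the function name, the tuple layout and the sample jobs are hypothetical.

```python
from collections import defaultdict


def per_user_objectives(schedule):
    """Compute each user's objectives for jobs run back to back on one machine.

    schedule: list of (user, processing_time, weight) in execution order.
    Returns {user: {"Cmax": ..., "sumC": ..., "sumwC": ...}}.
    """
    clock = 0
    stats = defaultdict(lambda: {"Cmax": 0, "sumC": 0, "sumwC": 0})
    for user, p, w in schedule:
        clock += p                         # completion time of this job
        s = stats[user]
        s["Cmax"] = max(s["Cmax"], clock)  # maximum completion time
        s["sumC"] += clock                 # sum of completion times
        s["sumwC"] += w * clock            # sum of weighted completion times
    return dict(stats)


# Two users sharing one machine; jobs complete at times 2, 3 and 6.
objectives = per_user_objectives([("A", 2, 1), ("B", 1, 3), ("A", 3, 2)])
```

    Reordering the jobs changes each user's values in different directions, which is why the problem is treated as a multi-objective one rather than as a single global optimization.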

    More and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). In this paper, we present a new method for efficiently scheduling parallel applications with m CPUs and k GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm which achieves an approximation ratio of 4/3 + 1/(3k). We first detail and analyze the method, based on a dual approximation scheme, that uses dynamic programming to balance the load evenly between the heterogeneous resources. Then, we present a faster approximation algorithm for a special case of the previous problem, where all the tasks are accelerated when assigned to a GPU, with a performance guarantee of 3/2 for any number of GPUs. We run some simulations based on realistic benchmarks and compare the solutions obtained by a relaxed version of the generic method to those provided by a classical scheduling algorithm (HEFT). Finally, we present an implementation of the 4/3-approximation and its relaxed version on a classical linear algebra kernel in the scheduler of the xKaapi runtime system.

    A batch scheduler with high level components

    In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based upon an original design that emphasizes low software complexity by using high-level tools. The global architecture is built upon the scripting language Perl and the relational database engine MySQL. The goal of the OAR project is to prove that it is possible today to build a complex system for resource management using such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers, such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high-level tools, our experiments show that the performance of our system is close to that of other systems. Furthermore, OAR is currently used for the management of 700 nodes (a metropolitan GRID) and has shown good efficiency and robustness.

    A study of scheduling problems with preemptions on multi-core computers with GPU accelerators

    For many years, scheduling problems have been concerned either with parallel processor systems or with dedicated-processor (job-shop type) systems. With the development of new computing architectures, this partition is no longer so obvious. Multi-core (processor) computers equipped with GPU co-processors require new scheduling strategies. This paper is devoted to a characterization of this new type of scheduling problem. After a thorough introduction of the new model of a computing system, an extension of the classical notation of scheduling problems is proposed. Special attention is paid to preemptions, since this is the feature of the new architecture that differs most from the classical model. In the paper, several scheduling algorithms, new ones and those refining classical approaches, are presented. Possible extensions of the model are also discussed.

    An Approximation algorithm for scheduling Trees of Malleable Tasks

    This work presents an approximation algorithm for scheduling the tasks of a parallel application. These tasks are considered as malleable tasks (MT for short), which means that they can be executed on several processors. This model has recently received a lot of attention, mainly due to its practical use for implementing actual parallel applications. Most of the works developed within this model deal with independent MT, for which good approximation algorithms have been designed. This work is devoted to the case where MT are linked by precedence relations. We present a (1+epsilon)-approximation algorithm (for any fixed epsilon) for the specific structure of a tree. This preliminary result should open the way for further investigations concerning arbitrary precedence graphs of M

    Comment rater la validation de votre algorithme d'ordonnancement (How to fail at validating your scheduling algorithm)

    Imagine that you have just developed a new scheduling algorithm: congratulations! To obtain qualitative information about your algorithm and to compare it with others, you have decided, like many before you, to run simulations. Quite conventionally, your simulations use random data sets (here, directed acyclic graphs)