641 research outputs found

Portable high-performance communications in hierarchical, heterogeneous and dynamic environments

Author: MERCIER Guillaume
Publication date: 13/01/2021

This thesis addresses communication within parallel computers, with an emphasis on high-performance computing. Hardware evolution has forced the software that exploits parallel machines to adapt. Cluster architectures are now widespread, and the emergence of computing grids adds further complexity: achieving high performance requires exploiting the various high-speed networks available and taking into account the inherent hierarchy of the configurations. On the application side, new requirements are emerging, such as dynamicity. These aspects are too often only partially addressed, particularly in implementations of the MPI message-passing standard. Existing solutions focus on hierarchy and heterogeneity or on dynamicity, rarely both. Regarding the first two aspects, simplifications lead to suboptimal exploitation of the potentially available hardware. We analyzed existing MPI implementations and proposed an architecture that meets the needs identified. This architecture relies on a strong interaction between communication and threads, and its core is a communication progression engine that substantially improves existing mechanisms. The two fundamental software components are a user-level thread library (Marcel) and a generic communication layer (Madeleine). The implementation of this architecture resulted in the MPICH-Madeleine software, which is used or evaluated by several research teams and projects in France and abroad.
The performance evaluation (comparisons with Madeleine, measurements of point-to-point operations, application kernels), carried out with several high-speed networks on homogeneous clusters of multiprocessor machines, together with comparisons with MPICH-G2 and PACX-MPI in heterogeneous environments, shows that MPICH-Madeleine achieves results similar or even superior to those of specialized MPI implementations.

Oskar Bordeaux

Optimisation Mechanisms for MPICH/Madeleine

Author: Furmento Nathalie
Mercier Guillaume
Publication venue: HAL CCSD
Publication date: 01/01/2005

This report presents optimisation mechanisms within MPICH/Madeleine, the implementation of MPICH over Madeleine. These mechanisms aim to decrease the communication time of derived datatypes whose data is stored in noncontiguous memory areas. The report presents the mechanisms as well as a performance evaluation.

INRIA a CCSD electronic archive server

MPICH/Madeleine Installer's, User's and Developer's Guide

Author: Furmento Nathalie
Mercier Guillaume
Publication venue: HAL CCSD
Publication date: 01/01/2005

MPICH/Madeleine is a new free implementation of the MPI standard based on the MPICH implementation and on the multi-protocol communication library Madeleine. It aims to efficiently exploit clusters of clusters with heterogeneous networks. This manual is an installer's, user's and developer's guide for MPICH/Madeleine. The latest version of this document is available from the following URL: http://runtime.futurs.inria.fr/mpi/manual/

INRIA a CCSD electronic archive server

Improving MPI Applications Performance on Multicore Clusters with Rank Reordering

Author: Jeannot Emmanuel
Mercier Guillaume
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/09/2011

Modern hardware architectures featuring multicore processors and a complex memory hierarchy raise challenges that parallel application programmers need to address. It is therefore tempting to adapt an application's communication pattern to the characteristics of the underlying hardware. The MPI standard features several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator. In this paper, we explain how the MPICH2 implementation of the MPI_Dist_graph_create function was modified to reorder the MPI process ranks so that the application communication pattern matches the hardware topology. The experimental results on a multicore cluster show that improvements can be achieved as long as the application communication pattern is expressed by a relevant metric.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Oskar Bordeaux

AIS-based Evaluation of Target Detectors and SAR Sensors Characteristics for Maritime Surveillance

Author: Garello René
Hajduch Guillaume
Longépé Nicolas
Mercier Grégoire
Pelich Ramona-Maria
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2015

This paper studies the performance of different ship detectors based on adaptive threshold algorithms. The detection algorithms are based on various clutter distributions and are assessed automatically with a systematic methodology. Using large datasets of medium-resolution SAR images, with AIS (Automatic Identification System) data as ground truth, makes it possible to evaluate the efficiency of each detector. Depending on the datasets used for testing, the detection algorithms offer different advantages and disadvantages. The systematic method used to discriminate real detected targets from false alarms in order to determine the detection rate allows an appropriate and consistent comparison of the detectors. The impact of SAR sensor characteristics (incidence angle, polarization, frequency and spatial resolution) is fully assessed, and vessel length is also considered. Experiments are conducted on Radarsat-2 and CosmoSkymed ScanSAR datasets and AIS data acquired by coastal stations.

Crossref

HAL-Université de Bretagne Occidentale

Large-scale experiment for topology-aware resource management

Author: Georgiou Yiannis
Mercier Guillaume
Villiermet Adèle
Publication venue: HAL CCSD
Publication date: 29/08/2017

A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments, and its main intelligence lies in the resource selection techniques that find the most suitable resources on which to schedule users' jobs. In [8], we introduced a new topology-aware resource selection algorithm that determines the best choice among the available nodes of the platform based on their position in the network and on application behaviour (expressed as a communication matrix). We integrated this algorithm as a plugin in Slurm and validated it with several optimization schemes by comparing it with the default Slurm algorithm. This paper presents further experiments on this selection process.

INRIA a CCSD electronic archive server

Topology and affinity aware hierarchical and distributed load-balancing in Charm++

Author: Jeannot Emmanuel
Mercier Guillaume
Tessier François
Publication venue: HAL CCSD
Publication date: 18/11/2016

The evolution of massively parallel supercomputers highlights two issues in particular: load imbalance and poor management of data locality in applications. With the increase in the number of cores and the drastic decrease in the amount of memory per core, meeting large performance needs requires particular care for load balancing and, as much as possible, for data locality. One way to take this locality issue into account relies on the placement of the processing entities, and load-balancing techniques are relevant for improving application performance. With large-scale platforms in mind, we developed a hierarchical and distributed algorithm that aims to perform topology-aware load balancing tailored for Charm++ applications. This algorithm is based on LibTopoMap for the network-awareness aspects and on TREEMATCH to determine a relevant placement of the processing entities. We show that the proposed algorithm improves the overall execution time for both real applications and a synthetic benchmark. For the latter, we show scalability up to one million processing entities.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

TreeMatch: A process placement algorithm for multicore architectures

Author: Jeannot Emmanuel
Mercier Guillaume
Tessier François
Publication venue: HAL CCSD
Publication date: 16/01/2013

ComPAR/RenPAR 2013 conference. For several years now, clusters of NUMA nodes with multicore processors have become widespread. Programming these architectures efficiently is a real challenge given their complex hierarchy. To take full advantage of them, this structure must be taken into account precisely and the application's communication pattern must be matched to it. Doing so reduces communication costs and yields gains in the application's total execution time. We present here how we use the communication pattern on the one hand and a faithful representation of the architecture on the other to produce a permutation of the processes of a given application, thereby reducing communication costs.

INRIA a CCSD electronic archive server

HAL-Rennes 1

Matching communication pattern with underlying hardware architecture

Author: Jeannot Emmanuel
Mercier Guillaume
Tessier François
Publication venue: HAL CCSD
Publication date: 01/07/2014

Matching communication pattern with underlying hardware architecture

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

An Overview of Process Mapping Techniques and Algorithms in High-Performance Computing

Author: Hoefler Torsten
Jeannot Emmanuel
Mercier Guillaume
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/06/2014

Due to the advent of modern hardware architectures for high-performance computers, the way parallel applications are laid out is of paramount importance for performance. This chapter surveys several techniques and algorithms that efficiently address this issue: the mapping of the application's virtual topology (for instance its communication pattern) onto the physical topology. Using such a strategy can significantly improve the application's overall execution time. The chapter concludes by listing a series of open issues and problems.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot