131 research outputs found

    A study of various load information exchange mechanisms for a distributed application using dynamic scheduling

    We consider a distributed asynchronous system where processes can only communicate by message passing and need a coherent view of the load (e.g., pending work, memory usage) of the other processes to take dynamic scheduling decisions. We present several mechanisms to obtain a distributed view of such information: in the first type of approach, the view is maintained through regular message exchanges; in the second (demand-driven, snapshot-style), the requesting process issues a query and then receives the load information corresponding to its request. We perform an experimental study of these mechanisms in the context of a real application, an asynchronous parallel solver for large sparse systems of linear equations.
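
    As a rough illustration of the first family of mechanisms, the sketch below (plain C with MPI, not the paper's code; the LOAD_TAG value and the scalar notion of "load" are assumptions made for illustration) maintains a distributed load view: each process periodically publishes its own load to the others and opportunistically drains incoming load messages to refresh its local view.

        #include <mpi.h>
        #include <stdlib.h>

        #define LOAD_TAG 42                     /* hypothetical message tag */

        /* Publish our current load estimate to every other process. */
        static void publish_load(double my_load, MPI_Comm comm)
        {
            int rank, size, nreq = 0;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            MPI_Request *reqs = malloc((size_t)size * sizeof *reqs);
            for (int p = 0; p < size; p++) {
                if (p == rank) continue;
                MPI_Isend(&my_load, 1, MPI_DOUBLE, p, LOAD_TAG, comm,
                          &reqs[nreq++]);
            }
            MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE); /* small, eager sends */
            free(reqs);
        }

        /* Drain pending load messages, refreshing our view of the other ranks. */
        static void refresh_view(double *view, MPI_Comm comm)
        {
            int flag;
            MPI_Status st;
            for (;;) {
                MPI_Iprobe(MPI_ANY_SOURCE, LOAD_TAG, comm, &flag, &st);
                if (!flag) break;
                double load;
                MPI_Recv(&load, 1, MPI_DOUBLE, st.MPI_SOURCE, LOAD_TAG, comm,
                         MPI_STATUS_IGNORE);
                view[st.MPI_SOURCE] = load;     /* possibly slightly stale */
            }
        }

    A demand-driven mechanism would instead send a request and wait for the replies it triggered, trading the background message traffic for latency at decision time.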

    Some Experiments and Issues to Exploit Multicore Parallelism in a Distributed-Memory Parallel Sparse Direct Solver

    MUMPS is a parallel sparse direct solver that uses message passing (MPI) for parallelism. In this report we investigate how thread parallelism can help take advantage of recent multicore architectures. The work consists of testing multithreaded BLAS libraries and inserting OpenMP directives into the routines that profiling revealed to be costly, with the objective of avoiding any deep restructuring or rewriting of the code. We report on various aspects of this work, present some of the benefits and difficulties, and show that 4 threads per MPI process is generally a good compromise. We then discuss various issues that appear to be critical in a mixed MPI-OpenMP environment.
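
    As an illustration of this style of incremental parallelization, the sketch below (an assumption for illustration, not actual MUMPS code) shows the kind of OpenMP directive one might insert into a routine that profiling exposes as costly, here the scatter-add assembly of a child contribution block into its parent front; the dense kernels themselves would run on a multithreaded BLAS.

        #include <omp.h>

        /* Scatter-add an nc-by-nc contribution block `cb` into the parent
         * front `front` (row-major, leading dimension ldf) at the rows and
         * columns listed in `map` (entries assumed distinct). */
        void assemble_contribution(double *front, int ldf,
                                   const double *cb, int nc, const int *map)
        {
            /* Distinct rows of the front are written by distinct iterations,
             * so distributing the outer loop over threads is race-free. */
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < nc; i++) {
                const int fi = map[i];
                for (int j = 0; j < nc; j++)
                    front[fi * ldf + map[j]] += cb[i * nc + j];
            }
        }

    With this approach one would typically run with something like OMP_NUM_THREADS=4 per MPI process, the compromise the report arrives at.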

    Introduction of shared-memory parallelism in a distributed-memory multifrontal solver

    We study the adaptation of a parallel distributed-memory solver into a shared-memory code targeting multicore architectures. The advantage of adapting the code over designing a new one is to fully benefit from its numerical kernels, range of functionalities and internal features. Although the studied code is a direct solver for sparse systems of linear equations, the approaches described in this paper are general and could be useful to a wide range of applications. We show how existing parallel algorithms can be adapted to an OpenMP environment while, at the same time, also relying on third-party optimized multithreaded libraries. We propose simple approaches to take advantage of NUMA architectures, and original optimizations to limit thread synchronization costs in the fork-join model we use. For each point, the performance gains are analyzed in detail on test problems from various application areas.
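
    One simple NUMA-oriented technique consistent with this theme is the first-touch idiom sketched below (a generic illustration, not the solver's actual code): memory is initialized in parallel with the same thread layout the later kernels use, so pages land on the memory node of the thread that will work on them.

        #include <omp.h>
        #include <stdlib.h>

        /* Allocate and zero a factor area of n doubles.  With first-touch
         * page placement, page p is mapped near the thread that writes it
         * first, which here matches the static schedule of later kernels. */
        double *alloc_factors_numa(size_t n)
        {
            double *a = malloc(n * sizeof *a);
            if (!a) return NULL;
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < (long)n; i++)
                a[i] = 0.0;
            return a;
        }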

    Scilab and MATLAB Interfaces to MUMPS (version 4.6 or greater)

    This document describes the Scilab and MATLAB interfaces to MUMPS version 4.6. We describe the differences and similarities between the usual Fortran/C MUMPS interfaces and the Scilab/MATLAB interfaces, as well as the calling sequences and functionalities. Examples of use and experimental results are also provided.
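
    The Scilab/MATLAB interfaces mirror the underlying Fortran/C calling sequence. For reference, a minimal sketch of that C sequence is given below (field names follow the MUMPS 4.x examples, e.g. id.nz rather than the later id.nnz, and the USE_COMM_WORLD value is taken from those examples; consult the dmumps_c.h of your release): job = -1 initializes an instance, job = 6 chains analysis, factorization and solve, and job = -2 frees the instance.

        #include <mpi.h>
        #include "dmumps_c.h"

        #define USE_COMM_WORLD -987654  /* as in the MUMPS examples */

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);

            /* 2x2 system in coordinate format:  [2 0; 0 3] x = [4; 9] */
            int    irn[] = {1, 2};
            int    jcn[] = {1, 2};
            double a[]   = {2.0, 3.0};
            double rhs[] = {4.0, 9.0};

            DMUMPS_STRUC_C id;
            id.comm_fortran = USE_COMM_WORLD;
            id.par = 1;          /* host participates in computations */
            id.sym = 0;          /* unsymmetric matrix */
            id.job = -1;         /* initialize the instance */
            dmumps_c(&id);

            id.n = 2; id.nz = 2;
            id.irn = irn; id.jcn = jcn; id.a = a; id.rhs = rhs;

            id.job = 6;          /* analysis + factorization + solve */
            dmumps_c(&id);       /* on exit, rhs holds the solution */

            id.job = -2;         /* free the instance */
            dmumps_c(&id);
            MPI_Finalize();
            return 0;
        }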

Parallel computation of entries of A⁻¹

    In this paper, we are concerned with computing in parallel several entries of the inverse of a large sparse matrix. We assume that the matrix has already been factorized by a direct method and that the factors are distributed. Entries are computed efficiently by exploiting the sparsity of the right-hand sides and of the solution vectors in the triangular solution phase. We demonstrate that in this setting, parallelism and computational efficiency are two contrasting objectives. We develop an efficient approach and show its efficacy through runs using the MUMPS code, which implements a parallel multifrontal method.
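
    The idea can be stated on a single entry: the (i,j) entry of A⁻¹ is x_i where A x = e_j, so with A = LU the forward solve L y = e_j can skip every row above j (the right-hand side is zero there) and the backward solve U x = y can stop once row i is reached. The dense-storage sketch below (no pivoting and no sparse data structures, both simplifications made for brevity) shows exactly that pruning; the paper's setting additionally exploits the sparsity of the factors and their distribution over processes.

        /* Return the (i,j) entry of inv(A), given factors A = L*U with L
         * unit lower triangular and U upper triangular, both n-by-n and
         * row-major.  A real solver prunes with the elimination tree
         * instead of these simple loop bounds. */
        static double inverse_entry(const double *L, const double *U,
                                    int n, int i, int j)
        {
            double y[n], x[n];          /* C99 VLAs, fine for a sketch */

            /* Forward solve L y = e_j: rows above j stay zero because the
             * right-hand side is zero there, so work starts at row j. */
            for (int k = 0; k < j; k++) y[k] = 0.0;
            for (int k = j; k < n; k++) {
                double s = (k == j) ? 1.0 : 0.0;
                for (int m = j; m < k; m++) s -= L[k * n + m] * y[m];
                y[k] = s;               /* unit diagonal: no division */
            }

            /* Backward solve U x = y, stopping at row i: rows above i do
             * not influence x[i]. */
            for (int k = n - 1; k >= i; k--) {
                double s = y[k];
                for (int m = k + 1; m < n; m++) s -= U[k * n + m] * x[m];
                x[k] = s / U[k * n + k];
            }
            return x[i];                /* the (i,j) entry of inv(A) */
        }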

    Modeling 1D distributed-memory dense kernels for an asynchronous multifrontal sparse solver

    To solve sparse systems of linear equations, multifrontal methods rely on dense partial LU decompositions of so-called frontal matrices; we consider a parallel asynchronous setting in which several frontal matrices can be factored simultaneously. In this context, to address performance and scalability issues of acyclic pipelined asynchronous factorization kernels, we use models to revisit properties of left-looking and right-looking variants of partial LU decompositions and study the use of several levels of blocking, before focusing on communication issues. The general-purpose sparse solver MUMPS has been modified to implement the proposed algorithms and confirm the properties demonstrated by the models.
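
    For concreteness, the sketch below (plain loops and no pivoting, both simplifications made for illustration; a real kernel performs the panel with dgetrf-like code and the update with dtrsm and dgemm) shows the right-looking variant of a partial LU decomposition of a frontal matrix: only the first npiv variables are eliminated, panel by panel of width nb, and each panel immediately updates the trailing submatrix, whose last n - npiv rows and columns end up holding the contribution block passed to the parent front.

        /* Right-looking partial LU of an n-by-n frontal matrix F
         * (row-major), eliminating only its first npiv variables. */
        void partial_lu_right_looking(double *F, int n, int npiv, int nb)
        {
            for (int k0 = 0; k0 < npiv; k0 += nb) {
                int k1 = (k0 + nb < npiv) ? k0 + nb : npiv;

                /* Factor the current panel (columns k0..k1-1). */
                for (int k = k0; k < k1; k++)
                    for (int i = k + 1; i < n; i++) {
                        F[i * n + k] /= F[k * n + k];
                        for (int j = k + 1; j < k1; j++)
                            F[i * n + j] -= F[i * n + k] * F[k * n + j];
                    }

                /* Right-looking update of the trailing columns k1..n-1:
                 * the triangular solve on the panel rows and the rank-nb
                 * update of everything below, merged into one loop nest. */
                for (int k = k0; k < k1; k++)
                    for (int i = k + 1; i < n; i++)
                        for (int j = k1; j < n; j++)
                            F[i * n + j] -= F[i * n + k] * F[k * n + j];
            }
        }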

    Improving multifrontal methods by means of block low-rank representations

    Submitted for publication to SIAM. Matrices coming from elliptic Partial Differential Equations (PDEs) have been shown to have a low-rank property: well-defined off-diagonal blocks of their Schur complements can be approximated by low-rank products. Given a suitable ordering of the matrix, which gives the blocks a geometrical meaning, such approximations can be computed using an SVD or a rank-revealing QR factorization. The resulting representation offers a substantial reduction of the memory requirement and gives efficient ways to perform many of the basic dense linear algebra operations. Several strategies have been proposed to exploit this property. We propose a low-rank format called Block Low-Rank (BLR), and explain how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method. We present experimental results which show that the BLR format delivers gains comparable to those obtained with hierarchical formats such as Hierarchical (H) matrices and Hierarchically Semi-Separable (HSS) matrices, but provides much greater flexibility and ease of use, which are essential in the context of a general-purpose, algebraic solver.
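
    A minimal sketch of the compression kernel is given below: an off-diagonal block B is replaced by a low-rank product X * Y^T obtained from a truncated SVD (the abstract also mentions rank-revealing QR as an alternative). LAPACKE_dgesvd is the standard LAPACKE routine; the absolute threshold tol and the memory management are choices made for this illustration.

        #include <lapacke.h>
        #include <stdlib.h>
        #include <string.h>

        /* Compress B (m-by-n, row-major) so that B ~ X * Y^T with
         * ||B - X Y^T||_2 <= tol.  Returns the rank k (or -1 on failure)
         * and allocates X (m-by-k) and Y (n-by-k); the caller frees them. */
        int blr_compress(const double *B, int m, int n, double tol,
                         double **X, double **Y)
        {
            int mn = m < n ? m : n;
            double *A  = malloc((size_t)m * n * sizeof *A);  /* dgesvd destroys input */
            double *U  = malloc((size_t)m * mn * sizeof *U);
            double *VT = malloc((size_t)mn * n * sizeof *VT);
            double *S  = malloc((size_t)mn * sizeof *S);
            double *sb = malloc((size_t)mn * sizeof *sb);    /* superb workspace */
            memcpy(A, B, (size_t)m * n * sizeof *A);

            int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'S', 'S', m, n,
                                      A, n, S, U, mn, VT, n, sb);
            int k = 0;
            if (info == 0)
                while (k < mn && S[k] > tol) k++;  /* numerical rank at tol */

            if (info == 0) {
                *X = malloc((size_t)m * k * sizeof **X);  /* X = U_k * diag(S_k) */
                *Y = malloc((size_t)n * k * sizeof **Y);  /* Y = V_k            */
                for (int i = 0; i < m; i++)
                    for (int j = 0; j < k; j++)
                        (*X)[i * k + j] = U[i * mn + j] * S[j];
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < k; j++)
                        (*Y)[i * k + j] = VT[j * n + i];
            }
            free(A); free(U); free(VT); free(S); free(sb);
            return info == 0 ? k : -1;
        }

    The block is kept in low-rank form only when k(m + n) < mn, i.e., when the product actually saves memory.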

    Robust memory-aware mappings for parallel multifrontal factorizations

    We study the memory scalability of the parallel multifrontal factorization of sparse matrices. In particular, we are interested in controlling the active memory specific to the multifrontal factorization. We illustrate why commonly used mapping strategies (e.g., proportional mapping) cannot provide high memory efficiency, which means that they tend to let the memory usage of the factorization grow as the number of processes increases. We propose “memory-aware” algorithms that aim at maximizing the granularity of parallelism while respecting memory constraints. These algorithms provide accurate memory estimates prior to the factorization and can significantly enhance the robustness of a multifrontal code. We illustrate our approach with experiments performed on large matrices.
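
    The baseline can be sketched as follows (the tree layout and the subtree_work field are assumptions made for illustration): proportional mapping recursively splits the processes of a node among its children in proportion to the work below each child, with no regard to the memory each subtree will need; a memory-aware variant would additionally check a per-process memory bound before accepting such a split.

        #define MAX_CHILDREN 8

        struct node {
            double subtree_work;               /* total flops below this node */
            int    nchildren;
            struct node *child[MAX_CHILDREN];
            int    nprocs;                     /* processes mapped onto this node */
        };

        /* Classic proportional mapping: split nprocs among the children in
         * proportion to their subtree workloads, then recurse. */
        static void proportional_map(struct node *nd, int nprocs)
        {
            nd->nprocs = nprocs;
            if (nd->nchildren == 0) return;

            double total = 0.0;
            for (int c = 0; c < nd->nchildren; c++)
                total += nd->child[c]->subtree_work;

            for (int c = 0; c < nd->nchildren; c++) {
                int share = (int)((double)nprocs
                                  * nd->child[c]->subtree_work / total);
                if (share < 1) share = 1;      /* every child gets a process */
                proportional_map(nd->child[c], share);
            }
            /* Rounding can over- or under-assign; real schedulers rebalance
             * here, and a memory-aware variant rejects splits whose
             * per-process memory estimate exceeds the prescribed bound. */
        }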