8 research outputs found

Two parallel implementations of Ehrlich-Aberth algorithm for root-finding of polynomials on multiple GPUs with OpenMP and MPI

Author: Couturier Raphael
Ghidouche Kahina
Sider Abderrahmane
Ziane Khodja Lilia
Publication venue: HAL CCSD
Publication date: 24/08/2016

Finding the roots of polynomials is an important part of solving real-life problems, but it becomes harder as the degree of the polynomial increases. In this paper, we present two parallel implementations of the Ehrlich-Aberth method for finding the roots of sparse and fully defined polynomials of high degree. Both are based on CUDA and target multi-GPU computing platforms, but each uses a different parallel paradigm: OpenMP or MPI. The experiments show a quasi-linear speedup when using up to 4 GPU devices, compared with 1 GPU, to find the roots of polynomials of degree up to 1.4 million. Moreover, other experiments show that it is possible to find the roots of polynomials of degree up to 5 million.

HAL - Université de Franche-Comté

Crossref

Résolution de systèmes linéaires et non linéaires creux sur grappes de GPUs

Author: Ziane Khodja Lilia
Publication venue: HAL CCSD
Publication date: 07/06/2013

For the past few years, clusters equipped with GPUs have become attractive tools for high performance computing. In this thesis, we have designed parallel iterative algorithms for solving large sparse linear and nonlinear systems on GPU clusters. First, we focused on solving sparse linear systems using the CG and GMRES iterative methods. The experiments have shown that a GPU cluster is more efficient than its pure CPU counterpart for solving large sparse systems of linear equations. Then, we implemented synchronous and asynchronous algorithms of the Richardson and block relaxation iterative methods for solving sparse nonlinear systems. We noticed that the best solutions developed for CPUs are not necessarily well suited to GPUs. Indeed, the experiments performed on a GPU cluster have shown that the parallel algorithms of the Richardson method are far more efficient than those of the block relaxation method. In addition, they have shown that the computing power of GPUs reduces the ratio of computation time to communication time, which favors the use of asynchronous iteration on GPU clusters. Finally, we considered geographically distant clusters for solving large sparse linear systems. In this context, we used a two-stage multisplitting method with a parallel GMRES method adapted to GPU clusters. It uses synchronous iteration to solve the linear subsystems locally and asynchronous iteration to solve the global sparse linear system.

Thèses en Ligne

HAL - Université de Franche-Comté

Solving sparse linear and nonlinear systems on GPU clusters

Author: Ziane Khodja Lilia
Publication date: 07/06/2013

For the past few years, clusters equipped with GPUs have become attractive tools for high performance computing. In this thesis, we have designed parallel iterative algorithms for solving large sparse linear and nonlinear systems on GPU clusters. First, we focused on solving sparse linear systems using the CG and GMRES iterative methods. The experiments have shown that a GPU cluster is more efficient than its pure CPU counterpart for solving large sparse systems of linear equations. Then, we implemented synchronous and asynchronous algorithms of the Richardson and block relaxation iterative methods for solving sparse nonlinear systems. We noticed that the best solutions developed for CPUs are not necessarily well suited to GPUs. Indeed, the experiments performed on a GPU cluster have shown that the parallel algorithms of the Richardson method are far more efficient than those of the block relaxation method. In addition, they have shown that the computing power of GPUs reduces the ratio of computation time to communication time, which favors the use of asynchronous iteration on GPU clusters. Finally, we considered geographically distant clusters for solving large sparse linear systems. In this context, we used a two-stage multisplitting method with a parallel GMRES method adapted to GPU clusters. It uses synchronous iteration to solve the linear subsystems locally and asynchronous iteration to solve the global sparse linear system.

Theses.fr

Résolution de systèmes linéaires et non linéaires creux sur grappes de GPUs

Author: BAHI Jacques Mohcine
COUTURIER Raphaël
ZIANE KHODJA Lilia
Publication date: 01/01/2013

OpenGrey Repository

Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters

Author: Bahi Jacques
Couturier Raphael
Giersch Arnaud
Ziane Khodja Lilia
Publication venue: Springer Verlag
Publication date: 01/01/2014

In this paper, we aim to exploit the computing power of a graphics processing unit (GPU) cluster for solving large sparse linear systems. We implement the parallel algorithm of the generalized minimal residual (GMRES) iterative method using the Compute Unified Device Architecture (CUDA) programming language and the MPI parallel environment. The experiments show that a GPU cluster is more efficient than a CPU cluster. To optimize performance, we use a compressed storage format for the sparse vectors and hypergraph partitioning. These solutions improve the spatial and temporal locality of the data shared between the computing nodes of the GPU cluster.

Hal-Diderot

Solution of univalued and multivalued pseudo-linear problems using parallel asynchronous multisplitting methods combined with Krylov methods

Author: Couturier Raphael
Garcia Thierry
Spitéri Pierre
Ziane Khodja Lilia
Publication venue: Elsevier BV
Publication date: 01/01/2021

This paper extends a preliminary experimental study on a cluster by adding both theoretical results and experimental tests on a grid platform. The algorithms studied solve univalued and multivalued pseudo-linear problems using parallel asynchronous multisplitting methods combined with Krylov methods. The paper also analyses these algorithms using contraction techniques. Two distinct applications, with discretized boundary value problems, are analyzed and simulated. First, a univalued convection-diffusion problem perturbed by an increasing diagonal operator is presented. Then a diffusion problem whose solution is constrained is described. This situation classically leads to the solution of a multivalued pseudo-linear problem in which the linear part is perturbed by an increasing diagonal multivalued operator. Parallel asynchronous and synchronous algorithms were implemented and tested on a grid platform composed of physically adjacent or geographically distant machines. In addition, the simulation results are detailed and show that the elapsed times obtained for the asynchronous algorithms are significantly shorter than those obtained for the synchronous algorithms.

HAL - Université de Franche-Comté

Scientific Publications of the University of Toulouse II Le Mirail

Hal-Diderot

Parallel sparse linear solver with GMRES method using minimization techniques of communications for GPU clusters

Author: Bahi Jacques
Couturier Raphael
Giersch Arnaud
Ziane Khodja Lilia
Publication venue: HAL CCSD
Publication date: 01/01/2014

HAL - Université de Franche-Comté

HAL Descartes

Hal-Diderot

Parallel solution of American option derivatives on GPU clusters

Author: Jacques Bahi
Lilia Ziane Khodja
Ming Chau
Pierre Spitéri
Raphaël Couturier
Publication venue: Elsevier BV
Publication date: 01/01/2013

This paper deals with the numerical solution of financial applications, more specifically the computation of American option derivatives modelled by nonlinear boundary value problems. In such applications we have to solve large-scale algebraic systems. We concentrate on synchronous and asynchronous parallel iterative algorithms carried out on CPU and GPU networks. The properties of the operators arising in the discretized problem ensure the convergence of the parallel iterative synchronous and asynchronous algorithms. Computational experiments performed on CPU and GPU networks are presented and analyzed.
Keywords: Parallel asynchronous algorithms, iterative parallel numerical methods, subdomain method, sparse nonlinear systems, large scale obstacle problems, finance, GPU clusters, CUD

HAL - Université de Franche-Comté

Crossref

Scientific Publications of the University of Toulouse II Le Mirail