Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and to maintain a moderate number of unknowns. However, flexibility in the meshes goes along with severe drawbacks with respect to parallel execution – especially with respect to the definition of adequate smoothers. This point becomes particularly pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrix-based multigrid, which has high flexibility and adapts well to the exigencies of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fill-ins
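The idea behind a multi-colored Gauß-Seidel smoother can be illustrated with a minimal sketch. It is not the implementation from the paper: the dict-of-rows matrix storage, the greedy coloring, and all function names (laplacian_1d, greedy_coloring, multicolor_gauss_seidel) are assumptions made for illustration, and the matrix is assumed to have a structurally symmetric pattern. Unknowns of the same color are not coupled, so their updates are independent and could be executed in parallel.

```python
def laplacian_1d(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1), stored as one dict per row."""
    rows = []
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows


def greedy_coloring(rows):
    """Give each unknown the smallest color not used by an already colored neighbour.

    Assumes a structurally symmetric sparsity pattern.
    """
    colors = [-1] * len(rows)
    for i, row in enumerate(rows):
        used = {colors[j] for j in row if j != i and colors[j] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[i] = c
    return colors


def multicolor_gauss_seidel(rows, b, x, colors, sweeps=1):
    """Gauss-Seidel sweeps that visit the unknowns color by color (updates x in place)."""
    num_colors = max(colors) + 1
    groups = [[i for i, c in enumerate(colors) if c == k] for k in range(num_colors)]
    for _ in range(sweeps):
        for group in groups:
            # Unknowns in one group do not couple to each other, so every
            # update below reads only values from other colors. The loop is
            # sequential here, but its iterations are independent.
            updates = {}
            for i in group:
                sigma = sum(a * x[j] for j, a in rows[i].items() if j != i)
                updates[i] = (b[i] - sigma) / rows[i][i]
            for i, value in updates.items():
                x[i] = value
    return x


if __name__ == "__main__":
    n = 8
    A = laplacian_1d(n)
    b = [sum(row.values()) for row in A]  # right-hand side for the exact solution x = 1
    colors = greedy_coloring(A)
    x = multicolor_gauss_seidel(A, b, [0.0] * n, colors, sweeps=200)
    print(colors)
    print(max(abs(xi - 1.0) for xi in x))
```

For the tridiagonal test matrix the greedy coloring yields two colors (the classical red-black ordering). On unstructured meshes more colors are generally needed, and the number of colors determines how many sequential stages one sweep has.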
An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs
The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level
overlapping Schwarz domain decomposition (DD) preconditioner that couples a
classical one-level overlapping Schwarz preconditioner with an
energy-minimizing coarse space. When used to accelerate the convergence rate of
Krylov subspace iterative methods, the GDSW preconditioner provides robustness
and scalability for the solution of sparse linear systems arising from the
discretization of a wide range of partial differential equations. In this paper,
we present FROSch (Fast and Robust Schwarz), a domain decomposition solver
package which implements GDSW-type preconditioners for both CPU and GPU
clusters. To improve the solver performance on GPUs, we use a novel
decomposition to run multiple MPI processes on each GPU, reducing both the solver's
computational and storage costs and potentially improving the convergence rate.
This allowed us to obtain competitive or faster performance using GPUs compared
to using CPUs alone. We demonstrate the performance of FROSch on the Summit
supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service
(MPS) to implement our decomposition strategy.
The solver has a wide variety of algorithmic and implementation choices,
which poses both opportunities and challenges for its GPU implementation. We
conduct a thorough experimental study with different solver options including
the exact or inexact solution of the local overlapping subdomain problems on a
GPU. We also discuss the effect of using the iterative variant of the
incomplete LU factorization and sparse-triangular solve as the approximate
local solver, and using lower precision for computing the whole FROSch
preconditioner. Overall, the solve time was reduced by factors of about
using GPUs, while the GPU acceleration of the numerical setup time
depends on the solver options and the local matrix sizes. Comment: Accepted for publication in IPDPS'2
Efficient algebraic multigrid preconditioners on clusters of GPUs
Many scientific applications require the solution of large and sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners, because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of efficiently implementing these methods on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers using sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability, on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE
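The reason an approximate inverse suits GPUs can be shown with a small sketch of a block-Jacobi sweep in which each local solve is replaced by matrix-vector products. This does not use the MLD2P4 or PSBLAS API, and it is not the approximate inverse from the paper: as a stand-in it uses a first-order truncated Neumann series, M = (2I - D^{-1}A)D^{-1}, where D is the diagonal of the local block. All function names, the dict-of-rows storage, and the test matrix are assumptions made for illustration.

```python
def matvec(A, x):
    """Sparse matrix-vector product; A is a list of {column: value} rows."""
    return [sum(a * x[j] for j, a in row.items()) for row in A]


def extract_block(A, lo, hi):
    """Diagonal block A[lo:hi, lo:hi] with locally renumbered columns."""
    return [{j - lo: a for j, a in A[i].items() if lo <= j < hi}
            for i in range(lo, hi)]


def neumann_approx_inverse_apply(A_blk, r):
    """Apply M = (2I - D^{-1} A) D^{-1} to r (stand-in approximate inverse).

    Only a diagonal scaling and one matrix-vector product are needed,
    so no triangular solve appears anywhere.
    """
    d = [A_blk[i][i] for i in range(len(r))]
    y = [ri / di for ri, di in zip(r, d)]  # y = D^{-1} r
    Ay = matvec(A_blk, y)
    return [2.0 * yi - ayi / di for yi, ayi, di in zip(y, Ay, d)]


def block_jacobi_sweep(A, b, x, blocks):
    """One sweep x <- x + M (b - A x), with M block diagonal over `blocks`."""
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    x_new = list(x)
    for lo, hi in blocks:
        # Every block reads the same residual, so the blocks are independent.
        dx = neumann_approx_inverse_apply(extract_block(A, lo, hi), r[lo:hi])
        for k, value in enumerate(dx):
            x_new[lo + k] += value
    return x_new


if __name__ == "__main__":
    # Hypothetical test problem: 1D Laplacian with exact solution x = 1.
    n = 8
    A = [{j: (2.0 if j == i else -1.0) for j in (i - 1, i, i + 1) if 0 <= j < n}
         for i in range(n)]
    b = [sum(row.values()) for row in A]
    x = [0.0] * n
    for _ in range(1000):
        x = block_jacobi_sweep(A, b, x, blocks=[(0, 4), (4, 8)])
    print(max(abs(xi - 1.0) for xi in x))
```

The sketch extracts the local blocks on every sweep only to stay short; a practical code would build the approximate inverse once in the setup phase and reuse it in each application.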