Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Providing performance portable numerics for Intel GPUs
With discrete Intel GPUs entering the high-performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this article, we report how we enable the Ginkgo math library to execute on Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences between the CUDA and DPC++ programming models and describe workflows for simplified code conversion. We evaluate the performance of basic and advanced sparse linear algebra routines available in Ginkgo's DPC++ backend relative to the hardware-specific performance bounds and compare against routines providing the same functionality that ship with Intel's oneMKL vendor library.
ILU Smoothers for AMG with Scaled Triangular Factors
ILU smoothers are effective in the algebraic multigrid (AMG) V-cycle for
reducing high-frequency components of the residual error. However, direct
triangular solves are comparatively slow on GPUs. Previous work by Chow and
Patel (2015) and Anzt et al. (2015) demonstrated the advantages of Jacobi
relaxation as an alternative. Depending on the threshold and fill-level
parameters chosen, the factors are highly non-normal and Jacobi is unlikely to
converge in a low number of iterations. The Ruiz algorithm applies row or
row/column scaling to U in order to reduce the departure from normality. The
inherently sequential solve is replaced with a Richardson iteration. There are
several advantages beyond the lower compute time. Scaling is performed locally
for a diagonal block of the global matrix because it is applied directly to the
factor. An ILUT Schur complement smoother maintains a constant GMRES iteration
count as the number of MPI ranks increases and thus parallel strong-scaling is
improved. The new algorithms are included in hypre, and achieve improved time
to solution for several Exascale applications, including the Nalu-Wind and
PeleLM pressure solvers. For large problem sizes, GMRES+AMG with iterative
triangular solves executes at least five times faster than with direct solves on
massively-parallel GPUs.
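The core idea above — Ruiz-scale the triangular factor to reduce its departure from normality, then replace the inherently sequential triangular solve with a Jacobi/Richardson iteration built from parallel-friendly matrix-vector products — can be sketched as follows. This is a minimal dense NumPy illustration under our own assumptions, not hypre's sparse implementation; the function names `ruiz_scale` and `jacobi_tri_solve` are ours.

```python
import numpy as np

def ruiz_scale(A, iters=3):
    """Ruiz row/column equilibration (sketch).

    Repeatedly divides each row and column by the square root of its
    largest absolute entry, driving all row/column maxima toward 1 and
    reducing the matrix's departure from normality.
    Returns (A_scaled, dr, dc) with A_scaled = diag(dr) @ A @ diag(dc).
    """
    n = A.shape[0]
    dr, dc = np.ones(n), np.ones(n)
    for _ in range(iters):
        r = 1.0 / np.sqrt(np.abs(A).max(axis=1))  # row scaling
        c = 1.0 / np.sqrt(np.abs(A).max(axis=0))  # column scaling
        A = (A * r[:, None]) * c[None, :]
        dr *= r
        dc *= c
    return A, dr, dc

def jacobi_tri_solve(U, b, sweeps=5):
    """Approximate solve of U x = b (U triangular) by Jacobi sweeps.

    Splitting U = D + (U - D), iterate x <- D^{-1} (b - (U - D) x).
    Each sweep is a matrix-vector product plus diagonal scalings, so it
    parallelizes well on GPUs, unlike sequential back-substitution.
    Because the iteration matrix is strictly triangular (nilpotent),
    n sweeps recover the exact solve for an n x n system; after scaling,
    far fewer sweeps typically give a usable smoother.
    """
    d = np.diag(U)
    x = b / d                          # initial guess x0 = D^{-1} b
    for _ in range(sweeps):
        x = (b - (U @ x - d * x)) / d  # (U @ x - d * x) == (U - D) x
    return x
```

A typical use combines the two: scale once, iterate on the scaled factor, then unscale, i.e. solve (Dr U Dc) y = Dr b and recover x = Dc y.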