4 research outputs found

    ParILUT - A parallel threshold ILU for GPUs


    Providing performance portable numerics for Intel GPUs

    With discrete Intel GPUs entering the high-performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this article, we report how we enable the Ginkgo math library to execute on Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences between the CUDA and DPC++ programming models and describe workflows for simplified code conversion. We evaluate the performance of basic and advanced sparse linear algebra routines available in Ginkgo's DPC++ backend relative to hardware-specific performance bounds, and compare against routines providing the same functionality that ship with Intel's oneMKL vendor library.

    ILU Smoothers for AMG with Scaled Triangular Factors

    ILU smoothers are effective in the algebraic multigrid (AMG) V-cycle for reducing high-frequency components of the residual error. However, direct triangular solves are comparatively slow on GPUs. Previous work by Chow and Patel (2015) and Anzt et al. (2015) demonstrated the advantages of Jacobi relaxation as an alternative. Depending on the threshold and fill-level parameters chosen, the factors are highly non-normal, and Jacobi is unlikely to converge in a low number of iterations. The Ruiz algorithm applies row or row/column scaling to U in order to reduce the departure from normality. The inherently sequential solve is replaced with a Richardson iteration. There are several advantages beyond the lower compute time: scaling is performed locally for a diagonal block of the global matrix because it is applied directly to the factor, and an ILUT Schur complement smoother maintains a constant GMRES iteration count as the number of MPI ranks increases, so parallel strong scaling is improved. The new algorithms are included in hypre and achieve improved time to solution for several exascale applications, including the Nalu-Wind and PeleLM pressure solvers. For large problem sizes, GMRES+AMG with iterative triangular solves executes at least five times faster than with direct solves on massively parallel GPUs.
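    The core idea above, replacing the sequential triangular solve with a fixed-point sweep, can be sketched in a few lines. This is a minimal NumPy illustration under assumed names (`jacobi_triangular_solve` is hypothetical, and a small dense matrix stands in for a sparse ILU factor), not hypre's implementation: for a lower triangular L, the Jacobi iteration x ← D⁻¹(b − (L − D)x) parallelizes row-wise, and its iteration matrix is nilpotent, so it converges in at most n sweeps.

    ```python
    import numpy as np

    def jacobi_triangular_solve(L, b, sweeps=10):
        """Approximately solve L x = b for lower triangular L using Jacobi sweeps.

        Each sweep computes x <- D^{-1} (b - (L - D) x), which involves only a
        matrix-vector product and elementwise operations, so every row updates
        independently -- unlike sequential forward substitution.
        """
        d = np.diag(L)              # diagonal D of the factor
        x = b / d                   # initial guess: diagonal scaling
        for _ in range(sweeps):
            x = (b - L @ x + d * x) / d   # (L - D) x = L x - D x
        return x

    # Toy example: the iteration matrix I - D^{-1} L is strictly lower
    # triangular (nilpotent), so the sweep count n suffices for an exact solve.
    L = np.array([[2.0, 0.0, 0.0],
                  [1.0, 3.0, 0.0],
                  [0.5, 1.0, 4.0]])
    b = np.array([2.0, 4.0, 5.5])
    x = jacobi_triangular_solve(L, b, sweeps=3)
    ```

    In practice, highly non-normal factors slow this convergence, which is exactly the motivation the abstract gives for applying Ruiz-style scaling to the factor before iterating.
    
    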