
    Sparse approximate inverse preconditioners on high performance GPU platforms

    Simulation with models based on partial differential equations often requires the solution of (sequences of) large and sparse algebraic linear systems. In multidimensional domains, preconditioned Krylov iterative solvers are often appropriate for this task. Therefore, the search for efficient preconditioners for Krylov subspace methods is a crucial theme. Recent developments, especially in computing hardware, have renewed interest in approximate inverse preconditioners in factorized form, because applying them during the solution process requires only sparse matrix-vector products rather than sequential triangular solves, which can be more efficient. We present our experience with the approximate inverse preconditioners proposed by Benzi and Tůma in 1996 and with the sparsification-and-inversion approach proposed by van Duin in 1999. Computational costs, reorderings, and implementation issues are considered on both conventional and innovative computing architectures such as Graphics Processing Units (GPUs).
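    A minimal sketch of why factorized approximate inverses suit GPUs: once A^{-1} ≈ Z D^{-1} Z^T is built, applying the preconditioner costs two matrix-vector products and a diagonal scaling. The function `ainv_spd` below is a hypothetical, dense NumPy rendering of an AINV-style A-orthogonalization for symmetric positive definite matrices, with a simple absolute drop threshold `drop_tol` as an assumed sparsification rule. It is not the authors' implementation, van Duin's sparsification-and-inversion approach is not shown, and the 1D Poisson matrix is only an illustrative test problem.

```python
import numpy as np


def ainv_spd(A, drop_tol=0.0):
    """Hypothetical AINV-style A-orthogonalization for SPD A (dense sketch).

    Returns a unit upper triangular Z and pivots d with Z.T @ A @ Z ~= diag(d),
    so that A^{-1} ~= Z @ diag(1/d) @ Z.T. Entries of Z below drop_tol in
    magnitude are dropped after each update (assumed absolute threshold);
    drop_tol=0 keeps everything and yields the exact factorized inverse.
    """
    n = A.shape[0]
    Z = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        d[i] = A[i, :] @ Z[:, i]              # pivot a_i^T z_i
        for j in range(i + 1, n):
            p = A[i, :] @ Z[:, j]             # a_i^T z_j
            if p != 0.0:
                Z[:, j] -= (p / d[i]) * Z[:, i]
                small = np.abs(Z[:, j]) < drop_tol
                small[j] = False              # never drop the unit diagonal
                Z[small, j] = 0.0
    return Z, d


def pcg(A, b, apply_minv, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradient; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


# Illustrative SPD test problem: 1D Poisson matrix
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

Z, d = ainv_spd(A, drop_tol=0.05)
# Applying M^{-1} = Z D^{-1} Z^T: two mat-vecs plus a diagonal scaling,
# with no triangular solves.
apply_minv = lambda r: Z @ ((Z.T @ r) / d)
x, its = pcg(A, b, apply_minv)
```

With `drop_tol=0.0` the factorization is exact and PCG converges in a single iteration; raising the threshold trades preconditioner quality for sparsity in Z.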

    Preconditioning for Sparse Linear Systems at the Dawn of the 21st Century: History, Current Developments, and Future Perspectives

    Iterative methods are currently the solvers of choice for large sparse linear systems of equations. However, it is well known that the key factor for accelerating, or even enabling, convergence is the preconditioner. Research on preconditioning techniques has been a defining theme of the last two decades, and today there are many options to consider when choosing the most appropriate preconditioner for the problem at hand. The present work provides an overview of the most popular algorithms available today, emphasizing their respective merits and limitations. The overview is restricted to algebraic preconditioners, that is, general-purpose algorithms that require knowledge of the system matrix only, independently of the specific problem it arises from. Along with the traditional distinction between incomplete factorizations and approximate inverses, the most recent developments are considered, including the scalable multigrid and parallel approaches that represent the current frontier of research. A separate section devoted to saddle-point problems, which arise in many different applications, closes the paper.

    Subdomain deflation combined with local AMG: a case study using AMGCL library

    The final publication is available at Springer via http://dx.doi.org/10.1134/S1995080220040071
    The paper proposes combining the subdomain deflation method with local algebraic multigrid as a scalable distributed-memory preconditioner capable of solving large linear systems of equations. The implementation of the algorithm is made available to the community as part of the open-source AMGCL library. The solution targets both homogeneous (CPU-only) and heterogeneous (CPU/GPU) systems, employing a hybrid MPI/OpenMP approach in the former case and a combination of MPI, OpenMP, and CUDA in the latter. The use of OpenMP minimizes the number of MPI processes, thus reducing the communication overhead of the deflation method and improving both the weak and strong scalability of the preconditioner. Scalar (single degree of freedom per grid node) Poisson-like systems, as well as non-scalar problems stemming from the discretization of the Navier-Stokes equations, are considered in order to estimate the performance of the implemented algorithm. A comparison with a traditional global AMG preconditioner based on the well-established Trilinos ML package is provided.
    The contribution of Dr. Demidov was funded by the state assignment to the Joint Supercomputer Center of the Russian Academy of Sciences for Scientific Research and by the Russian Foundation for Basic Research, grant no. 18-07-00964. Dr. Rossi acknowledges the financial support to CIMNE via the CERCA Programme/Generalitat de Catalunya and the support of the ExaQUte FetHPC, project GA 800898. The authors thankfully acknowledge the support of the PRACE program (project 2010PA4058) for providing access to the MareNostrum 4 and PizDaint clusters; without these resources the testing would not have been possible. The help of Prof. Labarta of the POP Center of Excellence in improving the NUMA scalability of the solver is also gratefully acknowledged.
    Peer Reviewed. Postprint (author's final draft).
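    The core idea of subdomain deflation can be sketched serially, assuming one piecewise-constant deflation vector per subdomain: the small coarse matrix E = W^T A W captures the slowest error components, and the Krylov method works on the deflated system P A x̃ = P b with P = I - A W E^{-1} W^T, after which x = W E^{-1} W^T b + P^T x̃. In the hypothetical `deflated_pcg` below, plain Jacobi scaling stands in for the local AMG solver, and everything is dense NumPy without MPI, OpenMP, or CUDA; it does not use or mimic AMGCL's API.

```python
import numpy as np


def deflated_pcg(A, b, W, apply_minv, tol=1e-10, maxiter=1000):
    """Hypothetical deflated PCG sketch: solve P A xt = P b with
    P = I - A Q, Q = W E^{-1} W^T, E = W^T A W, then x = Q b + P^T xt.
    Returns (x, iterations)."""
    E = W.T @ A @ W                          # small coarse (deflation) matrix

    def Q(v):
        return W @ np.linalg.solve(E, W.T @ v)

    def P(v):
        return v - A @ Q(v)

    xt = np.zeros_like(b)
    r = P(b)                                 # equals the true residual of x below
    z = apply_minv(r)
    p = z.copy()
    rz = r @ z
    its = maxiter
    for k in range(1, maxiter + 1):
        w = P(A @ p)
        alpha = rz / (p @ w)
        xt += alpha * p
        r -= alpha * w
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            its = k
            break
        z = apply_minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    # P^T xt = xt - Q A xt because A and Q are symmetric
    x = Q(b) + xt - Q(A @ xt)
    return x, its


def subdomain_basis(n, n_sub):
    """Hypothetical helper: one indicator vector per contiguous subdomain."""
    W = np.zeros((n, n_sub))
    for s, idx in enumerate(np.array_split(np.arange(n), n_sub)):
        W[idx, s] = 1.0
    return W


n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
diag = np.diag(A)
jacobi = lambda r: r / diag                  # stand-in for local AMG
x, its = deflated_pcg(A, b, subdomain_basis(n, 8), jacobi)
```

Because b - A x equals the deflated residual P b - P A x̃, the stopping test on `r` also bounds the residual of the original system.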

    An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

    The generalized Dryja–Smith–Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package that implements GDSW-type preconditioners for both CPU and GPU clusters. To improve solver performance on GPUs, we use a novel decomposition that runs multiple MPI processes on each GPU, reducing both the solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used the NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver offers a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options, including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and of using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2× using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
    Comment: Accepted for publication in IPDPS'2
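    The algebraic structure of a two-level additive Schwarz preconditioner, M^{-1} = Phi (Phi^T A Phi)^{-1} Phi^T + sum_i R_i^T A_i^{-1} R_i, can be sketched in a few lines. This is only an illustration under stated assumptions: the coarse basis `Phi` here is piecewise constant per subdomain (a Nicolaides-type coarse space), not GDSW's energy-minimizing coarse space; local problems are solved exactly with dense inverses rather than with ILU or sparse-triangular solves; and the helper names are hypothetical, not FROSch's interface.

```python
import numpy as np


def overlapping_subdomains(n, n_sub, overlap):
    """Hypothetical helper: contiguous blocks extended by `overlap` nodes per side."""
    blocks = np.array_split(np.arange(n), n_sub)
    return [np.arange(max(blk[0] - overlap, 0), min(blk[-1] + 1 + overlap, n))
            for blk in blocks]


def two_level_schwarz(A, subdomains, Phi):
    """Additive two-level Schwarz preconditioner with exact dense local solves:
    M^{-1} r = Phi (Phi^T A Phi)^{-1} Phi^T r + sum_i R_i^T A_i^{-1} R_i r."""
    local_inv = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]
    coarse_inv = np.linalg.inv(Phi.T @ A @ Phi)

    def apply(r):
        z = Phi @ (coarse_inv @ (Phi.T @ r))  # coarse-level correction
        for idx, Ainv in local_inv:           # overlapping local solves
            z[idx] += Ainv @ r[idx]
        return z
    return apply


def pcg(A, b, apply_minv, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradient; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


n, n_sub = 120, 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

# Assumed coarse space: one constant vector per non-overlapping block
Phi = np.zeros((n, n_sub))
for s, blk in enumerate(np.array_split(np.arange(n), n_sub)):
    Phi[blk, s] = 1.0

M = two_level_schwarz(A, overlapping_subdomains(n, n_sub, overlap=2), Phi)
x, its = pcg(A, b, M)
```

Because every term in the sum is symmetric positive semidefinite and the subdomains cover all unknowns, the preconditioner is symmetric positive definite and can be used with CG.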

    Preconditioners based on the Alternating-Direction-Implicit algorithm for the 2D steady-state diffusion equation with orthotropic heterogeneous coefficients

    In this paper, we combine the Alternating Direction Implicit (ADI) algorithm with the concept of preconditioning and apply it to linear systems arising from the finite element discretization, with tensor product basis functions, of the 2D steady-state diffusion equation with orthotropic heterogeneous coefficients. Specifically, we adopt the compound iteration idea and use ADI iterations as the preconditioner for the outer Krylov subspace method that solves the preconditioned linear system. An efficient algorithm for each ADI iteration is crucial to the efficiency of the overall iterative scheme. We exploit the Kronecker product structure of the matrices, inherited from the tensor product basis functions, to make each ADI iteration highly efficient. Meanwhile, to reduce the number of Krylov subspace iterations, we partially incorporate the coefficient information into the preconditioner by exploiting the local support property of the finite element basis functions. Numerical results demonstrate the efficiency and quality of the proposed preconditioner. © 2014 Elsevier B.V. All rights reserved.
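    The Kronecker trick can be illustrated on a separable, constant-coefficient model problem (kron(A1, I) + kron(I, A2)) x = b, assumed here for simplicity. With row-major vectorization, kron(A1, I) x corresponds to A1 @ X and kron(I, A2) x to X @ A2.T, so each Peaceman-Rachford half-step is a solve with a small one-dimensional matrix applied to all rows or columns of the grid unknowns X at once, and the full Kronecker matrix is never formed. The hypothetical `adi_solve` runs ADI as a standalone solver with the classical single shift rho = sqrt(lambda_min * lambda_max); in the paper, ADI iterations instead serve as the preconditioner of an outer Krylov method and also carry coefficient information, which this sketch does not reproduce.

```python
import numpy as np


def adi_solve(A1, A2, B, rho, tol=1e-10, maxiter=500):
    """Hypothetical Peaceman-Rachford ADI sketch for
    (kron(A1, I) + kron(I, A2)) vec(X) = vec(B), row-major vec.
    Only the small matrices A1 + rho*I and A2 + rho*I are ever solved with.
    Returns (X, iterations)."""
    n1, n2 = B.shape
    S1 = A1 + rho * np.eye(n1)
    S2 = A2 + rho * np.eye(n2)
    X = np.zeros_like(B)
    for k in range(1, maxiter + 1):
        # Implicit in direction 1: (A1 + rho I) X_half = rho X - X A2^T + B
        X_half = np.linalg.solve(S1, rho * X - X @ A2.T + B)
        # Implicit in direction 2: X (A2^T + rho I) = rho X_half - A1 X_half + B
        X = np.linalg.solve(S2, (rho * X_half - A1 @ X_half + B).T).T
        R = B - A1 @ X - X @ A2.T            # residual without forming kron
        if np.linalg.norm(R) <= tol * np.linalg.norm(B):
            return X, k
    return X, maxiter


def laplacian_1d(m):
    return 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)


n1, n2 = 16, 12
A1, A2 = laplacian_1d(n1), laplacian_1d(n2)
lam = np.concatenate([np.linalg.eigvalsh(A1), np.linalg.eigvalsh(A2)])
rho = np.sqrt(lam.min() * lam.max())         # classical single-shift choice
B = np.random.default_rng(0).standard_normal((n1, n2))
X, its = adi_solve(A1, A2, B, rho)
```

Since kron(A1, I) and kron(I, A2) commute, a single ADI step from a zero guess applies 2*rho*(V + rho*I)^{-1}(H + rho*I)^{-1} to the residual, which is symmetric positive definite and therefore usable as a CG preconditioner.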