Subdomain deflation combined with local AMG: a case study using AMGCL library
The final publication is available at Springer via http://dx.doi.org/10.1134/S1995080220040071

The paper proposes a combination of the subdomain deflation method and local algebraic multigrid as a scalable distributed-memory preconditioner able to solve large linear systems of equations. The implementation of the algorithm is made available to the community as part of the open-source AMGCL library. The solution targets both homogeneous (CPU-only) and heterogeneous (CPU/GPU) systems, employing a hybrid MPI/OpenMP approach in the former case and a combination of MPI, OpenMP, and CUDA in the latter. The use of OpenMP minimizes the number of MPI processes, thus reducing the communication overhead of the deflation method and improving both the weak and strong scalability of the preconditioner. Examples of scalar (single degree of freedom per grid node), Poisson-like systems, as well as non-scalar problems stemming from the discretization of the Navier-Stokes equations, are considered in order to estimate the performance of the implemented algorithm. A comparison with a traditional global AMG preconditioner based on the well-established Trilinos ML package is provided.

The contribution of Dr. Demidov was funded by the state assignment to the Joint Supercomputer Center of the Russian Academy of Sciences for scientific research and by the Russian Foundation for Basic Research, grant no. 18-07-00964. Dr. Rossi acknowledges the financial support to CIMNE via the CERCA Programme/Generalitat de Catalunya and the support of the ExaQUte FETHPC project, GA 800898. The authors thankfully acknowledge the support of the PRACE program (project 2010PA4058) in providing access to the MareNostrum 4 and Piz Daint clusters; without such resources the testing would not have been possible. The help of Prof. Labarta of the POP Center of Excellence in improving the NUMA scalability of the solver is also gratefully acknowledged.
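For readers unfamiliar with AMGCL, the following minimal sketch shows the library's typical single-node usage pattern, composing an AMG preconditioner with a Krylov solver, modelled on the examples in AMGCL's public documentation. It assembles a toy 1D Poisson matrix as a placeholder problem; the distributed subdomain-deflation setup benchmarked in the paper (AMGCL's MPI layer) is not shown here.

```cpp
// Minimal AMGCL sketch (shared-memory builtin backend), following the
// library's documented examples; not the MPI subdomain-deflation setup
// evaluated in the paper.
#include <cstddef>
#include <tuple>
#include <vector>

#include <amgcl/backend/builtin.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/bicgstab.hpp>

int main() {
    // Assemble a 1D Poisson matrix in CRS format as a placeholder problem.
    std::ptrdiff_t n = 10000;
    std::vector<std::ptrdiff_t> ptr = {0}, col;
    std::vector<double> val;
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        if (i > 0)     { col.push_back(i - 1); val.push_back(-1.0); }
        col.push_back(i); val.push_back(2.0);
        if (i + 1 < n) { col.push_back(i + 1); val.push_back(-1.0); }
        ptr.push_back(static_cast<std::ptrdiff_t>(col.size()));
    }

    // AMG (smoothed aggregation + SPAI(0) relaxation) as the preconditioner,
    // BiCGStab as the iterative solver, on the OpenMP-enabled builtin backend.
    typedef amgcl::backend::builtin<double> Backend;
    typedef amgcl::make_solver<
        amgcl::amg<Backend,
                   amgcl::coarsening::smoothed_aggregation,
                   amgcl::relaxation::spai0>,
        amgcl::solver::bicgstab<Backend>
    > Solver;

    Solver solve(std::tie(n, ptr, col, val));

    std::vector<double> rhs(n, 1.0), x(n, 0.0);
    std::size_t iters;
    double error;
    std::tie(iters, error) = solve(rhs, x);
    return 0;
}
```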
On using Cholesky-based factorizations and regularization for solving rank-deficient sparse linear least-squares problems
By examining the performance of modern parallel sparse direct solvers and exploiting our knowledge of the algorithms behind them, we perform numerical experiments to study how they can be used to efficiently solve rank-deficient sparse linear least-squares problems arising from practical applications. The Cholesky factorization of the normal equations breaks down when the least-squares problem is rank-deficient, while applying a symmetric indefinite solver to the augmented system can give an unacceptable level of fill in the factors. To try to resolve these difficulties, we consider a regularization procedure that modifies the diagonal of the unregularized matrix. This leads to matrices that are easier to factorize. We consider both the regularized normal equations and the regularized augmented system. We employ the computed factors of the regularized systems as preconditioners with an iterative solver to obtain the solution of the original (unregularized) problem. Furthermore, we look at using limited-memory incomplete Cholesky-based factorizations and how these can offer the potential to solve very large problems.
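As a point of reference (the exact shifts used in the paper may differ), the regularized systems the abstract refers to are commonly written as diagonally shifted versions of the normal equations and of the augmented system for min_x ||Ax - b||_2:

```latex
% Regularized normal equations: the shift \alpha > 0 restores positive
% definiteness so that a Cholesky factorization exists.
(A^T A + \alpha I)\, x = A^T b .

% Regularized augmented system: the (2,2) block is shifted away from zero,
% with r = b - A x the residual.
\begin{pmatrix} I & A \\ A^T & -\alpha I \end{pmatrix}
\begin{pmatrix} r \\ x \end{pmatrix}
=
\begin{pmatrix} b \\ 0 \end{pmatrix} .
```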
The state-of-the-art of preconditioners for sparse linear least-squares problems
In recent years, a variety of preconditioners have been proposed for use in solving large sparse linear least-squares problems. These include simple diagonal preconditioning, preconditioners based on incomplete factorizations and stationary inner iterations used with Krylov subspace methods. In this study, we briefly review preconditioners for which software has been made available and then present a numerical evaluation of them using performance profiles and a large set of problems arising from practical applications. Comparisons are made with state-of-the-art sparse direct methods.
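To make the simplest of these options concrete, diagonal (column-scaling) preconditioning for min_x ||Ax - b||_2 can be written as follows (a standard formulation, not quoted from the paper):

```latex
% Scale each column a_j of A by its 2-norm, solve the scaled problem,
% and recover x; the scaled normal matrix (AS)^T (AS) = S A^T A S then
% has unit diagonal.
S = \operatorname{diag}\bigl(\|a_1\|_2, \dots, \|a_n\|_2\bigr)^{-1},
\qquad
\min_y \|(A S)\, y - b\|_2,
\qquad
x = S y .
```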
Strengths and limitations of stretching for least-squares problems with some dense rows
We recently introduced a sparse stretching strategy for handling dense rows that can arise in large-scale linear least-squares problems and make such problems challenging to solve. Sparse stretching is designed to limit the amount of fill within the stretched normal matrix and hence within the subsequent Cholesky factorization. While preliminary results demonstrated that sparse stretching performs significantly better than standard stretching, it has a number of limitations. In this paper, we discuss and illustrate these limitations and propose new strategies that are designed to overcome them. Numerical experiments on problems arising from practical applications are used to demonstrate the effectiveness of these new ideas. We consider both direct and preconditioned iterative solvers.
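A one-line calculation (standard background, not taken from the paper) shows why even a single dense row is problematic: writing the rows of A as a_i^T, the normal matrix is a sum of rank-one terms, and a dense row a_d makes its term, and hence the Cholesky factor, completely full:

```latex
A^T A \;=\; \sum_{i} a_i a_i^{\,T}
\;=\; \underbrace{a_d a_d^{\,T}}_{\text{fully dense if } a_d \text{ is dense}}
\;+\; \sum_{i \neq d} a_i a_i^{\,T} .
```

Stretching replaces such a row by several sparser rows coupled through auxiliary variables, enlarging the system; sparse stretching, as described in the abstract, aims to perform this splitting while limiting the fill it introduces.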
New Parallel Sparse Direct Solvers for Multicore Architectures
At the heart of many computations in science and engineering lies the need to efficiently and accurately solve large sparse linear systems of equations. Direct methods are frequently the method of choice because of their robustness, accuracy and potential for use as black-box solvers. In the last few years, there have been many new developments, and a number of new modern parallel general-purpose sparse solvers have been written for inclusion within the HSL mathematical software library. In this paper, we introduce and briefly review these solvers for symmetric sparse systems. We describe the algorithms used, highlight key features (including bit-compatibility and out-of-core working) and then, using problems arising from a range of practical applications, we illustrate and compare their performances. We demonstrate that modern direct solvers are able to accurately solve systems of order 10^6 in less than 3 minutes on a 16-core machine.
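The HSL packages themselves are Fortran libraries, so as a stand-in the sketch below uses Eigen's SimplicialLDLT solely to illustrate the analyse / factorize / solve phases that sparse direct solvers of this kind expose; it is not the HSL interface.

```cpp
// Stand-in illustration of a sparse direct solver's phases using Eigen's
// SimplicialLDLT; the HSL solvers reviewed in the paper have their own
// (Fortran) interfaces.
#include <vector>
#include <Eigen/Sparse>

int main() {
    // Small symmetric positive-definite tridiagonal system as a placeholder.
    const int n = 5;
    std::vector<Eigen::Triplet<double>> entries;
    for (int i = 0; i < n; ++i) {
        entries.emplace_back(i, i, 2.0);
        if (i + 1 < n) {
            entries.emplace_back(i, i + 1, -1.0);
            entries.emplace_back(i + 1, i, -1.0);
        }
    }
    Eigen::SparseMatrix<double> A(n, n);
    A.setFromTriplets(entries.begin(), entries.end());

    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> solver;
    solver.analyzePattern(A);               // symbolic analysis: ordering, elimination tree
    solver.factorize(A);                    // numerical factorization
    Eigen::VectorXd b = Eigen::VectorXd::Ones(n);
    Eigen::VectorXd x = solver.solve(b);    // triangular solves
    return 0;
}
```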