A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units
Graphics Processing Units (GPUs) exhibit significantly higher peak performance than conventional CPUs. However, in general only highly parallel algorithms can exploit their potential. In this scenario, the iterative solution of sparse linear systems of equations can be carried out quite efficiently on a GPU, as it requires only matrix-by-vector products, dot products, and vector updates. To be truly effective, however, any iterative solver must be properly preconditioned, and this represents a major bottleneck for a successful GPU implementation. Due to its inherent parallelism, the factored sparse approximate inverse (FSAI) preconditioner is an optimal candidate for the conjugate gradient-like solution of sparse linear systems. Its GPU implementation, however, requires a nontrivial recasting of multiple computational steps. We present our GPU version of the FSAI preconditioner, along with a set of results showing that a noticeable speedup is obtained with respect to a highly tuned CPU counterpart.
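The structure of an FSAI-preconditioned conjugate gradient solver can be illustrated with a small dense NumPy sketch. It assumes the classical FSAI construction: the lower-triangular factor G takes the sparsity pattern of the lower triangle of A, each row of G comes from an independent small SPD solve, and the preconditioner is applied as G^T (G r), so only matrix-vector products are needed. The names `fsai_factor` and `pcg` are hypothetical, and dense arrays stand in for the sparse GPU data structures; this is not the authors' implementation.

```python
import numpy as np


def fsai_factor(A):
    """Lower-triangular G with the pattern of tril(A) such that G.T @ G approximates inv(A)."""
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):  # rows are independent, hence easy to parallelize
        # Sparsity pattern of row i: nonzero columns j <= i of A (always includes i).
        P = [j for j in range(i + 1) if A[i, j] != 0 or j == i]
        # Solve the small SPD system A[P, P] g = e_i (i is the last index in P).
        rhs = np.zeros(len(P))
        rhs[-1] = 1.0
        g = np.linalg.solve(A[np.ix_(P, P)], rhs)
        # Scale so that diag(G A G^T) = 1.
        G[i, P] = g / np.sqrt(g[-1])
    return G


def pcg(A, b, G, tol=1e-10, maxiter=1000):
    """Conjugate gradients with the FSAI preconditioner M^{-1} = G^T G."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = G.T @ (G @ r)  # preconditioner application: two matrix-vector products
    p = z.copy()
    rz = r @ z
    for k in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = G.T @ (G @ r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter
```

Because every row of G is computed independently and applying the preconditioner needs only matrix-vector products, both the setup and the application map naturally onto many parallel threads, which is the property the abstract emphasizes.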
On the Easy Use of Scientific Computing Services for Large Scale Linear Algebra and Parallel Decision Making with the P-Grade Portal
Scientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for solving problems arising in science and engineering. In this paper, we describe the services of an integrated portal based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu) that enables the solution of large-scale linear systems of equations using direct solvers, simplifies the use of a parallel block iterative algorithm, and provides an interface for parallel decision-making algorithms. The ultimate goal is to develop a single sign-on, integrated multi-service environment that provides easy access to different kinds of mathematical calculations and algorithms to be performed on hybrid distributed computing infrastructures, combining the benefits of large clusters, Grid, or cloud when needed.
An efficient GPU version of the preconditioned GMRES method
In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative solvers, among which preconditioned Krylov subspace methods occupy a privileged place. In a previous effort, we developed a GPU-aware version of the GMRES method included in ILUPACK, a package of solvers distinguished by its inverse-based multilevel ILU preconditioner. In this work, we study the performance of our previous proposal and integrate several enhancements in order to mitigate its principal bottlenecks. The numerical evaluation shows that our novel proposal can achieve significant run-time reductions.
Aliaga, JI.; Dufrechou, E.; Ezzatti, P.; Quintana-Orti, ES. (2019). An efficient GPU version of the preconditioned GMRES method. The Journal of Supercomputing. 75(3):1455-1469. https://doi.org/10.1007/s11227-018-2658-1
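A preconditioned GMRES solver of the kind discussed above can be sketched as a restarted GMRES(m) with right preconditioning in NumPy. The sketch assumes a user-supplied function `apply_Minv` that applies the inverse of the preconditioner; a simple Jacobi (diagonal) preconditioner is used in the example as a stand-in, not ILUPACK's inverse-based multilevel ILU, and the function name `gmres_right` is hypothetical.

```python
import numpy as np


def gmres_right(A, b, apply_Minv, m=30, tol=1e-10, max_restarts=50):
    """Restarted GMRES(m) with right preconditioning: solve A M^{-1} u = b, x = M^{-1} u."""
    n = b.size
    x = np.zeros(n)
    bnorm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * bnorm:
            break
        V = np.zeros((n, m + 1))  # orthonormal Arnoldi basis
        H = np.zeros((m + 1, m))  # upper Hessenberg matrix
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            # Preconditioner application followed by a matrix-vector product.
            w = A @ apply_Minv(V[:, j])
            for i in range(j + 1):  # modified Gram-Schmidt
                H[i, j] = w @ V[:, i]
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] <= 1e-14 * beta:  # (near) breakdown: invariant subspace found
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta * e1 - H y|| over the k-dimensional Krylov subspace.
        rhs = np.zeros(k + 1)
        rhs[0] = beta
        y = np.linalg.lstsq(H[: k + 1, :k], rhs, rcond=None)[0]
        x = x + apply_Minv(V[:, :k] @ y)
    return x
```

In a GPU version, the matrix-vector product, the dot products and vector updates of the orthogonalization, and the preconditioner application inside the inner loop are the kernels to offload; with an ILU-type preconditioner, that application consists of sparse triangular solves.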
Graph partitioning using matrix values for preconditioning symmetric positive definite systems
Prior to the parallel solution of a large linear system, its equations/unknowns
must be partitioned. Standard partitioning algorithms are designed with the
efficiency of the parallel matrix-vector multiplication in mind, and typically
disregard the information on the coefficients of the matrix. This information,
however, may have a significant impact on the quality of the preconditioning
procedure used within the chosen iterative scheme. In the present paper, we
suggest a spectral partitioning algorithm that takes into account the
information on the matrix coefficients and constructs partitions with the
objective of enhancing the quality of the nonoverlapping additive Schwarz
(block Jacobi) preconditioning for symmetric positive definite linear systems.
For a set of test problems with large variations in the magnitudes of matrix
coefficients, our numerical experiments demonstrate a noticeable improvement in
the convergence of the resulting solution scheme when using the new
partitioning approach.
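One simple way to let matrix values influence a partition, shown only as an illustration and not as the algorithm proposed in the paper, is to weight each graph edge by |a_ij| and split the vertices by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the weighted graph Laplacian). The function name is hypothetical, and a dense eigensolver is used for brevity.

```python
import numpy as np


def weighted_spectral_bisection(A):
    """Split unknowns by the sign of the Fiedler vector of a graph weighted by |a_ij|."""
    W = np.abs(np.asarray(A, dtype=float))
    np.fill_diagonal(W, 0.0)  # edge weights come from off-diagonal coefficients only
    L = np.diag(W.sum(axis=1)) - W  # weighted graph Laplacian
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]
    return fiedler > 0  # boolean part label for each unknown
```

Weak couplings tend to be cut, so the off-diagonal blocks that block Jacobi discards are small in magnitude. A plain sign split does not enforce equal part sizes; a practical partitioner would also have to balance the parts.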
Extending substructure based iterative solvers to multiple load and repeated analyses
Direct solvers currently dominate commercial finite element structural software, but they do not scale well in the fine-granularity regime targeted by emerging parallel processors. Substructure-based iterative solvers, also often called domain decomposition algorithms, lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general-purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right-hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well suited to these problems because, once the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. Iterative solvers, on the other hand, are in general ill suited to these problems because they often must restart from scratch for every different right-hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right-hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers.
As an example, we show that for a linear structural dynamics problem with 11,640 degrees of freedom, every time step beyond time step 15 is solved in a single iteration and consumes 1.0 second on a 32-processor iPSC/860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
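The reuse of search directions across right-hand sides can be illustrated with a small NumPy sketch written under simplifying assumptions: no preconditioner, a dense matrix K, and every search direction kept in memory. Each new direction is made K-orthogonal (K-conjugate) to all stored ones, and the initial guess for a new right-hand side is the K-orthogonal projection of the solution onto their span. The class and method names are hypothetical; the sketch conveys the general idea, not the paper's exact algorithm.

```python
import numpy as np


class ProjectedCG:
    """Hypothetical helper: conjugate directions that reuse K-orthogonal search
    directions accumulated over earlier right-hand sides (no preconditioner)."""

    def __init__(self, K):
        self.K = K
        self.P = []    # stored search directions p_i
        self.KP = []   # K @ p_i
        self.pKp = []  # p_i^T K p_i

    def _k_orthogonalize(self, v):
        # Remove the components of v along stored directions, in the K inner product.
        for p, Kp, d in zip(self.P, self.KP, self.pKp):
            v = v - (Kp @ v) / d * p
        return v

    def solve(self, b, tol=1e-10, maxiter=1000):
        # Initial guess: K-orthogonal projection of the solution onto span(P),
        # computable from b alone because p_i^T K x = p_i^T b.
        x = np.zeros_like(b, dtype=float)
        for p, d in zip(self.P, self.pKp):
            x += (p @ b) / d * p
        r = b - self.K @ x
        iters = 0
        while np.linalg.norm(r) > tol * np.linalg.norm(b) and iters < maxiter:
            p = self._k_orthogonalize(r)
            Kp = self.K @ p
            d = p @ Kp
            alpha = (p @ r) / d
            x += alpha * p
            r -= alpha * Kp
            # Keep the new direction for later right-hand sides.
            self.P.append(p)
            self.KP.append(Kp)
            self.pKp.append(d)
            iters += 1
        return x, iters
```

Solving the same or a nearby right-hand side again starts from a much better initial guess, which is the mechanism that lets later time steps converge in very few iterations. Storage grows with every iteration, so a practical code would cap or reset the stored subspace.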
Trilinos 4.0 tutorial.
The Trilinos Project is an effort to facilitate the design, development, integration, and ongoing support of mathematical software libraries. The goal of the Trilinos Project is to develop parallel solver algorithms and libraries within an object-oriented software framework for the solution of large-scale, complex multiphysics engineering and scientific applications. The emphasis is on developing robust, scalable algorithms in a software framework, using abstract interfaces for flexible interoperability of components while providing a full-featured set of concrete classes that implement all the abstract interfaces. This document introduces the use of Trilinos, version 4.0. The material presented includes, among other topics, the definition of distributed matrices and vectors with Epetra, the iterative solution of linear systems with AztecOO, incomplete factorizations with IFPACK, multilevel and domain decomposition preconditioners with ML, the direct solution of linear systems with Amesos, and the iterative solution of nonlinear systems with NOX. The tutorial is a self-contained introduction, intended to help computational scientists effectively apply the appropriate Trilinos package to their applications. It presents basic examples that are suitable as templates. This document is a companion to the Trilinos User's Guide [20] and the Trilinos Development Guides [21,22]. Please note that the documentation included with each Trilinos package is of fundamental importance.
Partitioning, Ordering, and Load Balancing in a Hierarchically Parallel Hybrid Linear Solver
Institut National Polytechnique de Toulouse, RT-APO-12-2
PDSLin is a general-purpose algebraic parallel hybrid (direct/iterative) linear solver based on the Schur complement method. The most challenging step of the solver is the computation of a preconditioner based on an approximate global Schur complement. We investigate two combinatorial problems to enhance PDSLin's performance at this step. The first is a multi-constraint partitioning problem to balance the workload while computing the preconditioner in parallel. For this, we describe and evaluate a number of graph and hypergraph partitioning algorithms to satisfy our particular objective and constraints. The second problem is to reorder the sparse right-hand side vectors to improve data access locality during the parallel solution of a sparse triangular system with multiple right-hand sides. This speeds up the elimination of the unknowns associated with the interface. We study two reordering techniques: one based on a postordering of the elimination tree and the other based on hypergraph partitioning. To demonstrate the effect of these techniques on the performance of PDSLin, we present numerical results from solving large-scale linear systems arising from two applications of interest to us: numerical simulations modeling accelerator cavities and fusion devices.
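The Schur complement method at the core of such hybrid solvers can be sketched with dense NumPy arrays, assuming the unknowns are already split into subdomain interiors that are not coupled to each other directly and an interface (separator) set. Here the Schur complement is formed and solved exactly; PDSLin instead builds an approximate global Schur complement as a preconditioner, so this function, whose name is hypothetical, only illustrates the underlying algebra.

```python
import numpy as np


def schur_complement_solve(A, b, subdomains, interface):
    """Solve A x = b by eliminating subdomain interiors onto the interface.

    Assumes interiors of different subdomains are not coupled directly,
    i.e. the interface unknowns form a separator.
    """
    S = A[np.ix_(interface, interface)].astype(float)  # starts as C, becomes C - sum F D^{-1} E
    g = b[interface].astype(float)
    local = []
    for dom in subdomains:  # independent per subdomain, hence parallel
        D = A[np.ix_(dom, dom)]
        E = A[np.ix_(dom, interface)]
        F = A[np.ix_(interface, dom)]
        # One local solve with many right-hand sides: the columns of E plus b_dom.
        Y = np.linalg.solve(D, np.column_stack([E, b[dom]]))
        S -= F @ Y[:, :-1]
        g -= F @ Y[:, -1]
        local.append((dom, D, E))
    x = np.zeros(len(b))
    x[interface] = np.linalg.solve(S, g)  # interface (Schur complement) system
    for dom, D, E in local:  # back-substitute each interior independently
        x[dom] = np.linalg.solve(D, b[dom] - E @ x[interface])
    return x
```

The local solve with the columns of E as many right-hand sides corresponds to the interface-elimination step whose data locality the right-hand-side reordering targets, and the per-subdomain loop is the work that the multi-constraint partitioning must balance across processors.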
- …