1,056 research outputs found
Multilevel communication optimal LU and QR factorizations for hierarchical platforms
This study focuses on the performance of two classical dense linear algebra
algorithms, the LU and the QR factorizations, on multilevel hierarchical
platforms. We first introduce a new model called Hierarchical Cluster Platform
(HCP), encapsulating the characteristics of such platforms. The focus is set on
reducing the communication requirements of studied algorithms at each level of
the hierarchy. Lower bounds on communications are therefore extended with
respect to the HCP model. We then introduce multilevel LU and QR algorithms
tailored for those platforms, and provide a detailed performance analysis. We
also provide a set of numerical experiments and performance predictions
demonstrating the need for such algorithms on large platforms
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
An MPI-OpenMP Hybrid Parallel H
In this paper we propose a high performance parallel strategy/technique to implement the fast direct solver based on hierarchical matrices method. Our goal is to directly solve electromagnetic integral equations involving electric-large and geometrical-complex targets, which are traditionally difficult to be solved by iterative methods. The parallel method of our direct solver features both OpenMP shared memory programming and MPl message passing for running on a computer cluster. With modifications to the core direct-solving algorithm of hierarchical LU factorization, the new fast solver is scalable for parallelized implementation despite of its sequential nature. The numerical experiments demonstrate the accuracy and efficiency of the proposed parallel direct solver for analyzing electromagnetic scattering problems of complex 3D objects with nearly 4 million unknowns
Lecture 13: A low-rank factorization framework for building scalable algebraic solvers and preconditioners
Factorization based preconditioning algorithms, most notably incomplete LU (ILU) factorization, have been shown to be robust and applicable to wide ranges of problems. However, traditional ILU algorithms are not amenable to scalable implementation. In recent years, we have seen a lot of investigations using low-rank compression techniques to build approximate factorizations.A key to achieving lower complexity is the use of hierarchical matrix algebra, stemming from the H-matrix research. In addition, the multilevel algorithm paradigm provides a good vehicle for a scalable implementation. The goal of this lecture is to give an overview of the various hierarchical matrix formats, such as hierarchically semi-separable matrix (HSS), hierarchically off-diagonal low-rank matrix (HODLR) and Butterfly matrix, and explain the algorithm differences and approximation quality. We will illustrate many practical issues of these algorithms using our parallel libraries STRUMPACK and ButterflyPACK, and demonstrate their effectiveness and scalability while solving the very challenging problems, such as high frequency wave equations
Lecture 13: A low-rank factorization framework for building scalable algebraic solvers and preconditioners
Factorization based preconditioning algorithms, most notably incomplete LU (ILU) factorization, have been shown to be robust and applicable to wide ranges of problems. However, traditional ILU algorithms are not amenable to scalable implementation. In recent years, we have seen a lot of investigations using low-rank compression techniques to build approximate factorizations.A key to achieving lower complexity is the use of hierarchical matrix algebra, stemming from the H-matrix research. In addition, the multilevel algorithm paradigm provides a good vehicle for a scalable implementation. The goal of this lecture is to give an overview of the various hierarchical matrix formats, such as hierarchically semi-separable matrix (HSS), hierarchically off-diagonal low-rank matrix (HODLR) and Butterfly matrix, and explain the algorithm differences and approximation quality. We will illustrate many practical issues of these algorithms using our parallel libraries STRUMPACK and ButterflyPACK, and demonstrate their effectiveness and scalability while solving the very challenging problems, such as high frequency wave equations
Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing
Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM
α-GMRES: A New Parallelizable AIterative Solver for Large Sparse Nonsymmetric Linear Systems Arising from CFD. G.U. Aero Report 9110
Linearization of the non-linear systems arising from fully implicit schemes in computational fluid dynamics often result in a large sparse non-symmetric linear system. Practical experience shows that these linear systems are ill-conditioned if a higher than first order spatial discretization scheme is used. To solve these linear systems, an efficient multilevel iterative method, the a-GMRES method, is proposed which incorporates a diagonal preconditioning with a damping factor a so that a balanced fast convergence of the inner GMRES iteration and the outer damping loop can be achieved. With this simple and efficient preconditioning and damping of the matrix, the resulting method can be effectively parallelized. The parallelization maintains the effectiveness of the original scheme due to the algorithm equivalence of the sequential and the parallel versions
A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization
We present a distributed-memory library for computations with dense
structured matrices. A matrix is considered structured if its off-diagonal
blocks can be approximated by a rank-deficient matrix with low numerical rank.
Here, we use Hierarchically Semi-Separable representations (HSS). Such matrices
appear in many applications, e.g., finite element methods, boundary element
methods, etc. Exploiting this structure allows for fast solution of linear
systems and/or fast computation of matrix-vector products, which are the two
main building blocks of matrix computations. The compression algorithm that we
use, that computes the HSS form of an input dense matrix, relies on randomized
sampling with a novel adaptive sampling mechanism. We discuss the
parallelization of this algorithm and also present the parallelization of
structured matrix-vector product, structured factorization and solution
routines. The efficiency of the approach is demonstrated on large problems from
different academic and industrial applications, on up to 8,000 cores.
This work is part of a more global effort, the STRUMPACK (STRUctured Matrices
PACKage) software package for computations with sparse and dense structured
matrices. Hence, although useful on their own right, the routines also
represent a step in the direction of a distributed-memory sparse solver
- …