290 research outputs found
A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software is applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring
Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing
Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM
Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and for maintaining a moderate number of unknowns. However, flexibility in the meshes goes along with severe drawbacks with respect to parallel execution – especially with respect to the definition of adequate smoothers. This point becomes in particular pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrixbased multigrid that has high flexibility and adapts well to the exigences of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fillins
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
Sparse linear iterative solvers are essential for many large-scale
simulations. Much of the runtime of these solvers is often spent in the
implicit evaluation of matrix polynomials via a sequence of sparse
matrix-vector products. A variety of approaches has been proposed to make these
polynomial evaluations explicit (i.e., fix the coefficients), e.g., polynomial
preconditioners or s-step Krylov methods. Furthermore, it is nowadays a popular
practice to approximate triangular solves by a matrix polynomial to increase
parallelism. Such algorithms allow to evaluate the polynomial using a so-called
matrix power kernel (MPK), which computes the product between a power of a
sparse matrix A and a dense vector x, or a related operation. Recently we have
shown that using the level-based formulation of sparse matrix-vector
multiplications in the Recursive Algebraic Coloring Engine (RACE) framework we
can perform temporal cache blocking of MPK to increase its performance. In this
work, we demonstrate the application of this cache-blocking optimization in
sparse iterative solvers.
By integrating the RACE library into the Trilinos framework, we demonstrate
the speedups achieved in preconditioned) s-step GMRES, polynomial
preconditioners, and algebraic multigrid (AMG). For MPK-dominated algorithms we
achieve speedups of up to 3x on modern multi-core compute nodes. For algorithms
with moderate contributions from subspace orthogonalization, the gain reduces
significantly, which is often caused by the insufficient quality of the
orthogonalization routines. Finally, we showcase the application of
RACE-accelerated solvers in a real-world wind turbine simulation (Nalu-Wind)
and highlight the new opportunities and perspectives opened up by RACE as a
cache-blocking technique for MPK-enabled sparse solvers.Comment: 25 pages, 11 figures, 3 table
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed
- …