Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
Sparse linear iterative solvers are essential for many large-scale
simulations. Much of the runtime of these solvers is often spent in the
implicit evaluation of matrix polynomials via a sequence of sparse
matrix-vector products. A variety of approaches has been proposed to make these
polynomial evaluations explicit (i.e., fix the coefficients), e.g., polynomial
preconditioners or s-step Krylov methods. Furthermore, it is nowadays a popular
practice to approximate triangular solves by a matrix polynomial to increase
parallelism. Such algorithms allow the polynomial to be evaluated using a so-called matrix power kernel (MPK), which computes the product between a power of a sparse matrix A and a dense vector x, or a related operation. Recently, we have shown that, using the level-based formulation of sparse matrix-vector multiplications in the Recursive Algebraic Coloring Engine (RACE) framework, we can perform temporal cache blocking of the MPK to increase its performance. In this
work, we demonstrate the application of this cache-blocking optimization in
sparse iterative solvers.
By integrating the RACE library into the Trilinos framework, we demonstrate
the speedups achieved in (preconditioned) s-step GMRES, polynomial
preconditioners, and algebraic multigrid (AMG). For MPK-dominated algorithms we
achieve speedups of up to 3x on modern multi-core compute nodes. For algorithms
with moderate contributions from subspace orthogonalization, the gain is reduced significantly, which is often caused by the insufficient quality of the
orthogonalization routines. Finally, we showcase the application of
RACE-accelerated solvers in a real-world wind turbine simulation (Nalu-Wind)
and highlight the new opportunities and perspectives opened up by RACE as a
cache-blocking technique for MPK-enabled sparse solvers.
Comment: 25 pages, 11 figures, 3 tables
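To make the matrix power kernel concrete, the sketch below computes the powers A x, A^2 x, ..., A^p x with a plain CSR sparse matrix-vector product. This is only the naive, non-blocked formulation; the data structure and names are illustrative assumptions, and RACE's level-based temporal cache blocking reorders exactly these loops so that matrix and vector data are reused while they are still resident in cache.

#include <vector>
#include <cstddef>

// Minimal CSR matrix; the format and field names are illustrative assumptions.
struct CSRMatrix {
    std::size_t n;                    // number of rows/columns (square matrix)
    std::vector<std::size_t> rowPtr;  // size n+1
    std::vector<std::size_t> colIdx;  // size nnz
    std::vector<double> val;          // size nnz
};

// y = A * x  (one sparse matrix-vector product)
void spmv(const CSRMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (std::size_t j = A.rowPtr[i]; j < A.rowPtr[i + 1]; ++j)
            sum += A.val[j] * x[A.colIdx[j]];
        y[i] = sum;
    }
}

// Naive matrix power kernel: returns {A*x, A^2*x, ..., A^p*x}.
// Each power streams the entire matrix through memory once; temporal cache
// blocking (as in RACE) interleaves the powers over levels of the matrix so
// that cached data can be reused across consecutive powers.
std::vector<std::vector<double>> mpk(const CSRMatrix& A,
                                     const std::vector<double>& x, int p) {
    std::vector<std::vector<double>> powers;
    std::vector<double> prev = x, next(A.n);
    for (int k = 0; k < p; ++k) {
        spmv(A, prev, next);
        powers.push_back(next);   // store A^(k+1) * x
        prev.swap(next);
    }
    return powers;
}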
VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems
Sparse matrices arising from the solution of systems of partial differential equations often exhibit a perfect block structure, meaning that the nonzero blocks in the sparsity pattern are fully dense (and typically small), e.g., when several unknown quantities are associated with the same grid point. However, similar block orderings can sometimes also be found in general unstructured matrices by ordering rows and columns with a similar sparsity pattern consecutively. Some zero entries of the reordered matrix can also be treated as nonzero elements to enlarge the blocks and improve performance. In general, the reordering results in linear systems with blocks of variable size. Our recently developed parallel package pVBARMS (parallel variable block algebraic recursive multilevel solver) for distributed-memory computers takes advantage of these frequently occurring structures in the design of its multilevel incomplete LU factorization preconditioner, achieving increased throughput during the computation and improved reliability on realistic applications. The method automatically detects any existing block structure in the matrix, without requiring any prior knowledge of the underlying problem from the user, and exploits it to maximize computational efficiency. We present a performance comparison of pVBARMS with other popular solvers on a set of general linear systems arising from different application fields, and we report on the numerical and parallel scalability of the pVBARMS package for solving the turbulent, Reynolds-averaged Navier-Stokes (RANS) equations.
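As a rough illustration of the block-detection idea described above (and not the actual pVBARMS algorithm), the sketch below groups rows whose column patterns overlap by at least a threshold tau; treating the few mismatching entries within a group as explicit zeros then yields dense blocks of variable size. The function name, similarity criterion, and threshold are assumptions for illustration only.

#include <vector>
#include <algorithm>
#include <iterator>
#include <cstddef>

// Group rows of a sparsity pattern whose sorted column-index sets overlap by
// at least tau (0 < tau <= 1). Each group can later be padded with explicit
// zeros to form a dense variable-size block. Simplified stand-in for the
// automatic block detection in pVBARMS, not the real algorithm.
std::vector<std::vector<std::size_t>> groupSimilarRows(
        const std::vector<std::vector<std::size_t>>& rowPattern,  // sorted column indices per row
        double tau) {
    std::vector<std::vector<std::size_t>> groups;
    std::vector<bool> used(rowPattern.size(), false);
    for (std::size_t i = 0; i < rowPattern.size(); ++i) {
        if (used[i]) continue;
        groups.push_back({i});
        used[i] = true;
        for (std::size_t j = i + 1; j < rowPattern.size(); ++j) {
            if (used[j]) continue;
            // overlap = |pattern(i) ∩ pattern(j)| / max(|pattern(i)|, |pattern(j)|)
            std::vector<std::size_t> common;
            std::set_intersection(rowPattern[i].begin(), rowPattern[i].end(),
                                  rowPattern[j].begin(), rowPattern[j].end(),
                                  std::back_inserter(common));
            double overlap = double(common.size()) /
                             double(std::max(rowPattern[i].size(), rowPattern[j].size()));
            if (overlap >= tau) {
                groups.back().push_back(j);
                used[j] = true;
            }
        }
    }
    return groups;
}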
Analysis and massively parallel implementation of the 2-Lagrange multiplier methods and optimized Schwarz methods
Engineering and Physical Sciences Research Council (EPSRC) grant EP/G036136/1
Optimization and validation of discontinuous Galerkin Code for the 3D Navier-Stokes equations
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (p. 165-170).
From residual and Jacobian assembly to the linear solve, the components of a high-order Discontinuous Galerkin Finite Element Method (DGFEM) for the Navier-Stokes equations in 3D are presented. Emphasis is given to residual and Jacobian assembly, since these are rarely discussed in the literature; in particular, this thesis focuses on code optimization. Performance properties of DG methods are identified, including key memory bottlenecks. A detailed overview of the memory hierarchy on modern CPUs is given, along with a discussion of optimization suggestions for utilizing the hierarchy efficiently. Other programming suggestions are also given, including the process for rewriting residual and Jacobian assembly using matrix-matrix products. Finally, a validation of the performance of the 3D viscous DG solver is presented through a series of canonical test cases.
by Eric Hung-Lin Liu. S.M.
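As a hedged illustration of the "assembly via matrix-matrix products" idea mentioned above: if Phi holds the basis functions evaluated at the quadrature points (shared by all elements of one type) and Ucoef packs the modal coefficients of all elements column-wise, then evaluating the solution at every quadrature point of every element becomes a single dense matrix-matrix product. The layout and names below are assumptions, not the thesis code, and a tuned BLAS gemm would replace the explicit triple loop in practice.

#include <vector>
#include <cstddef>

// Uq = Phi * Ucoef, i.e. the DG solution evaluated at all quadrature points
// of all elements at once. Grouping elements this way turns many small
// per-element operations into one cache- and SIMD-friendly dense product.
void evaluateAtQuadPoints(std::size_t nQuad, std::size_t nBasis, std::size_t nElem,
                          const std::vector<double>& Phi,    // row-major, nQuad x nBasis
                          const std::vector<double>& Ucoef,  // row-major, nBasis x nElem
                          std::vector<double>& Uq) {         // row-major, nQuad x nElem
    Uq.assign(nQuad * nElem, 0.0);
    for (std::size_t q = 0; q < nQuad; ++q)
        for (std::size_t b = 0; b < nBasis; ++b) {
            const double phi_qb = Phi[q * nBasis + b];
            for (std::size_t e = 0; e < nElem; ++e)
                Uq[q * nElem + e] += phi_qb * Ucoef[b * nElem + e];
        }
}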
PETSc 2.0 Users Manual: Revision 2.0.16
This manual describes the use of PETSc 2.0 for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc 2.0 uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear and nonlinear equation solvers that may be used in application codes written in Fortran, C, and C++. PETSc provides many of the mechanisms needed within parallel application codes, such as simple parallel matrix and vector assembly routines that allow the overlap of communication and computation. In addition, PETSc includes growing support for distributed arrays. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background or experience programming in C, Pascal, or C++, it may require a large amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates make the efficient implementation of many application codes much simpler than rolling them yourself. For many simple tasks a package such as Matlab is often the best tool; PETSc is not intended for the classes of problems for which effective Matlab code can be written. Since PETSc is still under development, small changes in usage and calling sequences of PETSc routines will continue to occur.
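For readers unfamiliar with the building blocks the manual describes, the sketch below assembles a 1D Laplacian and solves it with a Krylov method. It is an assumed, minimal example written against a recent PETSc release (C API, compiles as C or C++); the PETSc 2.0-era calling sequences documented in this manual differ in places (for instance, older releases wrapped the Krylov solver in an SLES object rather than using KSP directly).

#include <petscksp.h>

int main(int argc, char **argv) {
    Mat A; Vec x, b; KSP ksp;
    PetscInt i, n = 100, col[3];
    PetscScalar v[3];

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    /* Assemble the tridiagonal 1D Laplacian row by row. */
    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatSetUp(A));
    for (i = 0; i < n; i++) {
        PetscInt ncols = 0;
        if (i > 0)     { col[ncols] = i - 1; v[ncols++] = -1.0; }
        col[ncols] = i; v[ncols++] = 2.0;
        if (i < n - 1) { col[ncols] = i + 1; v[ncols++] = -1.0; }
        PetscCall(MatSetValues(A, 1, &i, ncols, col, v, INSERT_VALUES));
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    /* Right-hand side and solution vectors compatible with A. */
    PetscCall(MatCreateVecs(A, &x, &b));
    PetscCall(VecSet(b, 1.0));

    /* Krylov solver; method and preconditioner are chosen at run time
       via command-line options, e.g. -ksp_type gmres -pc_type jacobi. */
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));

    PetscCall(KSPDestroy(&ksp));
    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&b));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
}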