
    Composing Scalable Nonlinear Algebraic Solvers

    The most efficient linear solvers use composable algorithmic components, the most common model being the combination of a Krylov accelerator with one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods.
    Comment: 29 pages, 14 figures, 13 tables
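    The nonlinear composition described above can be illustrated with a minimal sketch (not from the paper; the toy two-equation system and all function names are invented for illustration). Here a cheap nonlinear Gauss-Seidel sweep is composed multiplicatively with an exact Newton step, in the spirit of combining an inner nonlinear solver with an outer one:

    ```python
    import numpy as np

    def F(x):
        # toy nonlinear system F(x) = 0 with solution (1, 2)
        return np.array([x[0]**2 + x[1] - 3.0,
                         x[0] + x[1]**2 - 5.0])

    def J(x):
        # Jacobian of F
        return np.array([[2.0*x[0], 1.0],
                         [1.0, 2.0*x[1]]])

    def newton_step(x):
        # one exact Newton update
        return x - np.linalg.solve(J(x), F(x))

    def gauss_seidel_sweep(x):
        # one nonlinear Gauss-Seidel sweep: solve each equation for its
        # own unknown while holding the other unknown fixed
        x = x.copy()
        x[0] = np.sqrt(max(3.0 - x[1], 0.0))   # from the first equation
        x[1] = np.sqrt(max(5.0 - x[0], 0.0))   # from the second equation
        return x

    def composed_solve(x, tol=1e-10, maxit=50):
        # multiplicative composition: cheap sweep first, then Newton,
        # applied alternately until the residual is small
        for k in range(maxit):
            x = newton_step(gauss_seidel_sweep(x))
            if np.linalg.norm(F(x)) < tol:
                break
        return x, k + 1
    ```

    The sweep damps components Newton handles poorly from a bad initial guess; the framework in the paper explores such combinations systematically rather than hand-coding one.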

    Asynchronous Stabilisation and Assembly Techniques for Additive Multigrid

    Multigrid solvers are among the best solvers in the world, but once applied in the real world there are issues they must overcome. Many multigrid phases exhibit low concurrency. Mesh and matrix assembly are challenging to parallelise and introduce algorithmic latency. Dynamically adaptive codes exacerbate these issues. Multigrid codes require the computation of a cascade of matrices, and dynamic adaptivity means these matrices are recomputed throughout the solve. Existing methods to compute the matrices are expensive and delay the solve. Non-trivial material parameters further increase the cost of accurate equation integration. We propose to assemble all matrix equations as stencils in a delayed element-wise fashion. Early multigrid iterations use cheap geometric approximations, and more accurate updated stencil integrations are computed in parallel with the multigrid cycles. New stencil integrations are evaluated lazily and asynchronously fed to the solver once they become available. They do not delay multigrid iterations. We deploy stencil integrations as parallel tasks that are picked up by cores that would otherwise be idle. Coarse grid solves in multiplicative multigrid also exhibit limited concurrency. Small coarse mesh sizes correspond to small computational workload and require costly synchronisation steps. This acts as a bottleneck and delays solver iterations. Additive multigrid avoids this restriction, but becomes unstable for non-trivial material parameters as additive coarse grid levels tend to overcorrect. This leads to oscillations. We propose a new additive variant, adAFAC-x, with a stabilisation parameter that damps coarse grid corrections to remove oscillations. Per level we solve an additional equation that produces an auxiliary correction. The auxiliary correction can be computed additively to the rest of the solve and uses ideas similar to smoothed aggregation multigrid to anticipate overcorrections. Pipelining techniques allow adAFAC-x to be written using single-touch semantics on a dynamically adaptive mesh.
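    The damping of additive coarse-grid corrections can be sketched on a two-level additive cycle for the 1D Poisson problem (a minimal sketch, not the adAFAC-x scheme itself; the damping weights, grid sizes, and function names are illustrative assumptions). In an additive cycle the smoother and the coarse correction both act on the same residual, so an undamped coarse term can overcorrect smooth error components; a damping factor on the coarse term suppresses this:

    ```python
    import numpy as np

    def poisson(n):
        # 1D Poisson stiffness matrix, Dirichlet boundaries, h = 1/(n+1)
        return (np.diag(2.0*np.ones(n)) - np.diag(np.ones(n-1), 1)
                - np.diag(np.ones(n-1), -1)) * (n + 1)**2

    def additive_two_level(A, b, x, omega_smooth=0.5, omega_coarse=0.5,
                           sweeps=100):
        n = A.shape[0]
        nc = (n - 1) // 2
        # linear interpolation P; full-weighting restriction R = 0.5 * P^T
        P = np.zeros((n, nc))
        for j in range(nc):
            i = 2*j + 1
            P[i-1, j], P[i, j], P[i+1, j] = 0.5, 1.0, 0.5
        R = 0.5 * P.T
        Ac = R @ A @ P                    # Galerkin coarse operator
        D = np.diag(A)
        for _ in range(sweeps):
            r = b - A @ x
            # additive: damped Jacobi smoother and DAMPED coarse-grid
            # correction are both computed from the SAME residual r
            x = (x + omega_smooth * r / D
                   + omega_coarse * (P @ np.linalg.solve(Ac, R @ r)))
        return x
    ```

    With omega_coarse = 1.0 the coarse term plus the smoother can overshoot smooth modes; damping both contributions keeps the combined correction contractive, which is the stabilisation idea the abstract generalises via auxiliary per-level equations.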

    Adaptive control in rollforward recovery for extreme scale multigrid

    With the increasing number of compute components, failures in future exa-scale computer systems are expected to become more frequent. This motivates the study of novel resilience techniques. Here, we extend a recently proposed algorithm-based recovery method for multigrid iterations by introducing an adaptive control. After a fault, the healthy part of the system continues the iterative solution process, while the solution in the faulty domain is reconstructed by an asynchronous on-line recovery. The computations in the faulty and healthy subdomains must be carefully coordinated; in particular, both under- and over-solving must be avoided, since both waste computational resources and therefore increase the overall time-to-solution. To control the local recovery and guarantee an optimal re-coupling, we introduce a stopping criterion based on a mathematical error estimator. It involves hierarchical weighted sums of residuals within the context of uniformly refined meshes and is well-suited to parallel high-performance computing. The re-coupling process is steered by local contributions of the error estimator. We propose and compare two criteria which differ in their weights. Failure scenarios when solving up to 6.9 × 10^11 unknowns on more than 245,766 parallel processes are reported on a state-of-the-art peta-scale supercomputer, demonstrating the robustness of the method.
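    The interplay of fault, local recovery, and a stopping criterion that avoids under- and over-solving can be sketched on a single-node toy (a minimal sketch, not the paper's multigrid method or its hierarchical estimator; the Jacobi solver, the residual-matching criterion, and all names are illustrative assumptions). After a simulated fault wipes a subdomain, local sweeps continue only until the faulty subdomain's residual matches the healthy level, then the subdomains re-couple:

    ```python
    import numpy as np

    def jacobi_sweep(A, b, x, idx):
        # one Jacobi sweep restricted to the unknowns in idx;
        # all other unknowns are held fixed
        xn = x.copy()
        for i in idx:
            xn[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
        return xn

    def solve_with_recovery(A, b, faulty, fault_at=50,
                            tol=1e-8, maxit=5000):
        n = len(b)
        x = np.zeros(n)
        faulty = list(faulty)
        healthy = [i for i in range(n) if i not in faulty]
        for k in range(maxit):
            if k == fault_at:
                x[faulty] = 0.0            # fault wipes a subdomain
                # local recovery: iterate only in the faulty subdomain,
                # with healthy boundary values fixed, until the local
                # residual matches the surviving global residual level.
                # Stopping here avoids over-solving; the threshold
                # avoids under-solving at re-coupling.
                r_healthy = np.linalg.norm((b - A @ x)[healthy])
                while np.linalg.norm((b - A @ x)[faulty]) > r_healthy:
                    x = jacobi_sweep(A, b, x, faulty)
            x = jacobi_sweep(A, b, x, range(n))   # re-coupled iteration
            if np.linalg.norm(b - A @ x) < tol:
                break
        return x, k + 1
    ```

    The paper replaces this crude residual comparison with weighted hierarchical error estimators and runs the recovery asynchronously alongside the healthy subdomain's iterations.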