The deal.II Library, Version 9.1
This paper provides an overview of the new features of the finite element library deal.II, version 9.1.
A Parallel Geometric Multigrid Method for Adaptive Finite Elements
Applications in a variety of scientific disciplines use systems of partial differential equations (PDEs) to model physical phenomena. Numerical solutions to these models are often computed with the finite element method (FEM), where the problem is discretized and a large linear system, containing millions or even billions of unknowns, must be solved. Often the domain contains localized features that require very high resolution of the underlying finite element mesh to resolve accurately, while a uniformly fine mesh would require far too much computational time and memory to be feasible on a modern machine. Techniques like adaptive mesh refinement, which increase the resolution of the mesh only where it is necessary, must therefore be used. Even with adaptive mesh refinement, these systems can contain far more than a million unknowns (large mantle convection applications like the ones in [90] show simulations with over 600 billion unknowns), and solving on a single processing unit is infeasible due to the computational time and memory required. For this reason, any application code aimed at solving large problems must be built on a parallel framework, allowing the concurrent use of multiple processing units on a single problem, and the code must scale efficiently to large numbers of processing units.
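The marking strategy behind adaptive mesh refinement can be illustrated with a small, library-independent sketch in 1D. The function name and the crude size-and-distance error indicator below are hypothetical stand-ins for the a posteriori error estimators used in practice:

```python
# Illustrative 1D adaptive mesh refinement loop (sketch only; the names and
# the crude size-based error indicator are hypothetical, not a real library API).

def refine_adaptively(cells, error_estimate, fraction=0.3):
    """Bisect the `fraction` of cells with the largest estimated error."""
    ranked = sorted(((error_estimate(c), i) for i, c in enumerate(cells)),
                    reverse=True)
    n_refine = max(1, int(fraction * len(cells)))
    marked = {i for _, i in ranked[:n_refine]}

    refined = []
    for i, (a, b) in enumerate(cells):
        if i in marked:
            mid = 0.5 * (a + b)
            refined.extend([(a, mid), (mid, b)])   # bisect marked cells
        else:
            refined.append((a, b))                 # keep the rest
    return refined

# Refine toward a localized feature at x = 0, using the cell size weighted
# by the distance to the feature as a stand-in error indicator.
cells = [(i / 8, (i + 1) / 8) for i in range(8)]
indicator = lambda c: (c[1] - c[0]) / (1e-3 + min(abs(c[0]), abs(c[1])))
for _ in range(3):
    cells = refine_adaptively(cells, indicator)
```

After three cycles the mesh is much finer near the feature than elsewhere, which is exactly the trade-off the paragraph above describes: resolution where it is needed, coarse cells everywhere else.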
Multigrid methods are currently the only known optimal solvers for linear systems arising from discretizations of elliptic boundary value problems. These methods can be represented as an iterative scheme with contraction number less than one, independent of the resolution of the discretization [24, 54, 25, 103], and with optimal complexity in the number of unknowns [29]. Geometric multigrid (GMG) methods, where the hierarchy of spaces is defined by finite element discretizations on meshes of decreasing resolution, have been shown to be robust for many different problem formulations, giving mesh-independent convergence even for highly adaptive meshes [26, 61, 83, 18]. These methods, however, require specific implementations for each type of equation, boundary condition, and mesh required by the application; their implementation in a massively parallel environment is not obvious, and research into this topic is far from exhaustive.
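The core of any geometric multigrid solver is the V-cycle: smooth on the fine grid, restrict the residual to a coarser grid, recurse, prolongate the coarse correction back, and smooth again. A minimal sketch for the 1D Poisson problem with centered finite differences and weighted Jacobi smoothing (illustrative only; not the deal.II implementation):

```python
# Geometric multigrid V-cycle for -u'' = f on (0,1) with homogeneous
# Dirichlet conditions, discretized by centered finite differences.

def residual(u, f, h):
    r = []
    for i in range(len(u)):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < len(u) - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / h**2)
    return r

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi; the diagonal of the operator is 2/h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        for i in range(len(u)):
            u[i] += omega * r[i] * h**2 / 2.0
    return u

def restrict(r):
    """Full weighting from n fine to (n - 1) // 2 coarse interior points."""
    return [0.25 * r[2*j] + 0.5 * r[2*j + 1] + 0.25 * r[2*j + 2]
            for j in range((len(r) - 1) // 2)]

def prolong(e, n_fine):
    """Linear interpolation from coarse to fine interior points."""
    ef = [0.0] * n_fine
    m = len(e)
    for j in range(m):
        ef[2*j + 1] = e[j]
    for j in range(m + 1):
        left = e[j - 1] if j > 0 else 0.0
        right = e[j] if j < m else 0.0
        ef[2*j] = 0.5 * (left + right)
    return ef

def v_cycle(u, f, h):
    if len(u) == 1:                        # coarsest level: direct solve
        return [f[0] * h**2 / 2.0]
    u = smooth(u, f, h)                    # pre-smoothing
    rc = restrict(residual(u, f, h))       # coarse-grid residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]
    return smooth(u, f, h)                 # post-smoothing

# -u'' = 1 has exact solution u = x(1 - x)/2, which this discretization
# reproduces exactly at the nodes, so the midpoint value tends to 0.125.
n = 127
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
for _ in range(12):
    u = v_cycle(u, f, h)
```

The key property the abstract refers to is visible here: each V-cycle reduces the residual by a factor bounded away from one, and that factor does not degrade as the grid is refined, so the total work stays proportional to the number of unknowns.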
We present an implementation of a massively parallel, adaptive geometric multigrid (GMG) method in the open-source finite element library deal.II [5], and perform extensive tests showing scaling of the V-cycle application on systems with up to 137 billion unknowns run on up to 65,536 processors, demonstrating the low communication overhead of the proposed algorithms. We then show the flexibility of the GMG method by applying it to four different PDE systems: the Poisson equation, linear elasticity, advection-diffusion, and the Stokes equations. For the Stokes equations, we implement a fully matrix-free, adaptive, GMG-based solver in the mantle convection code ASPECT [13] and compare it to the matrix-based method currently used. We show improvements in robustness, parallel scaling, and memory consumption for simulations with up to 27 billion unknowns on 114,688 processors. Finally, we compare the performance of IDR(s) methods to the FGMRES method currently used in ASPECT, showing the effects of the flexible preconditioning used for the Stokes solves in ASPECT, the possible reduction in memory consumption with IDR(s), and the potential for solving large-scale problems.
Parts of the work in this thesis have been submitted to peer-reviewed journals in the form of two publications ([36] and [34]), and the implementations discussed have been integrated into two open-source codes, deal.II and ASPECT. Through the contributions to deal.II, including a full-length tutorial program, Step-63 [35], the author is listed as a contributing author of the newest deal.II release (see [5]). The implementation in ASPECT is based on work by the author and Timo Heister. The goal of this work is to enable the community of geoscientists using ASPECT to solve larger problems than currently possible. Over the course of this thesis, the author was partially funded by NSF Award OAC-1835452 and by the Computational Infrastructure for Geodynamics initiative (CIG), through the NSF under Awards EAR-0949446 and EAR-1550901, and by the University of California, Davis.
End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations
In this paper, we present algorithms and implementations for the end-to-end
GPU acceleration of matrix-free low-order-refined preconditioning of high-order
finite element problems. The methods described here allow for the construction
of effective preconditioners for high-order problems with optimal memory usage
and computational complexity. The preconditioners are based on the construction
of a spectrally equivalent low-order discretization on a refined mesh, which is
then amenable to, for example, algebraic multigrid preconditioning. The
constants of equivalence are independent of mesh size and polynomial degree.
For vector finite element problems in H(curl) and H(div) (e.g.
for electromagnetic or radiation diffusion problems), a specially constructed
interpolation-histopolation basis is used to ensure fast convergence. Detailed
performance studies are carried out to analyze the efficiency of the GPU
algorithms. The kernel throughput of each of the main algorithmic components is
measured, and the strong and weak parallel scalability of the methods is
demonstrated. The different relative weighting and significance of the
algorithmic components on GPUs and CPUs is discussed. Results on problems
involving adaptively refined nonconforming meshes are shown, and the use of the
preconditioners on a large-scale magnetic diffusion problem using all spaces of
the finite element de Rham complex is illustrated.
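The basic idea of low-order-refined (LOR) preconditioning can be sketched in 1D: a quadratic (p = 2) finite element discretization of the Poisson problem is preconditioned by a piecewise-linear discretization on the refined mesh that shares the quadratic nodes. Because the two operators are spectrally equivalent with constants independent of the mesh size, preconditioned CG converges in a mesh-independent number of iterations. A minimal sketch (illustrative only; none of the paper's GPU, matrix-free, or H(curl)/H(div) machinery is represented):

```python
# LOR preconditioning sketch: quadratic FEM for -u'' = 1 on (0,1), zero
# Dirichlet BCs, preconditioned by linear FEM on the nodal subgrid.

N = 32                      # number of quadratic elements
h = 1.0 / N
n = 2 * N - 1               # interior unknowns (nodes 1 .. 2N-1)

# High-order system: quadratic element stiffness matrix is (1/(3h)) * K2,
# element load vector for f = 1 is [h/6, 4h/6, h/6].
K2 = [[7, -8, 1], [-8, 16, -8], [1, -8, 7]]
A = [[0.0] * n for _ in range(n)]
b = [0.0] * n
for e in range(N):
    nodes = [2 * e, 2 * e + 1, 2 * e + 2]
    load = [h / 6, 4 * h / 6, h / 6]
    for i in range(3):
        gi = nodes[i] - 1                      # interior index
        if not 0 <= gi < n:
            continue                           # Dirichlet node: skip
        b[gi] += load[i]
        for j in range(3):
            gj = nodes[j] - 1
            if 0 <= gj < n:
                A[gi][gj] += K2[i][j] / (3 * h)

# LOR preconditioner: linear elements of size h/2 on the same nodes give a
# tridiagonal matrix, solved exactly by the Thomas algorithm.
diag = [4.0 / h] * n
off = [-2.0 / h] * (n - 1)

def lor_solve(r):
    d, y = diag[:], r[:]
    for i in range(1, n):                      # forward elimination
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        y[i] -= w * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = (y[i] - off[i] * x[i + 1]) / d[i]
    return x

def matvec(p):
    return [sum(Ai[j] * p[j] for j in range(n)) for Ai in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Preconditioned conjugate gradients with the LOR solve as preconditioner.
x = [0.0] * n
r = b[:]
z = lor_solve(r)
p = z[:]
rz = dot(r, z)
iters = 0
while max(abs(ri) for ri in r) > 1e-12 and iters < 200:
    Ap = matvec(p)
    alpha = rz / dot(p, Ap)
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r = [ri - alpha * api for ri, api in zip(r, Ap)]
    z = lor_solve(r)
    rz_new = dot(r, z)
    p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
    rz = rz_new
    iters += 1
```

Because the LOR matrix is sparse and low order, it is exactly the kind of operator that algebraic multigrid handles well, which is the role it plays in the paper; the exact tridiagonal solve here is a 1D stand-in for that.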
Adaptive control in rollforward recovery for extreme scale multigrid
With the increasing number of compute components, failures in future
exascale computer systems are expected to become more frequent. This motivates
the study of novel resilience techniques. Here, we extend a recently proposed
algorithm-based recovery method for multigrid iterations by introducing an
adaptive control. After a fault, the healthy part of the system continues the
iterative solution process, while the solution in the faulty domain is
reconstructed by an asynchronous on-line recovery. The computations in the
faulty and healthy subdomains must be carefully coordinated; in particular,
both under- and over-solving must be avoided, since both waste computational
resources and therefore increase the overall time-to-solution. To control the
local recovery and guarantee an optimal re-coupling, we introduce a stopping
criterion based on a mathematical error estimator. It involves hierarchical
weighted sums of residuals on uniformly refined meshes and is well suited to
parallel high-performance computing. The re-coupling process is steered by
local contributions of the error estimator, and we propose and compare two
criteria which differ in their weights. Failure scenarios when solving large
systems on more than 245,766 parallel processes are reported on a
state-of-the-art petascale supercomputer, demonstrating the robustness of the
method.
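The re-coupling logic can be sketched with a heavily simplified stand-in: plain Jacobi on a 1D Poisson problem in place of the paper's multigrid solver, and a local residual norm in place of the hierarchical weighted residual estimator. After the fault, the lost subdomain is iterated locally until its estimator has caught up with the accuracy already reached before the fault, which is the adaptive stopping criterion that prevents both under- and over-solving:

```python
# Rollforward recovery sketch (illustrative stand-in only): -u'' = 1 on
# (0,1), zero Dirichlet BCs, n interior unknowns; the right half "fails".

n = 31
h = 1.0 / (n + 1)
f = [1.0] * n
faulty = set(range(n // 2, n))               # unknowns lost by the fault

def local_residual(u, indices):
    """Max-norm of the residual restricted to a set of unknowns."""
    worst = 0.0
    for i in indices:
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(f[i] - (2.0 * u[i] - left - right) / h**2))
    return worst

def jacobi_sweep(u, indices):
    """One Jacobi sweep updating only the given unknowns."""
    new = u[:]
    for i in indices:
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        new[i] = (f[i] * h**2 + left + right) / 2.0
    return new

u = [0.0] * n
for _ in range(50):                          # fault-free global iterations
    u = jacobi_sweep(u, range(n))
target = local_residual(u, range(n))         # accuracy reached so far

for i in faulty:                             # fault: local data is lost
    u[i] = 0.0

# Local recovery: iterate only in the faulty subdomain, with the healthy
# values frozen, until the local estimator has caught up with the pre-fault
# accuracy level; stopping here avoids both under- and over-solving.
recovery_sweeps = 0
while local_residual(u, faulty) > target and recovery_sweeps < 5000:
    u = jacobi_sweep(u, faulty)
    recovery_sweeps += 1

for _ in range(100):                         # re-coupled global iterations
    u = jacobi_sweep(u, range(n))
```

In the paper the healthy subdomain keeps iterating asynchronously during the recovery and the estimator is hierarchical and weighted; here it is frozen and the estimator is a plain residual norm, but the control flow (fault, adaptively stopped local recovery, re-coupling) is the same.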
Efficient distributed matrix-free multigrid methods on locally refined meshes for FEM computations
This work studies three multigrid variants for matrix-free finite-element
computations on locally refined meshes: geometric local smoothing, geometric
global coarsening, and polynomial global coarsening. We have integrated the
algorithms into the same framework, the open-source finite-element library
deal.II, which allows us to make fair comparisons regarding their
implementation complexity, computational efficiency, and parallel scalability,
as well as to compare the measurements with theoretically derived performance
models. Serial simulations and parallel weak and strong scaling on up to
147,456 CPU cores on 3,072 compute nodes are presented. The results indicate
that global-coarsening algorithms show better parallel behavior for comparable
smoothers due to better load balance, particularly on the expensive fine
levels. In the serial case, the costs of applying hanging-node constraints can
be significant, giving local smoothing an advantage even though it needs a
slightly higher number of solver iterations.
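In polynomial global coarsening, the mesh stays the same across levels while the polynomial degree is reduced, so setting up the hierarchy amounts to choosing a degree sequence. A sketch of two common strategies, halving the degree or decreasing it by one (the function name is hypothetical, not the deal.II API):

```python
# Build the polynomial degree sequence for a p-multigrid hierarchy
# (sketch; "bisect" and "decrease_by_one" are two common strategies).

def polynomial_coarsening_sequence(p, strategy="bisect"):
    """Return the polynomial degrees from the coarsest to the finest level."""
    if p < 1:
        raise ValueError("polynomial degree must be at least 1")
    degrees = [p]
    while degrees[-1] > 1:
        if strategy == "bisect":
            degrees.append(degrees[-1] // 2)     # p -> floor(p / 2)
        elif strategy == "decrease_by_one":
            degrees.append(degrees[-1] - 1)      # p -> p - 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return degrees[::-1]
```

Each degree in the sequence is then paired with the unchanged fine mesh to form one multigrid level; a geometric (h-coarsened) hierarchy can be appended below the degree-1 level.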