Scalable Adaptive Mantle Convection Simulation on Petascale Supercomputers
Mantle convection is the principal control on
the thermal and geological evolution of the Earth. Mantle
convection modeling involves solution of the mass, momentum,
and energy equations for a viscous, creeping, incompressible
non-Newtonian fluid at high Rayleigh and Peclet
numbers. Our goal is to conduct global mantle convection
simulations that can resolve faulted plate boundaries, down
to 1 km scales. However, uniform resolution at these scales
would result in meshes with a trillion elements, which
would elude even sustained petaflops supercomputers. Thus
parallel adaptive mesh refinement and coarsening (AMR)
is essential.
We present RHEA, a new generation mantle convection
code designed to scale to hundreds of thousands of cores.
RHEA is built on ALPS, a parallel octree-based adaptive
mesh finite element library that provides new distributed
data structures and parallel algorithms for dynamic coarsening,
refinement, rebalancing, and repartitioning of the
mesh. ALPS currently supports low order continuous
Lagrange elements, and arbitrary order discontinuous
Galerkin spectral elements, on octree meshes. A forest-of-octrees
implementation permits nearly arbitrary geometries
to be accommodated. Using TACC’s 579 teraflops
Ranger supercomputer, we demonstrate excellent weak and
strong scalability of parallel AMR on up to 62,464 cores
for problems with up to 12.4 billion elements. With RHEA’s
adaptive capabilities, we have been able to reduce the
number of elements by over three orders of magnitude,
thus enabling us to simulate large-scale mantle convection
with finest local resolution of 1.5 km.
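The "trillion elements" figure quoted in the abstract above can be checked with rough arithmetic. The sketch below is not from the paper: it assumes the mantle is a spherical shell between a core-mantle boundary radius of about 3480 km and a surface radius of about 6371 km (both standard approximate values), and it counts cube-shaped cells of edge length `h_km`. The function name is hypothetical.

```python
import math

# Approximate radii (assumptions for this estimate, not values from the paper).
EARTH_RADIUS_KM = 6371.0
CORE_MANTLE_BOUNDARY_KM = 3480.0


def uniform_element_count(h_km: float) -> float:
    """Estimate how many cubic cells of edge h_km fill the mantle shell."""
    shell_volume = (4.0 / 3.0) * math.pi * (
        EARTH_RADIUS_KM ** 3 - CORE_MANTLE_BOUNDARY_KM ** 3
    )
    return shell_volume / h_km ** 3


if __name__ == "__main__":
    # At 1 km spacing the estimate is on the order of 10^12 cells.
    print(f"{uniform_element_count(1.0):.2e}")
```

Under these assumptions a uniform 1 km mesh needs on the order of 10^12 cells, consistent with the abstract's motivation for adaptive refinement.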
Large-scale adaptive mantle convection simulation
A new generation, parallel adaptive-mesh mantle convection code, Rhea, is described and benchmarked. Rhea targets large-scale mantle convection simulations on parallel computers, and thus has been developed with a strong focus on computational efficiency and parallel scalability of both mesh handling and numerical solvers. Rhea builds mantle convection solvers on a collection of parallel octree-based adaptive finite element libraries that support new distributed data structures and parallel algorithms for dynamic coarsening, refinement, rebalancing and repartitioning of the mesh. In this study we demonstrate scalability to 122 880 compute cores and verify correctness of the implementation. We present the numerical approximation and convergence properties using 3-D benchmark problems and other tests for variable-viscosity Stokes flow and thermal convection.
An extreme-scale implicit solver for complex PDEs: highly heterogeneous flow in earth's mantle
Mantle convection is the fundamental physical process within Earth's interior responsible for the thermal and geological evolution of the planet, including plate tectonics. The mantle is modeled as a viscous, incompressible, non-Newtonian fluid. The wide range of spatial scales, extreme variability and anisotropy in material properties, and severely nonlinear rheology have made global mantle convection modeling with realistic parameters prohibitive. Here we present a new implicit solver that exhibits optimal algorithmic performance and is capable of extreme scaling for hard PDE problems, such as mantle convection. To maximize accuracy and minimize runtime, the solver incorporates a number of advances, including aggressive multi-octree adaptivity, mixed continuous-discontinuous discretization, arbitrarily-high-order accuracy, hybrid spectral/geometric/algebraic multigrid, and novel Schur-complement preconditioning. These features present enormous challenges for extreme scalability. We demonstrate that, contrary to conventional wisdom, algorithmically optimal implicit solvers can be designed that scale out to 1.5 million cores for severely nonlinear, ill-conditioned, heterogeneous, and anisotropic PDEs.
Multi-scale dynamics and rheology of mantle flow with plates
Fundamental issues in our understanding of plate and mantle dynamics remain unresolved, including the rheology and state of stress of plates and slabs; the coupling between plates, slabs and mantle; and the flow around slabs. To address these questions, models of global mantle flow with plates are computed using adaptive finite elements, and compared to a variety of observational constraints. The dynamically consistent instantaneous models include a composite rheology with yielding, and incorporate details of the thermal buoyancy field. Around plate boundaries, the local resolution is 1 km, which allows us to study highly detailed features in a globally consistent framework. Models that best fit plateness criteria and plate motion data have strong slabs with high stresses. We find a strong dependence of global plate motions, trench rollback, net rotation, plateness, and strain rate on the stress exponent in the nonlinear viscosity; the yield stress is found to be important only if it is smaller than the ambient convective stress. Due to strong coupling between plates, slabs, and the surrounding mantle, the presence of lower mantle anomalies affects plate motions. The flow in and around slabs, microplate motion, and trench rollback are intimately linked to the amount of yielding in the subducting slab hinge, slab morphology, and the presence of high viscosity structures in the lower mantle beneath the slab.
Scaling and Resilience in Numerical Algorithms for Exascale Computing
The first Petascale supercomputer, the IBM Roadrunner, went online in 2008. Ten years later, the community is now looking ahead to a new generation of Exascale machines. During the decade that has passed, several hundred Petascale-capable machines have been installed worldwide, yet despite the abundance of machines, applications that scale to their full size remain rare. Large clusters now routinely have 50,000+ cores; some have several million. This extreme level of parallelism, which has allowed a theoretical compute capacity in excess of a million billion operations per second, turns out to be difficult to use in many applications of practical interest. Processors often end up spending more time waiting for synchronization, communication, and other coordinating operations to complete than actually computing. Component reliability is another challenge facing HPC developers. If even a single processor fails, among many thousands, the user is forced to restart traditional applications, wasting valuable compute time. These issues collectively manifest themselves as low parallel efficiency, resulting in waste of energy and computational resources. Future performance improvements are expected to continue to come in large part from increased parallelism. One may therefore speculate that the difficulties currently faced when scaling applications to Petascale machines will progressively worsen, making it difficult for scientists to harness the full potential of Exascale computing.
The thesis comprises two parts. Each part consists of several chapters discussing modifications of numerical algorithms to make them better suited for future Exascale machines. In the first part, the use of Parareal for Parallel-in-Time integration techniques for scalable numerical solution of partial differential equations is considered. We propose a new adaptive scheduler that optimizes the parallel efficiency by minimizing the time-subdomain length without making communication of time-subdomains too costly. In conjunction with an appropriate preconditioner, we demonstrate that it is possible to obtain time-parallel speedup on the nonlinear shallow water equation, beyond what is possible using conventional spatial domain-decomposition techniques alone. The part is concluded with the proposal of a new method for constructing Parallel-in-Time integration schemes better suited for convection-dominated problems.
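For readers unfamiliar with Parareal, the basic iteration can be sketched on a scalar ODE. This is a minimal textbook-style illustration, not the thesis's adaptive scheduler or preconditioner: the coarse and fine propagators are both assumed to be forward Euler with different step counts, the fine solves are executed serially here although they are the part that would run in parallel, and all function and parameter names are hypothetical.

```python
def euler(f, y, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with forward Euler."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t1, n_slices, n_iters, coarse_steps=1, fine_steps=100):
    """Return the Parareal approximations at the time-slice boundaries."""
    ts = [t0 + i * (t1 - t0) / n_slices for i in range(n_slices + 1)]

    def coarse(y, a, b):
        return euler(f, y, a, b, coarse_steps)

    def fine(y, a, b):
        return euler(f, y, a, b, fine_steps)

    # Initial guess: one serial sweep with the cheap coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))

    for _ in range(n_iters):
        # These fine solves are independent per slice (parallel in a real code).
        fine_vals = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        coarse_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        # Serial correction sweep: new coarse prediction plus fine-minus-coarse.
        new = [y0]
        for n in range(n_slices):
            predicted = coarse(new[n], ts[n], ts[n + 1])
            new.append(predicted + fine_vals[n] - coarse_old[n])
        u = new
    return u


if __name__ == "__main__":
    decay = lambda t, y: -y
    print(parareal(decay, 1.0, 0.0, 1.0, n_slices=4, n_iters=2)[-1])
```

After as many iterations as there are time slices, the result coincides with the serial fine solution up to rounding, so any speedup depends on converging in far fewer iterations than slices.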
In the second part, new ways of mitigating the impact of hardware failures are developed and presented. The topic is introduced with the creation of a new fault-tolerant variant of Parareal. In the chapter that follows, a C++ library for multi-level checkpointing is presented. The library uses lightweight in-memory checkpoints, protected through the use of erasure codes, to mitigate the impact of failures by decreasing the overhead of checkpointing and minimizing the compute work lost. Erasure codes have the unfortunate property that if more data blocks are lost than parity codes created, the data is effectively considered unrecoverable. The final chapter contains a preliminary study on partial information recovery for incomplete checksums. Under the assumption that some meta knowledge exists on the structure of the data encoded, we show that the data lost may be recovered, at least partially. This result is of interest not only in HPC but also in data centers where erasure codes are widely used to protect data efficiently.
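The recoverability limit of erasure codes mentioned above can be illustrated with the simplest such code, a single XOR parity block. This sketch is an assumption chosen for illustration and is not the scheme used by the thesis's checkpointing library; the function names are hypothetical. With one parity block, one lost data block can be rebuilt, while two losses cannot.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode_parity(data_blocks):
    """Create the single parity block protecting the data blocks."""
    return xor_blocks(data_blocks)


def recover(blocks, parity):
    """Rebuild at most one missing block (marked as None) from the parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        # More blocks lost than parity blocks created: not recoverable.
        raise ValueError("more blocks lost than parity blocks available")
    restored = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = encode_parity(data)
    print(recover([data[0], None, data[2]], parity))
```

Codes with several parity blocks tolerate correspondingly more losses, but the same limit applies once losses exceed the number of parity blocks.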