Discrete adjoints on many cores: Algorithmic differentiation of accelerated fluid simulations
PhD
Simulations are used in science and industry to predict the performance of technical
systems. Adjoint derivatives of these simulations can reveal the sensitivity of the system
performance to changes in design or operating conditions, and are increasingly used in
shape optimisation and uncertainty quantification. Algorithmic differentiation (AD) by
source-transformation is an efficient method to compute such derivatives.
AD requires an analysis of the computation and its data flow to produce efficient
adjoint code. One important step is the activity analysis that detects operations that
need to be differentiated. An improved activity analysis is investigated in this thesis
that simplifies build procedures for certain adjoint programs, and is demonstrated to
improve the speed of an adjoint fluid dynamics solver. The method works by allowing a
context-dependent analysis of routines.
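The role of activity analysis can be pictured with a toy forward (tangent) mode AD example in Python (an illustrative sketch with dual numbers, not the source-transformation approach of the thesis): only "active" values carry a derivative slot, while passive constants do not need one.

```python
# Forward (tangent) mode AD with dual numbers -- a toy illustration of what
# activity analysis decides: only "active" values need a derivative slot,
# while passive constants like c below do not.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val    # primal value
        self.dot = dot    # tangent: derivative w.r.t. the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)  # product rule
    __rmul__ = __mul__

def f(x, c):
    return x * x + c * x   # c is passive, x is active

x = Dual(3.0, 1.0)         # seed dx/dx = 1
y = f(x, 2.0)
print(y.val, y.dot)        # f(3) = 15, f'(3) = 2*3 + 2 = 8
```

A source-transformation tool achieves the same effect statically: it emits derivative statements only for operations the activity analysis marks as active.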
The ongoing trend towards multi- and many-core architectures such as the Intel
Xeon Phi is creating challenges for AD. Two novel approaches are presented that replicate
the parallelisation of a program in its corresponding adjoint program. The first approach
detects loops that naturally result in a parallelisable adjoint loop, while the second
approach uses loop transformation and the aforementioned context-dependent analysis
to enforce parallelisable data access in the adjoint loop. A case study shows that both
approaches yield adjoints that are as scalable as their underlying primal programs.
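The underlying difficulty can be shown with a small numpy sketch (illustrative only, not code from the thesis): a parallel primal loop that *gathers* from an array has an adjoint that *scatters* into the adjoint array, and repeated indices turn independent reads into conflicting writes.

```python
import numpy as np

# Primal loop: each iteration i READS u[idx[i]] (a gather) -- trivially parallel.
# Its adjoint must WRITE into ubar[idx[i]] (a scatter): iterations that share an
# index now race, which is why adjoint loops are not automatically parallel.
n = 5
u = np.array([1.0, 2.0, 3.0, 4.0])
idx = np.array([0, 1, 1, 3, 3])          # repeated indices -> adjoint conflicts
w = np.array([2.0, 1.0, 4.0, 0.5, 3.0])

# primal: y[i] = w[i] * u[idx[i]]
y = w * u[idx]

# adjoint of the gather: accumulate ubar[idx[i]] += w[i] * ybar[i]
ybar = np.ones(n)
ubar = np.zeros_like(u)
np.add.at(ubar, idx, w * ybar)           # unbuffered scatter-add handles repeats

print(ubar)   # [2.0, 5.0, 0.0, 3.5]
```

A loop whose indices are provably unique has a race-free adjoint (the first approach above); otherwise the data access must be transformed to make the adjoint writes safe (the second approach).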
Adjoint computations are limited by their memory footprint, particularly in unsteady
simulations, for which this work presents incomplete checkpointing as a method to
reduce memory usage at the cost of a slight reduction in accuracy.
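One way to picture the memory/accuracy trade-off (a deliberately simplified sketch, not the thesis's exact algorithm): keep only every k-th forward state and, in the reverse sweep, substitute the nearest stored state for the missing intermediate states instead of storing or recomputing them all.

```python
# Simplified illustration of trading accuracy for memory in checkpointing:
# only every k-th forward state is kept, and the reverse sweep falls back to
# the nearest earlier checkpoint instead of the exact intermediate state.
def step(u):                       # toy time step: u_{n+1} = 0.99 * u_n + 0.01
    return 0.99 * u + 0.01

n_steps, k = 100, 10
u, store = 0.0, {0: 0.0}
for n in range(1, n_steps + 1):
    u = step(u)
    if n % k == 0:
        store[n] = u               # 11 states kept instead of 101

def approx_state(n):
    return store[(n // k) * k]     # nearest earlier checkpoint

# the exact u_95 would need extra storage or recomputation; the nearest
# checkpoint is close, so the adjoint only loses a little accuracy
exact_95 = store[90]
for _ in range(5):
    exact_95 = step(exact_95)
print(len(store), abs(approx_state(95) - exact_95))
```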
Finally, convergence of iterative linear solvers is discussed, which is especially relevant
on accelerator cards, where single precision floating point numbers are frequently
used and the choice of solvers is limited by the small memory size. Some problems that
are particular to adjoint computations are discussed.
European Union
A Hybrid MPI-OpenMP Parallel Implementation for pseudospectral simulations with application to Taylor-Couette Flow
A hybrid-parallel direct-numerical-simulation method with application to
turbulent Taylor-Couette flow is presented. The Navier-Stokes equations are
discretized in cylindrical coordinates with the spectral Fourier-Galerkin
method in the axial and azimuthal directions, and high-order finite differences
in the radial direction. Time is advanced by a second-order, semi-implicit
projection scheme, which requires the solution of five Helmholtz/Poisson
equations, avoids staggered grids and renders very small slip velocities.
Nonlinear terms are computed with the pseudospectral method. The code is
parallelized using a hybrid MPI-OpenMP strategy, which is simpler to implement,
reduces inter-node communications and is more efficient compared to a flat MPI
parallelization. A strong scaling study shows that the hybrid code maintains
very good scalability up to more than 20,000 processor cores, and thus makes it
possible to perform simulations at higher resolutions than previously feasible,
opening up the possibility of simulating turbulent Taylor-Couette flows at
Reynolds numbers up to . This allows hydrodynamic
turbulence in Keplerian flows to be probed in experimentally relevant regimes.
Comment: 30 pages, 11 figures
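The pseudospectral evaluation of nonlinear terms mentioned above can be illustrated in 1D (a hedged numpy sketch, unrelated to the actual cylindrical code): derivatives are taken in Fourier space, while products are formed pointwise in physical space.

```python
import numpy as np

# 1D illustration of the pseudospectral treatment of a nonlinear term:
# differentiate in Fourier space (multiply by ik), then form the product
# u * du/dx pointwise in physical space -- the "pseudo" part of the method.
N = 64
x = 2 * np.pi * np.arange(N) / N
u = np.sin(x)

k = np.fft.fftfreq(N, d=1.0 / N) * 1j        # spectral derivative operator ik
dudx = np.fft.ifft(k * np.fft.fft(u)).real   # du/dx = cos(x)

nonlinear = u * dudx                         # u du/dx = sin(x) cos(x)
err = np.max(np.abs(nonlinear - np.sin(x) * np.cos(x)))
print(err)   # spectrally accurate, near machine precision
```

In the actual solver the same principle is applied in the two Fourier directions (axial and azimuthal), with high-order finite differences handling the radial direction.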
Adjoint computations by algorithmic differentiation of a parallel solver for time-dependent PDEs
A computational fluid dynamics code is differentiated using algorithmic
differentiation (AD) in both tangent and adjoint modes. The two novelties of
the present approach are 1) the adjoint code is obtained by letting the AD tool
Tapenade invert the complete layer of message passing interface (MPI)
communications, and 2) the adjoint code integrates time-dependent, non-linear
and dissipative (hence physically irreversible) PDEs with an explicit time
integration loop running for ca. time steps. The approach relies on
using the Adjoinable MPI library to reverse the non-blocking communication
patterns in the original code, and by controlling the memory overhead induced
by the time-stepping loop with binomial checkpointing. A description of the
necessary code modifications is provided along with the validation of the
computed derivatives and a performance comparison of the tangent and adjoint
codes.
Comment: Submitted to Journal of Computational Science
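Binomial checkpointing trades recomputation for memory: by Griewank's classic result, s stored snapshots with each time step recomputed at most t times suffice to reverse C(s+t, s) steps. A small capacity calculator (an illustration of the combinatorial result, not the Tapenade/Adjoinable MPI implementation):

```python
from math import comb

# Capacity of binomial checkpointing (Griewank's revolve): with s snapshots in
# memory and each time step recomputed at most t times, a reverse sweep can
# cover eta(s, t) = C(s + t, s) forward steps.
def reversible_steps(s, t):
    return comb(s + t, s)

# e.g. 10 snapshots with at most 5 recomputations already cover 3003 steps
for t in range(1, 6):
    print(t, reversible_steps(10, t))
```

The memory overhead therefore grows only logarithmically with the number of time steps for a fixed recomputation factor, which is what makes long explicit time-integration loops adjoinable at all.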
Global convection in Earth's mantle: advanced numerical methods and extreme-scale simulations
The thermal convection of rock in Earth's mantle and the associated plate tectonics are modeled by nonlinear incompressible Stokes and energy equations. This dissertation focuses on the development of advanced, scalable linear and nonlinear solvers for numerical simulations of realistic instantaneous mantle flow, where several computational challenges must be overcome. The most notable challenges are the severe nonlinearity, heterogeneity, and anisotropy of the mantle's rheology, as well as a wide range of spatial scales and highly localized features.

Resolving the crucial small-scale features efficiently necessitates adaptive methods, while computational results greatly benefit from high accuracy per degree of freedom and local mass conservation. Consequently, the discretization of Earth's mantle is carried out by high-order finite elements on aggressively adaptively refined hexahedral meshes with a continuous, nodal velocity approximation and a discontinuous, modal pressure approximation. These velocity-pressure pairings yield optimal asymptotic convergence rates of the finite element approximation to the infinite-dimensional solution with decreasing mesh element size, are inf-sup stable on general, non-conforming hexahedral meshes with "hanging nodes," and have the advantage of preserving mass locally at the element level due to the discontinuous pressure.

However, because of the difficulties cited above and the desired accuracy, the large implicit systems to be solved are extremely poorly conditioned, and sophisticated linear and nonlinear solvers, including powerful preconditioning techniques, are required. The nonlinear Stokes system is solved using a grid continuation, inexact Newton-Krylov method. We measure the residual of the momentum equation in the H⁻¹-norm for backtracking line search to avoid overly conservative update steps that are significantly reduced from one.
The Newton linearization is augmented by a perturbation of a highly nonlinear term in the mantle's rheology, resulting in dramatically improved nonlinear convergence. We present a new Schur complement-based Stokes preconditioner, weighted BFBT, that exhibits robust fast convergence for Stokes problems with smooth but highly varying (up to 10 orders of magnitude) viscosities, optimal algorithmic scalability with respect to mesh refinement, and only a mild dependence on the polynomial order of high-order finite element discretizations. In addition, we derive theoretical eigenvalue bounds to prove spectral equivalence of our inverse Schur complement approximation. Finally, we present a parallel hybrid spectral-geometric-algebraic multigrid (HMG) method to approximate the inverses of the Stokes system's viscous block and variable-coefficient pressure Poisson operators within weighted BFBT. Building on the parallel scalability of HMG, our Stokes solver demonstrates excellent parallel scalability to 1.6 million CPU cores without sacrificing algorithmic optimality.
Computational Science, Engineering, and Mathematics
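The inexact Newton-Krylov structure with backtracking line search can be sketched on a toy algebraic system (a hedged illustration: the dissertation's H⁻¹ residual norm, grid continuation, and Krylov linear solves are replaced here by a plain Euclidean norm and an exact 2x2 solve).

```python
import numpy as np

# Newton iteration with Armijo-style backtracking line search, illustrating
# the solver skeleton described above on a toy system: the unit circle
# intersected with the line x0 = x1.
def F(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def J(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x = np.array([2.0, 0.5])
for _ in range(50):
    r = F(x)
    rnorm = np.linalg.norm(r)
    if rnorm < 1e-12:
        break                                # converged
    dx = np.linalg.solve(J(x), -r)           # (here exact) Newton step
    alpha = 1.0
    # backtrack until sufficient decrease of the residual norm
    while (np.linalg.norm(F(x + alpha * dx)) > (1.0 - 1e-4 * alpha) * rnorm
           and alpha > 1e-8):
        alpha *= 0.5
    x = x + alpha * dx

print(x)   # converges to (sqrt(2)/2, sqrt(2)/2)
```

In the full solver, the linear solve is an inner Krylov iteration preconditioned by weighted BFBT, and the line-search norm is chosen (H⁻¹) so that poorly scaled residual components do not force tiny steps.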
A high-performance open-source framework for multiphysics simulation and adjoint-based shape and topology optimization
The first part of this thesis presents the advances made in the Open-Source software SU2,
towards transforming it into a high-performance framework for design and optimization of
multiphysics problems. Through this work, and in collaboration with other authors, a tenfold
performance improvement was achieved for some problems. More importantly, problems that
had previously been impossible to solve in SU2 can now be used in numerical optimization
with shape or topology variables. Furthermore, it is now significantly simpler to study new
multiphysics applications, and to develop new numerical schemes taking advantage of modern
high-performance-computing systems.
In the second part of this thesis, these capabilities allowed the application of topology
optimization to medium-scale fluid-structure interaction problems, using high-fidelity models
(nonlinear elasticity and Reynolds-averaged Navier-Stokes equations), which had not been done
before in the literature. This showed that topology optimization can be used to target
aerodynamic objectives, by tailoring the interaction between fluid and structure. However,
it also made evident the limitations of density-based methods for this type of problem, in
particular, reliably converging to discrete solutions. This was overcome with new strategies
to both guarantee and accelerate (i.e. reduce the overall computational cost of) the
convergence to discrete solutions in fluid-structure interaction problems.
Open Access
Computing and Information Science (CIS)
Cornell University Courses of Study Vol. 97 2005/2006
Architecture of Advanced Numerical Analysis Systems
This unique open access book applies the functional OCaml programming language to numerical and computational applications in data science, engineering, and scientific computing. This book is based on the authors' first-hand experience building and maintaining Owl, an OCaml-based numerical computing library. You'll first learn the various components in a modern numerical computation library. Then, you will learn how these components are designed and built up and how to optimize their performance. After reading and using this book, you'll have the knowledge required to design and build real-world complex systems that effectively leverage the advantages of the OCaml functional programming language.
What You Will Learn:
- Optimize core operations based on N-dimensional arrays
- Design and implement an industry-level algorithmic differentiation module
- Implement mathematical optimization, regression, and deep neural network functionalities based on algorithmic differentiation
- Design and optimize a computation graph module, and understand the benefits it brings to the numerical computing library
- Accommodate the growing number of hardware accelerators (e.g. GPU, TPU) and execution backends (e.g. web browser, unikernel) of numerical computation
- Use the Zoo system for efficient scripting, code sharing, service deployment, and composition
- Design and implement a distributed computing engine to work with a numerical computing library, providing convenient APIs and high performance
Who This Book Is For: Those with prior programming experience, especially with the OCaml programming language, or with scientific computing experience who may be new to OCaml. Most importantly, it is for those who are eager to understand not only how to use something, but also how it is built up.
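The core of an algorithmic differentiation module like the one the book describes is a small tape of operations replayed in reverse. A minimal reverse-mode sketch (Owl's module is written in OCaml and far more complete; this standalone Python toy only mirrors the core idea):

```python
# A minimal reverse-mode AD "tape": each Var remembers its parents and the
# local partial derivatives; backward() replays the graph in reverse
# topological order, accumulating gradients.
class Var:
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents     # pairs (parent_var, local_partial)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

def backward(out):
    # reverse topological order ensures each node's gradient is complete
    # before it is pushed to its parents
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

x, y = Var(3.0), Var(4.0)
z = x * y + x * x       # dz/dx = y + 2x = 10, dz/dy = x = 3
backward(z)
print(x.grad, y.grad)   # 10.0 3.0
```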
Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10²³ floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements?
While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Peer Reviewed
Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth
Postprint (author's final draft)
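The checkpoint-overhead argument above connects to the classic first-order result of Young (later refined by Daly): the checkpoint interval minimizing expected lost time is approximately sqrt(2·C·MTBF), where C is the cost of writing one checkpoint. A small sketch with illustrative numbers (not figures from the article):

```python
from math import sqrt

# Young's first-order optimal checkpoint interval:
#   tau_opt ~ sqrt(2 * C * MTBF)
# where C is the time to write one checkpoint and MTBF is the system-wide
# mean time between failures. All numbers below are illustrative.
def young_interval(checkpoint_cost_s, mtbf_s):
    return sqrt(2.0 * checkpoint_cost_s * mtbf_s)

C = 600.0              # 10 min per checkpoint write (optimistic at tens of PB)
mtbf = 4.0 * 3600.0    # system-wide MTBF of 4 hours
tau = young_interval(C, mtbf)
print(tau / 60.0)      # roughly 69 minutes between checkpoints
```

As the MTBF shrinks toward the recovery time itself, the optimal interval collapses and the machine spends most of its time checkpointing, which is exactly the failure mode the abstract warns about.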