Source-to-Source Automatic Differentiation of OpenMP Parallel Loops
This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost of computing gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. The generated derivative programs outperform their sequential counterparts in both forward and reverse mode, although our reverse mode often scales worse than the input programs.
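A minimal hand-written sketch of the pattern discussed above, assuming a toy loop with an invented index array (this is not Tapenade's generated code): when a worksharing loop reads shared data through possibly repeated indices, its adjoint scatters increments into the same locations, so the accumulation is protected with an atomic update.

```c
/* Hand-written sketch, not tool output.  Primal: y[i] = x[idx[i]]^2 over an
 * OpenMP worksharing loop.  Reverse: adjoint contributions to xb may collide
 * when idx repeats an index across iterations, so they are accumulated with
 * atomic updates.  Compile with e.g. gcc -fopenmp. */
#include <stdio.h>

#define N 8

/* Primal worksharing loop. */
void f(const double *x, const int *idx, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = x[idx[i]] * x[idx[i]];
}

/* Reverse-mode derivative: given yb = dJ/dy, accumulate xb = dJ/dx. */
void f_b(const double *x, const int *idx, const double *yb, double *xb) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double contrib = 2.0 * x[idx[i]] * yb[i];
        #pragma omp atomic
        xb[idx[i]] += contrib;   /* atomic: several i may share idx[i] */
    }
}

int main(void) {
    double x[N], y[N], xb[N] = {0}, yb[N];
    int idx[N] = {0, 1, 1, 2, 3, 3, 3, 4};   /* repeated indices */
    for (int i = 0; i < N; i++) { x[i] = i + 1.0; yb[i] = 1.0; }
    f(x, idx, y);
    f_b(x, idx, yb, xb);
    for (int i = 0; i < N; i++) printf("xb[%d] = %g\n", i, xb[i]);
    return 0;
}
```

Without the atomic directive, the repeated entries in idx would make the adjoint loop race even though the primal loop is perfectly parallel.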
Discrete adjoints on many cores: Algorithmic differentiation of accelerated fluid simulations
Simulations are used in science and industry to predict the performance of technical systems. Adjoint derivatives of these simulations can reveal the sensitivity of the system performance to changes in design or operating conditions, and are increasingly used in shape optimisation and uncertainty quantification. Algorithmic differentiation (AD) by source transformation is an efficient method to compute such derivatives.

AD requires an analysis of the computation and its data flow to produce efficient adjoint code. One important step is the activity analysis that detects operations that need to be differentiated. This thesis investigates an improved activity analysis that simplifies build procedures for certain adjoint programs and is demonstrated to improve the speed of an adjoint fluid dynamics solver. The method works by allowing a context-dependent analysis of routines.

The ongoing trend towards multi- and many-core architectures such as the Intel Xeon Phi is creating challenges for AD. Two novel approaches are presented that replicate the parallelisation of a program in its corresponding adjoint program. The first approach detects loops that naturally result in a parallelisable adjoint loop, while the second approach uses loop transformation and the aforementioned context-dependent analysis to enforce parallelisable data access in the adjoint loop. A case study shows that both approaches yield adjoints that are as scalable as their underlying primal programs.

Adjoint computations are limited by their memory footprint, particularly in unsteady simulations, for which this work presents incomplete checkpointing as a method to reduce memory usage at the cost of a slight reduction in accuracy.

Finally, convergence of iterative linear solvers is discussed, which is especially relevant on accelerator cards, where single-precision floating-point numbers are frequently used and the choice of solvers is limited by the small memory size. Some problems that are particular to adjoint computations are discussed.
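The checkpointing discussion above can be illustrated with a conventional store-and-recompute sweep, which is the exact baseline that incomplete checkpointing relaxes; the toy nonlinear time step below is invented for illustration and is not taken from the thesis.

```c
/* Minimal runnable sketch of checkpoint-and-recompute for an unsteady adjoint.
 * The toy nonlinear step u <- u - dt*u^2 stands in for a flow solver time
 * step; its adjoint needs the primal state u at that step, which is either
 * stored at a checkpoint or recomputed from the nearest one.  The thesis's
 * "incomplete checkpointing" additionally drops state, trading accuracy for
 * memory, which is NOT shown here. */
#include <stdio.h>

#define NSTEPS 100
#define STRIDE 10                /* keep one checkpoint every STRIDE steps */
static const double dt = 0.01;

static double step(double u)              { return u - dt * u * u; }
static double step_b(double u, double ub) { return (1.0 - 2.0 * dt * u) * ub; }

int main(void) {
    double ckpt[NSTEPS / STRIDE];
    double u = 1.0;

    /* Forward sweep: store u only at every STRIDE-th step. */
    for (int n = 0; n < NSTEPS; n++) {
        if (n % STRIDE == 0) ckpt[n / STRIDE] = u;
        u = step(u);
    }

    /* Reverse sweep: recompute each step's input state from the nearest
     * checkpoint, then apply the adjoint of that step. */
    double ub = 1.0;   /* seed: d(final u)/d(final u) */
    for (int n = NSTEPS - 1; n >= 0; n--) {
        double v = ckpt[n / STRIDE];
        for (int k = (n / STRIDE) * STRIDE; k < n; k++) v = step(v);
        ub = step_b(v, ub);
    }
    printf("d(u_final)/d(u_0) = %g\n", ub);
    return 0;
}
```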
Automatic Differentiation of Parallel Loops with Formal Methods
This paper presents a novel combination of reverse-mode automatic differentiation and formal methods to enable efficient differentiation of (or backpropagation through) shared-memory parallel loops. Compared to the state of the art, our approach can reduce the need for atomic updates or private data copies during the parallel derivative computation, even in the presence of unstructured or data-dependent access patterns. This is achieved by gathering information about the memory access patterns from the input program, which is assumed to be correctly parallelized. This information is then used to build a model of assertions in a theorem prover, which can be used to check the safety of shared-memory accesses during the parallel derivative loops. We demonstrate this approach on scientific computing benchmarks including a lattice-Boltzmann method (LBM) solver from the Parboil benchmark suite and a Green's function Monte Carlo (GFMC) kernel from the CORAL benchmark suite.
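A rough illustration of the reasoning the paper automates, using an invented permutation example rather than the paper's benchmarks: if the primal loop's write is exclusive (which a prover can establish from the assumption that the input program is correctly parallelized), the corresponding adjoint loop also writes exclusively and needs neither atomics nor private copies. The runtime assert below merely stands in for the static proof.

```c
/* Sketch of the idea, not the paper's tooling.  Because the primal loop is
 * assumed correctly parallelized, its write y[perm[i]] must be exclusive,
 * i.e. perm is injective over the iteration space.  A theorem prover can
 * discharge that fact statically; the runtime check below stands in for it.
 * Given injectivity, the adjoint loop needs no atomics. */
#include <assert.h>
#include <stdio.h>

#define N 6

void f(const double *x, const int *perm, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[perm[i]] = 3.0 * x[i];        /* exclusive write: perm is injective */
}

void f_b(const int *perm, const double *yb, double *xb) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        xb[i] += 3.0 * yb[perm[i]];     /* exclusive write to xb[i]: no atomics */
}

int main(void) {
    int perm[N] = {2, 0, 5, 1, 4, 3};
    /* Stand-in for the static injectivity proof. */
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            assert(perm[i] != perm[j]);

    double x[N] = {1, 2, 3, 4, 5, 6}, y[N], xb[N] = {0}, yb[N];
    for (int i = 0; i < N; i++) yb[i] = 1.0;
    f(x, perm, y);
    f_b(perm, yb, xb);
    for (int i = 0; i < N; i++) printf("xb[%d] = %g\n", i, xb[i]);
    return 0;
}
```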
Forward Gradients for Data-Driven CFD Wall Modeling
Computational Fluid Dynamics (CFD) is used in the design and optimization of gas turbines and many other industrial and scientific applications. However, its practical use is often limited by the high computational cost, and the accurate resolution of near-wall flow is a significant contributor to this cost. Machine learning (ML) and other data-driven methods can complement existing wall models. Nevertheless, training these models is bottlenecked by the large computational effort and memory footprint demanded by back-propagation. Recent work has presented alternatives for computing gradients of neural networks in which separate forward and backward sweeps are not needed and storage of intermediate results between sweeps is not required, because an unbiased estimator of the gradient is computed in a single forward sweep. In this paper, we discuss the application of this approach to training a subgrid wall model that could potentially be used as a surrogate in wall-bounded flow CFD simulations to reduce the computational overhead while preserving predictive accuracy.
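The forward-gradient idea referred to above can be sketched as follows, on an invented toy loss rather than the actual wall model: sample a random tangent v, evaluate the directional derivative in one forward sweep with dual numbers, and use (grad f · v) v as an unbiased gradient estimate.

```c
/* Sketch of a forward-gradient estimator: no reverse sweep and no stored
 * intermediates; one forward sweep with dual numbers per random tangent.
 * The toy loss is invented for illustration, not the paper's wall model. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double val, dot; } dual;           /* value + tangent */

static dual d_add(dual a, dual b) { return (dual){a.val + b.val, a.dot + b.dot}; }
static dual d_mul(dual a, dual b) { return (dual){a.val * b.val, a.val * b.dot + a.dot * b.val}; }
static dual d_const(double c)     { return (dual){c, 0.0}; }

/* Toy loss f(t0, t1) = (t0 - 3)^2 + 2*(t1 + 1)^2, evaluated on duals. */
static dual loss(dual t0, dual t1) {
    dual a = d_add(t0, d_const(-3.0));
    dual b = d_add(t1, d_const(1.0));
    return d_add(d_mul(a, a), d_mul(d_const(2.0), d_mul(b, b)));
}

static double randn(void) {                          /* Box-Muller sample */
    const double two_pi = 6.283185307179586;
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
}

int main(void) {
    double t[2] = {0.0, 0.0}, g[2] = {0.0, 0.0};
    int nsamples = 20000;
    for (int s = 0; s < nsamples; s++) {
        double v0 = randn(), v1 = randn();           /* random tangent */
        dual y = loss((dual){t[0], v0}, (dual){t[1], v1});
        g[0] += y.dot * v0 / nsamples;               /* E[(grad.v) v] = grad */
        g[1] += y.dot * v1 / nsamples;
    }
    printf("estimate: (%g, %g)  exact: (%g, %g)\n",
           g[0], g[1], 2.0 * (t[0] - 3.0), 4.0 * (t[1] + 1.0));
    return 0;
}
```

Averaging over many tangents is only done here to show that the estimator is unbiased; in stochastic training a single tangent per step is typically used.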
Model Checking Race-freedom When "Sequential Consistency for Data-race-free Programs" is Guaranteed
Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.
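As a concrete, invented example of what "race-free in every SC execution" means for OpenMP code (the verification tooling itself is not shown): the snippet below is race-free only because of the implicit barrier at the end of the first worksharing loop; adding nowait there would introduce a race that a model checker could expose on some interleaving.

```c
/* As written, every SC execution is race-free: the implicit barrier after
 * the first worksharing loop orders all writes to b before any read of b. */
#include <stdio.h>

#define N 100

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) a[i] = i;

    #pragma omp parallel
    {
        #pragma omp for                 /* implicit barrier at loop end */
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * a[i];

        /* If the loop above used "#pragma omp for nowait", a thread could
         * read b[N-1-i] here before another thread had written it: a data
         * race on some SC interleaving, which a model checker would report. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            c[i] = b[N - 1 - i] + 1.0;
    }
    printf("c[0] = %g\n", c[0]);
    return 0;
}
```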
Automatic Differentiation for Adjoint Stencil Loops
Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable.

In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
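A hand-written C sketch of the loop transformation described above (PerforAD itself is a Python code generator; the example and its coefficients are invented): regrouping the adjoint contributions by output index turns the scattered adjoint of a 3-point stencil back into a stencil with one exclusive write per iteration.

```c
/* Naively, the adjoint of a stencil scatters into xb[i-1], xb[i], xb[i+1]
 * and needs atomics; regrouping contributions by output index turns the
 * adjoint into another stencil that parallelises like the primal loop. */
#include <stdio.h>

#define N 16

/* Primal 3-point stencil over the interior points. */
void stencil(const double *x, double *y, double a, double b, double c) {
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        y[i] = a * x[i - 1] + b * x[i] + c * x[i + 1];
}

/* Transformed adjoint: still a 3-point stencil, reading yb and writing xb[i]
 * exclusively; the guards handle the interior-only primal iteration space. */
void stencil_b(const double *yb, double *xb, double a, double b, double c) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        if (i + 1 >= 1 && i + 1 <= N - 2) s += a * yb[i + 1];
        if (i     >= 1 && i     <= N - 2) s += b * yb[i];
        if (i - 1 >= 1 && i - 1 <= N - 2) s += c * yb[i - 1];
        xb[i] += s;
    }
}

int main(void) {
    double x[N], y[N], xb[N] = {0}, yb[N];
    for (int i = 0; i < N; i++) { x[i] = i; yb[i] = 1.0; }
    stencil(x, y, 0.25, 0.5, 0.25);
    stencil_b(yb, xb, 0.25, 0.5, 0.25);
    for (int i = 0; i < N; i++) printf("xb[%d] = %g\n", i, xb[i]);
    return 0;
}
```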
Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models
Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed-parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, from which we then computed the parametric sensitivity.
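Only the final step, computing the parametric sensitivity of a trained surrogate, is sketched below; the one-neuron "network", its weights, and the parameter are invented placeholders, not the paper's model.

```c
/* Once a surrogate maps (state, parameter) -> next state, its parametric
 * sensitivity d(next state)/d(parameter) can be obtained by differentiating
 * the surrogate itself, here via forward-mode dual numbers on a hard-coded
 * one-neuron "network" with made-up weights. */
#include <math.h>
#include <stdio.h>

typedef struct { double val, dot; } dual;

static dual d_tanh(dual a) {
    double t = tanh(a.val);
    return (dual){t, (1.0 - t * t) * a.dot};
}

/* Toy surrogate: next = w2 * tanh(w0*state + w1*param + b0) + b1. */
static dual surrogate(double state, dual param) {
    const double w0 = 0.8, w1 = -0.3, b0 = 0.1, w2 = 1.5, b1 = 0.0;
    dual z = {w0 * state + w1 * param.val + b0, w1 * param.dot};
    dual h = d_tanh(z);
    return (dual){w2 * h.val + b1, w2 * h.dot};
}

int main(void) {
    double state = 0.5;
    dual param = {2.0, 1.0};              /* seed tangent on the parameter */
    dual next = surrogate(state, param);
    printf("next state = %g, d(next)/d(param) = %g\n", next.val, next.dot);
    return 0;
}
```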