Source-to-Source Automatic Differentiation of OpenMP Parallel Loops
This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost of computing gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. The generated derivative programs outperform their sequential counterparts in both forward and reverse mode, although our reverse mode often scales worse than the input programs.
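A minimal hand-written sketch of the pattern discussed above, assuming a toy loop with an invented index array (this is not Tapenade's generated code): when a worksharing loop reads shared data through possibly repeated indices, its adjoint scatters increments into the same locations, so the accumulation is protected with an atomic update.

```c
/* Hand-written sketch, not tool output.  Primal: y[i] = x[idx[i]]^2 over an
 * OpenMP worksharing loop.  Reverse: adjoint contributions to xb may collide
 * when idx repeats an index across iterations, so they are accumulated with
 * atomic updates.  Compile with e.g. gcc -fopenmp. */
#include <stdio.h>

#define N 8

/* Primal worksharing loop. */
void f(const double *x, const int *idx, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = x[idx[i]] * x[idx[i]];
}

/* Reverse-mode derivative: given yb = dJ/dy, accumulate xb = dJ/dx. */
void f_b(const double *x, const int *idx, const double *yb, double *xb) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double contrib = 2.0 * x[idx[i]] * yb[i];
        #pragma omp atomic
        xb[idx[i]] += contrib;   /* atomic: several i may share idx[i] */
    }
}

int main(void) {
    double x[N], y[N], xb[N] = {0}, yb[N];
    int idx[N] = {0, 1, 1, 2, 3, 3, 3, 4};   /* repeated indices */
    for (int i = 0; i < N; i++) { x[i] = i + 1.0; yb[i] = 1.0; }
    f(x, idx, y);
    f_b(x, idx, yb, xb);
    for (int i = 0; i < N; i++) printf("xb[%d] = %g\n", i, xb[i]);
    return 0;
}
```

Without the atomic directive, the repeated entries in idx would make the adjoint loop race even though the primal loop is perfectly parallel.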
Discrete adjoints on many cores: Algorithmic differentiation of accelerated fluid simulations
Simulations are used in science and industry to predict the performance of technical systems. Adjoint derivatives of these simulations can reveal the sensitivity of the system performance to changes in design or operating conditions, and are increasingly used in shape optimisation and uncertainty quantification. Algorithmic differentiation (AD) by source transformation is an efficient method to compute such derivatives.

AD requires an analysis of the computation and its data flow to produce efficient adjoint code. One important step is the activity analysis that detects operations that need to be differentiated. This thesis investigates an improved activity analysis that simplifies build procedures for certain adjoint programs and is demonstrated to improve the speed of an adjoint fluid dynamics solver. The method works by allowing a context-dependent analysis of routines.

The ongoing trend towards multi- and many-core architectures such as the Intel Xeon Phi is creating challenges for AD. Two novel approaches are presented that replicate the parallelisation of a program in its corresponding adjoint program. The first approach detects loops that naturally result in a parallelisable adjoint loop, while the second approach uses loop transformation and the aforementioned context-dependent analysis to enforce parallelisable data access in the adjoint loop. A case study shows that both approaches yield adjoints that are as scalable as their underlying primal programs.

Adjoint computations are limited by their memory footprint, particularly in unsteady simulations, for which this work presents incomplete checkpointing as a method to reduce memory usage at the cost of a slight reduction in accuracy.

Finally, convergence of iterative linear solvers is discussed, which is especially relevant on accelerator cards, where single-precision floating-point numbers are frequently used and the choice of solvers is limited by the small memory size. Some problems that are particular to adjoint computations are discussed.
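The checkpointing discussion above can be illustrated with a conventional store-and-recompute sweep, which is the exact baseline that incomplete checkpointing relaxes; the toy nonlinear time step below is invented for illustration and is not taken from the thesis.

```c
/* Minimal runnable sketch of checkpoint-and-recompute for an unsteady adjoint.
 * The toy nonlinear step u <- u - dt*u^2 stands in for a flow solver time
 * step; its adjoint needs the primal state u at that step, which is either
 * stored at a checkpoint or recomputed from the nearest one.  The thesis's
 * "incomplete checkpointing" additionally drops state, trading accuracy for
 * memory, which is NOT shown here. */
#include <stdio.h>

#define NSTEPS 100
#define STRIDE 10                /* keep one checkpoint every STRIDE steps */
static const double dt = 0.01;

static double step(double u)              { return u - dt * u * u; }
static double step_b(double u, double ub) { return (1.0 - 2.0 * dt * u) * ub; }

int main(void) {
    double ckpt[NSTEPS / STRIDE];
    double u = 1.0;

    /* Forward sweep: store u only at every STRIDE-th step. */
    for (int n = 0; n < NSTEPS; n++) {
        if (n % STRIDE == 0) ckpt[n / STRIDE] = u;
        u = step(u);
    }

    /* Reverse sweep: recompute each step's input state from the nearest
     * checkpoint, then apply the adjoint of that step. */
    double ub = 1.0;   /* seed: d(final u)/d(final u) */
    for (int n = NSTEPS - 1; n >= 0; n--) {
        double v = ckpt[n / STRIDE];
        for (int k = (n / STRIDE) * STRIDE; k < n; k++) v = step(v);
        ub = step_b(v, ub);
    }
    printf("d(u_final)/d(u_0) = %g\n", ub);
    return 0;
}
```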
Automatic Differentiation of Parallel Loops with Formal Methods
This paper presents a novel combination of reverse-mode automatic differentiation and formal methods to enable efficient differentiation of (or backpropagation through) shared-memory parallel loops. Compared to the state of the art, our approach can reduce the need for atomic updates or private data copies during the parallel derivative computation, even in the presence of unstructured or data-dependent access patterns. This is achieved by gathering information about the memory access patterns from the input program, which is assumed to be correctly parallelized. This information is then used to build a model of assertions in a theorem prover, which can be used to check the safety of shared-memory accesses during the parallel derivative loops. We demonstrate this approach on scientific computing benchmarks including a lattice-Boltzmann method (LBM) solver from the Parboil benchmark suite and a Green's function Monte Carlo (GFMC) kernel from the CORAL benchmark suite.
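A rough illustration of the reasoning the paper automates, using an invented permutation example rather than the paper's benchmarks: if the primal loop's write is exclusive (which a prover can establish from the assumption that the input program is correctly parallelized), the corresponding adjoint loop also writes exclusively and needs neither atomics nor private copies. The runtime assert below merely stands in for the static proof.

```c
/* Sketch of the idea, not the paper's tooling.  Because the primal loop is
 * assumed correctly parallelized, its write y[perm[i]] must be exclusive,
 * i.e. perm is injective over the iteration space.  A theorem prover can
 * discharge that fact statically; the runtime check below stands in for it.
 * Given injectivity, the adjoint loop needs no atomics. */
#include <assert.h>
#include <stdio.h>

#define N 6

void f(const double *x, const int *perm, double *y) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[perm[i]] = 3.0 * x[i];        /* exclusive write: perm is injective */
}

void f_b(const int *perm, const double *yb, double *xb) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        xb[i] += 3.0 * yb[perm[i]];     /* exclusive write to xb[i]: no atomics */
}

int main(void) {
    int perm[N] = {2, 0, 5, 1, 4, 3};
    /* Stand-in for the static injectivity proof. */
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            assert(perm[i] != perm[j]);

    double x[N] = {1, 2, 3, 4, 5, 6}, y[N], xb[N] = {0}, yb[N];
    for (int i = 0; i < N; i++) yb[i] = 1.0;
    f(x, perm, y);
    f_b(perm, yb, xb);
    for (int i = 0; i < N; i++) printf("xb[%d] = %g\n", i, xb[i]);
    return 0;
}
```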
Forward Gradients for Data-Driven CFD Wall Modeling
Computational Fluid Dynamics (CFD) is used in the design and optimization of gas turbines and many other industrial and scientific applications. However, its practical use is often limited by the high computational cost, and the accurate resolution of near-wall flow is a significant contributor to this cost. Machine learning (ML) and other data-driven methods can complement existing wall models. Nevertheless, training these models is bottlenecked by the large computational effort and memory footprint demanded by back-propagation. Recent work has presented alternatives for computing gradients of neural networks in which separate forward and backward sweeps are not needed and storage of intermediate results between sweeps is not required, because an unbiased estimator of the gradient is computed in a single forward sweep. In this paper, we discuss the application of this approach to training a subgrid wall model that could potentially be used as a surrogate in wall-bounded flow CFD simulations to reduce the computational overhead while preserving predictive accuracy.
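The forward-gradient idea referred to above can be sketched as follows, on an invented toy loss rather than the actual wall model: sample a random tangent v, evaluate the directional derivative in one forward sweep with dual numbers, and use (grad f · v) v as an unbiased gradient estimate.

```c
/* Sketch of a forward-gradient estimator: no reverse sweep and no stored
 * intermediates; one forward sweep with dual numbers per random tangent.
 * The toy loss is invented for illustration, not the paper's wall model. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double val, dot; } dual;           /* value + tangent */

static dual d_add(dual a, dual b) { return (dual){a.val + b.val, a.dot + b.dot}; }
static dual d_mul(dual a, dual b) { return (dual){a.val * b.val, a.val * b.dot + a.dot * b.val}; }
static dual d_const(double c)     { return (dual){c, 0.0}; }

/* Toy loss f(t0, t1) = (t0 - 3)^2 + 2*(t1 + 1)^2, evaluated on duals. */
static dual loss(dual t0, dual t1) {
    dual a = d_add(t0, d_const(-3.0));
    dual b = d_add(t1, d_const(1.0));
    return d_add(d_mul(a, a), d_mul(d_const(2.0), d_mul(b, b)));
}

static double randn(void) {                          /* Box-Muller sample */
    const double two_pi = 6.283185307179586;
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
}

int main(void) {
    double t[2] = {0.0, 0.0}, g[2] = {0.0, 0.0};
    int nsamples = 20000;
    for (int s = 0; s < nsamples; s++) {
        double v0 = randn(), v1 = randn();           /* random tangent */
        dual y = loss((dual){t[0], v0}, (dual){t[1], v1});
        g[0] += y.dot * v0 / nsamples;               /* E[(grad.v) v] = grad */
        g[1] += y.dot * v1 / nsamples;
    }
    printf("estimate: (%g, %g)  exact: (%g, %g)\n",
           g[0], g[1], 2.0 * (t[0] - 3.0), 4.0 * (t[1] + 1.0));
    return 0;
}
```

Averaging over many tangents is only done here to show that the estimator is unbiased; in stochastic training a single tangent per step is typically used.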
Model Checking Race-freedom When "Sequential Consistency for Data-race-free Programs" is Guaranteed
Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.
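As a concrete, invented example of what "race-free in every SC execution" means for OpenMP code (the verification tooling itself is not shown): the snippet below is race-free only because of the implicit barrier at the end of the first worksharing loop; adding nowait there would introduce a race that a model checker could expose on some interleaving.

```c
/* As written, every SC execution is race-free: the implicit barrier after
 * the first worksharing loop orders all writes to b before any read of b. */
#include <stdio.h>

#define N 100

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) a[i] = i;

    #pragma omp parallel
    {
        #pragma omp for                 /* implicit barrier at loop end */
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * a[i];

        /* If the loop above used "#pragma omp for nowait", a thread could
         * read b[N-1-i] here before another thread had written it: a data
         * race on some SC interleaving, which a model checker would report. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            c[i] = b[N - 1 - i] + 1.0;
    }
    printf("c[0] = %g\n", c[0]);
    return 0;
}
```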
Automatic Differentiation for Adjoint Stencil Loops
Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable.

In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
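A hand-written C sketch of the loop transformation described above (PerforAD itself is a Python code generator; the example and its coefficients are invented): regrouping the adjoint contributions by output index turns the scattered adjoint of a 3-point stencil back into a stencil with one exclusive write per iteration.

```c
/* Naively, the adjoint of a stencil scatters into xb[i-1], xb[i], xb[i+1]
 * and needs atomics; regrouping contributions by output index turns the
 * adjoint into another stencil that parallelises like the primal loop. */
#include <stdio.h>

#define N 16

/* Primal 3-point stencil over the interior points. */
void stencil(const double *x, double *y, double a, double b, double c) {
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        y[i] = a * x[i - 1] + b * x[i] + c * x[i + 1];
}

/* Transformed adjoint: still a 3-point stencil, reading yb and writing xb[i]
 * exclusively; the guards handle the interior-only primal iteration space. */
void stencil_b(const double *yb, double *xb, double a, double b, double c) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        if (i + 1 >= 1 && i + 1 <= N - 2) s += a * yb[i + 1];
        if (i     >= 1 && i     <= N - 2) s += b * yb[i];
        if (i - 1 >= 1 && i - 1 <= N - 2) s += c * yb[i - 1];
        xb[i] += s;
    }
}

int main(void) {
    double x[N], y[N], xb[N] = {0}, yb[N];
    for (int i = 0; i < N; i++) { x[i] = i; yb[i] = 1.0; }
    stencil(x, y, 0.25, 0.5, 0.25);
    stencil_b(yb, xb, 0.25, 0.5, 0.25);
    for (int i = 0; i < N; i++) printf("xb[%d] = %g\n", i, xb[i]);
    return 0;
}
```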
Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models
Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed-parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, from which we then computed the parametric sensitivity.
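Only the final step, computing the parametric sensitivity of a trained surrogate, is sketched below; the one-neuron "network", its weights, and the parameter are invented placeholders, not the paper's model.

```c
/* Once a surrogate maps (state, parameter) -> next state, its parametric
 * sensitivity d(next state)/d(parameter) can be obtained by differentiating
 * the surrogate itself, here via forward-mode dual numbers on a hard-coded
 * one-neuron "network" with made-up weights. */
#include <math.h>
#include <stdio.h>

typedef struct { double val, dot; } dual;

static dual d_tanh(dual a) {
    double t = tanh(a.val);
    return (dual){t, (1.0 - t * t) * a.dot};
}

/* Toy surrogate: next = w2 * tanh(w0*state + w1*param + b0) + b1. */
static dual surrogate(double state, dual param) {
    const double w0 = 0.8, w1 = -0.3, b0 = 0.1, w2 = 1.5, b1 = 0.0;
    dual z = {w0 * state + w1 * param.val + b0, w1 * param.dot};
    dual h = d_tanh(z);
    return (dual){w2 * h.val + b1, w2 * h.dot};
}

int main(void) {
    double state = 0.5;
    dual param = {2.0, 1.0};              /* seed tangent on the parameter */
    dual next = surrogate(state, param);
    printf("next state = %g, d(next)/d(param) = %g\n", next.val, next.dot);
    return 0;
}
```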