9 research outputs found
Index handling and assign optimization for Algorithmic Differentiation reuse index managers
For operator overloading Algorithmic Differentiation tools, the
identification of primal variables and adjoint variables is usually done via
indices. Two common schemes exist for their management and distribution. The
linear approach is easy to implement and supports memory optimization with
respect to copy statements. On the other hand, the reuse approach requires more
implementation effort but results in much smaller adjoint vectors, which are
more suitable for the vector mode of Algorithmic Differentiation. In this
paper, we present both approaches, how to implement them, and discuss their
advantages, disadvantages and properties of the resulting Algorithmic
Differentiation type. In addition, a new management scheme is presented which
supports copy optimizations and the reuse of indices, thus combining the
advantages of the other two. The implementations of all three schemes are
compared on a simple synthetic example and on a real world example using the
computational fluid dynamics solver SU2.
Comment: 20 pages, 14 figures, 4 tables
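The two classical schemes from the abstract can be sketched in a few lines. This is an illustrative toy, not CoDiPack's actual implementation: a linear manager hands out monotonically increasing indices and never reclaims them, while a reuse manager keeps a free list so the largest live index, and hence the adjoint vector, stays small.

```python
class LinearIndexManager:
    """Every assignment gets a fresh index; indices are never reclaimed."""

    def __init__(self):
        self.count = 0

    def assign(self):
        self.count += 1
        return self.count

    def free(self, index):
        pass  # linear scheme: freed indices are simply abandoned


class ReuseIndexManager:
    """Freed indices go onto a free list and are handed out again,
    bounding the maximum index by the number of simultaneously live
    variables -- which keeps the adjoint vector small."""

    def __init__(self):
        self.count = 0
        self.free_list = []

    def assign(self):
        if self.free_list:
            return self.free_list.pop()
        self.count += 1
        return self.count

    def free(self, index):
        self.free_list.append(index)
```

With the reuse manager, a loop that repeatedly creates and destroys temporaries recycles the same few indices, whereas the linear manager grows the index range with every assignment.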
Reverse-Mode Automatic Differentiation of Compiled Programs
Tools for algorithmic differentiation (AD) provide accurate derivatives of
computer-implemented functions for use in, e. g., optimization and machine
learning (ML). However, they often require the source code of the function to
be available in a restricted set of programming languages. As a step towards
making AD accessible for code bases with cross-language or closed-source
components, we recently presented the forward-mode AD tool Derivgrind. It
inserts forward-mode AD logic into the machine code of a compiled program using
the Valgrind dynamic binary instrumentation framework. This work extends
Derivgrind, adding the capability to record the real-arithmetic evaluation
tree, and thus enabling operator overloading style reverse-mode AD for compiled
programs. We maintain the high level of correctness reported for Derivgrind's
forward mode, failing the same few testcases in an extensive test suite for the
same well-understood reasons. Runtime-wise, the recording slows down the
execution of a compiled 64-bit benchmark program by a factor of about 180.
Comment: 17 pages, 5 figures, 1 listing
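The "record the evaluation tree, then replay it backwards" idea behind operator-overloading reverse-mode AD can be illustrated with a minimal Python sketch (our illustration of the general technique, not Derivgrind's machine-code instrumentation):

```python
import math


class Tape:
    def __init__(self):
        # each record: (input variables, partial derivatives, output variable)
        self.ops = []


class Var:
    tape = Tape()

    def __init__(self, value):
        self.value = value
        self.adjoint = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value)
        Var.tape.ops.append(([self, other], [1.0, 1.0], out))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        Var.tape.ops.append(([self, other], [other.value, self.value], out))
        return out


def sin(x):
    out = Var(math.sin(x.value))
    Var.tape.ops.append(([x], [math.cos(x.value)], out))
    return out


def backward(output):
    """Reverse pass: propagate adjoints through the recorded tape."""
    output.adjoint = 1.0
    for inputs, partials, out in reversed(Var.tape.ops):
        for inp, p in zip(inputs, partials):
            inp.adjoint += p * out.adjoint
```

For `y = x*x + sin(x)` at `x = 2`, the reverse pass accumulates `x.adjoint = 2*x + cos(x)` in a single sweep over the tape.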
Forward-Mode Automatic Differentiation of Compiled Programs
Algorithmic differentiation (AD) is a set of techniques that provide partial
derivatives of computer-implemented functions. Such a function can be supplied
to state-of-the-art AD tools via its source code, or via an intermediate
representation produced while compiling its source code.
We present the novel AD tool Derivgrind, which augments the machine code of
compiled programs with forward-mode AD logic. Derivgrind leverages the Valgrind
instrumentation framework for a structured access to the machine code, and a
shadow memory tool to store dot values. Access to the source code is required
at most for the files in which input and output variables are defined.
Derivgrind's versatility comes at the price of scaling the run-time by a
factor between 30 and 75, measured on a benchmark based on a numerical solver
for a partial differential equation. Results of our extensive regression test
suite indicate that Derivgrind produces correct results on GCC- and
Clang-compiled programs, including a Python interpreter, with a small number of
exceptions. While we provide a list of scenarios that Derivgrind does not
handle correctly, nearly all of them are academic counterexamples or originate
from highly optimized math libraries. As long as differentiating those is
avoided, Derivgrind can be applied to an unprecedentedly wide range of
cross-language or partially closed-source software with little integration
effort.
Comment: 21 pages, 3 figures, 3 tables, 5 listings
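The forward-mode logic that Derivgrind inserts at the machine-code level corresponds, at the language level, to dual-number arithmetic: every value carries a dot value that is updated alongside it. A minimal sketch of that idea (not Derivgrind's implementation):

```python
import math


class Dual:
    """A value paired with its dot value (directional derivative)."""

    def __init__(self, value, dot=0.0):
        self.value = value
        self.dot = dot

    def __add__(self, other):
        return Dual(self.value + other.value, self.dot + other.dot)

    def __mul__(self, other):
        # product rule applied alongside the primal multiplication
        return Dual(self.value * other.value,
                    self.dot * other.value + self.value * other.dot)


def exp(x):
    e = math.exp(x.value)
    return Dual(e, e * x.dot)
```

Seeding the input with `dot = 1.0` and evaluating `f(x) = x*x + exp(x)` at `x = 1` yields the derivative `2 + e` in the output's dot component.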
Exploration of differentiability in a proton computed tomography simulation framework
Objective. Gradient-based optimization using algorithmic derivatives can be a useful technique to improve engineering designs with respect to a computer-implemented objective function. Likewise, uncertainty quantification through computer simulations can be carried out by means of derivatives of the computer simulation. However, the effectiveness of these techniques depends on how 'well-linearizable' the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulation is for the aforementioned applications. Approach. This study is mainly based on numerical experiments, in which we repeatedly evaluate three representative computational steps with perturbed input values. We support our observations with a review of the algorithmic steps and arithmetic operations performed by the software, using debugging techniques. Main results. The model-based iterative reconstruction (MBIR) subprocedure (at the end of the software pipeline) and the Monte Carlo (MC) simulation (at the beginning) were piecewise differentiable. However, the observed high density and magnitude of jumps was likely to preclude most meaningful uses of the derivatives. Jumps in the MBIR function arose from the discrete computation of the set of voxels intersected by a proton path, and could be reduced in magnitude by a 'fuzzy voxels' approach. The investigated jumps in the MC function arose from local changes in the control flow that affected the amount of consumed random numbers. The tracking algorithm solves an inherently non-differentiable problem. Significance. Besides the technical challenges of merely applying AD to existing software projects, the MC and MBIR codes must be adapted to compute smoother functions. For the MBIR code, we presented one possible approach for this, while for the MC code this will be subject to further research. For the tracking subprocedure, further research on surrogate models is necessary.
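The 'fuzzy voxels' idea can be illustrated in one dimension: replacing a hard 0/1 voxel-membership test with a smooth ramp across the voxel boundary turns a piecewise-constant, jump-riddled contribution into a differentiable one. This is our own toy illustration of the concept, not the paper's code; `sharpness` is a hypothetical parameter controlling the width of the ramp.

```python
import math


def hard_weight(distance, half_width):
    """Hard voxel membership: 1 inside, 0 outside. Piecewise constant,
    so its derivative w.r.t. the path position is zero almost everywhere,
    with jumps at the voxel boundary."""
    return 1.0 if abs(distance) < half_width else 0.0


def fuzzy_weight(distance, half_width, sharpness=10.0):
    """Smooth ('fuzzy') membership: a sigmoid ramp across the boundary
    makes the voxel contribution differentiable and shrinks the jumps."""
    return 1.0 / (1.0 + math.exp(sharpness * (abs(distance) - half_width)))
```

As `sharpness` grows, the fuzzy weight approaches the hard indicator, so the smoothing strength can be traded against fidelity to the original discrete model.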
AutoMat -- Automatic Differentiation for Generalized Standard Materials on GPUs
We propose a universal method for the evaluation of generalized standard
materials that greatly simplifies the material law implementation process. By
means of automatic differentiation and a numerical integration scheme, AutoMat
reduces the implementation effort to two potential functions. By moving AutoMat
to the GPU, we close the performance gap to conventional evaluation routines
and demonstrate in detail that the expression level reverse mode of automatic
differentiation as well as its extension to second order derivatives can be
applied inside CUDA kernels. We underline the effectiveness and the
applicability of AutoMat by integrating it into the FFT-based homogenization
scheme of Moulinec and Suquet and discuss the benefits of using AutoMat with
respect to runtime and solution accuracy for an elasto-viscoplastic example.
Comment: 28 pages, 15 figures, 7 tables; new layout, more detailed proof of Theorem
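The second-order forward-mode derivatives mentioned above can be obtained with hyper-dual numbers: carrying two first-order perturbations and one mixed second-order term through the computation yields the value, first, and second derivative in a single evaluation. A minimal sketch of the idea (our illustration, not AutoMat's CUDA implementation):

```python
class HyperDual:
    """Number of the form v + a*e1 + b*e2 + c*e1*e2 with e1^2 = e2^2 = 0.
    Seeding a = b = 1 on the input of f yields f, f', and f'' at once."""

    def __init__(self, v, a=0.0, b=0.0, c=0.0):
        self.v, self.a, self.b, self.c = v, a, b, c

    def __add__(self, o):
        return HyperDual(self.v + o.v, self.a + o.a, self.b + o.b, self.c + o.c)

    def __mul__(self, o):
        # product rule for the e1 and e2 parts; Leibniz rule for e1*e2
        return HyperDual(self.v * o.v,
                         self.a * o.v + self.v * o.a,
                         self.b * o.v + self.v * o.b,
                         self.c * o.v + self.a * o.b + self.b * o.a + self.v * o.c)
```

For `f(x) = x**3` at `x = 3`, a seeded input produces `f = 27`, `f' = 27`, and `f'' = 18` in the `v`, `a`, and `c` components.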
SciCompKL/CoDiPack: Version 2.2.0
<h3>v 2.2.0 - 2024-01-30</h3>
<ul>
<li><p>Features:</p>
<ul>
<li>New helper for adding Enzyme-generated derivative functions to the tape. See \ref Example_24_Enzyme_external_function_helper.</li>
<li>Recover primal values from primal values tapes in ExternalFunctionHelper.</li>
<li>Forward AD type for CUDA kernels.</li>
<li>Matrix-matrix multiplications can now be handled in an optimal way with CoDiPack.</li>
<li>Tagging tape for detecting errors in the AD workflow.</li>
</ul>
</li>
<li><p>Bugfix:</p>
<ul>
<li>Uninitialized values in external function helper.</li>
<li>External function outputs in Jacobian tapes no longer use unused indices.</li>
</ul>
</li>
<li><p>Other:</p>
<ul>
<li>Added low level function support to the tapes. Low level functions sit between external functions and statements; as they can occur quite often, they reduce the overhead for storing data as much as possible.</li>
<li>Added helper structures for creating low level functions.</li>
<li>External functions are now handled via the low level function interface.</li>
</ul>
</li>
</ul>
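The external-function mechanism referenced in the changelog can be pictured as a tape that interleaves elemental statements with user-supplied reverse callbacks. The following is a hedged Python toy of that concept only; CoDiPack's real interface is the C++ ExternalFunctionHelper, and the index layout here is invented for illustration.

```python
class Tape:
    """Toy tape mixing elemental statements and external functions,
    i.e. user-supplied reverse callbacks replayed at coarser granularity."""

    def __init__(self):
        self.entries = []

    def push_statement(self, reverse_fn):
        self.entries.append(reverse_fn)

    def push_external_function(self, reverse_fn):
        # an externally differentiated code section is reversed via its
        # callback, exactly like a statement, just covering more work
        self.entries.append(reverse_fn)

    def evaluate(self, adjoints):
        for reverse_fn in reversed(self.entries):
            reverse_fn(adjoints)


# forward sweep for z = (3*x)**2 at x = 2; indices 0, 1, 2 name x, y, z
x_val = 2.0
y_val = 3.0 * x_val

tape = Tape()
# statement y = 3*x: reverse rule adj[x] += 3 * adj[y]
tape.push_statement(lambda adj: adj.__setitem__(0, adj[0] + 3.0 * adj[1]))
# external function z = y*y: reverse rule adj[y] += 2*y * adj[z]
tape.push_external_function(
    lambda adj: adj.__setitem__(1, adj[1] + 2.0 * y_val * adj[2]))

adj = {0: 0.0, 1: 0.0, 2: 1.0}  # seed the output adjoint
tape.evaluate(adj)
# adj[0] now holds dz/dx = 18*x = 36 at x = 2
```

Treating external functions as coarse tape entries is what lets hand-differentiated or library-provided derivative code (e.g. an optimal matrix-matrix product rule) replace a long run of elemental statements.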
Event-Based Automatic Differentiation of OpenMP with OpDiLib
We present the new software OpDiLib, a universal add-on for classical
operator overloading AD tools that enables the automatic differentiation (AD)
of OpenMP parallelized code. With it, we establish support for OpenMP features
in a reverse mode operator overloading AD tool to an extent that was previously
only reported on in source transformation tools. We achieve this with an
event-based implementation ansatz that is unprecedented in AD. Combined with
modern OpenMP features around OMPT, we demonstrate how it can be used to
achieve differentiation without any additional modifications of the source
code; neither do we impose a priori restrictions on the data access patterns,
which makes OpDiLib highly applicable. For further performance optimizations,
restrictions like atomic updates on the adjoint variables can be lifted in a
fine-grained manner for any parts of the code. OpDiLib can also be applied in a
semi-automatic fashion via a macro interface, which supports compilers that do
not implement OMPT. In a detailed performance study, we demonstrate the
applicability of OpDiLib for a pure operator overloading approach in a hybrid
parallel environment. We quantify the cost of atomic updates on the adjoint
vector and showcase the speedup and scaling that can be achieved with the
different configurations of OpDiLib in both the forward and the reverse pass.
Comment: 34 pages, 10 figures, 2 tables, 12 listings
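The cost of atomic adjoint updates quantified above stems from a basic fact of parallel reverse passes: when several threads read the same variable in the forward pass, they all accumulate into its shared adjoint in the reverse pass, so those updates must be synchronized. A Python sketch of the pattern, with a lock standing in for OpenMP atomics (illustrative only; OpDiLib itself targets OpenMP/C++):

```python
import threading

adjoint = [0.0]            # shared adjoint of a variable read by every thread
lock = threading.Lock()


def reverse_chunk(contribution, n):
    """Each thread accumulates its chunk's adjoint contributions; the lock
    plays the role of an atomic update in the reverse pass."""
    for _ in range(n):
        with lock:
            adjoint[0] += contribution


threads = [threading.Thread(target=reverse_chunk, args=(1.0, 1000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# with synchronization, no update is lost: adjoint[0] == 4000.0
```

Lifting the synchronization, as OpDiLib allows in a fine-grained manner, is safe exactly when the data access pattern guarantees that no two threads touch the same adjoint, which is why it is an opt-in optimization rather than the default.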
Exploration of Differentiability in a Proton Computed Tomography Simulation Framework
Objective. Algorithmic differentiation (AD) can be a useful technique to numerically optimize design and algorithmic parameters of, and quantify uncertainties in, computer simulations. However, the effectiveness of AD depends on how 'well-linearizable' the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulation is for the aforementioned applications. Approach. This study is mainly based on numerical experiments, in which we repeatedly evaluate three representative computational steps with perturbed input values. We support our observations with a review of the algorithmic steps and arithmetic operations performed by the software, using debugging techniques. Main results. The model-based iterative reconstruction (MBIR) subprocedure (at the end of the software pipeline) and the Monte Carlo (MC) simulation (at the beginning) were piecewise differentiable. Jumps in the MBIR function arose from the discrete computation of the set of voxels intersected by a proton path. Jumps in the MC function likely arose from changes in the control flow that affect the amount of consumed random numbers. The tracking algorithm solves an inherently non-differentiable problem. Significance. The MC and MBIR codes are ready for the integration of AD, and further research on surrogate models for the tracking subprocedure is necessary.