1,243 research outputs found

    Compiler Support for Operator Overloading and Algorithmic Differentiation in C++

    Get PDF
    Multiphysics software needs derivatives for, e.g., solving a system of non-linear equations, conducting model verification, or sensitivity studies. In C++, algorithmic differentiation (AD), based on operator overloading (overloading), can be used to calculate derivatives up to machine precision. To that end, the built-in floating-point type is replaced by the user-defined AD type. It overloads all required operators, and calculates the original value and the corresponding derivative based on the chain rule of calculus. While changing the underlying type seems straightforward, several complications arise concerning software and performance engineering. This includes (1) fundamental language restrictions of C++ w.r.t. user-defined types, (2) type correctness of distributed computations with the Message Passing Interface (MPI) library, and (3) identification and mitigation of AD induced overheads. To handle these issues, AD experts may spend a significant amount of time to enhance a code with AD, verify the derivatives and ensure optimal application performance. Hence, in this thesis, we propose a modern compiler-based tooling approach to support and accelerate the AD-enhancement process of C++ target codes. In particular, we make contributions to three aspects of AD. The initial type change - While the change to the AD type in a target code is conceptually straightforward, the type change often leads to a multitude of compiler error messages. This is due to the different treatment of built-in floating-point types and user-defined types by the C++ language standard. Previously legal code constructs in the target code subsequently violate the language standard when the built-in floating-point type is replaced with a user-defined AD type. We identify and classify these problematic code constructs and their root cause is shown. Solutions by localized source transformation are proposed. To automate this rather mechanical process, we develop a static code analyser and source transformation tool, called OO-Lint, based on the Clang compiler framework. It flags instances of these problematic code constructs and applies source transformations to make the code compliant with the requirements of the language standard. To show the overall relevance of complications with user-defined types, OO-Lint is applied to several well-known scientific codes, some of which have already been AD enhanced by others. In all of these applications, except the ones manually treated for AD overloading, problematic code constructs are detected. Type correctness of MPI communication - MPI is the de-facto standard for programming high performance, distributed applications. At the same time, MPI has a complex interface whose usage can be error-prone. For instance, MPI derived data types require manual construction by specifying memory locations of the underlying data. Specifying wrong offsets can lead to subtle bugs that are hard to detect. In the context of AD, special libraries exist that handle the required derivative book-keeping by replacing the MPI communication calls with overloaded variants. However, on top of the AD type change, the MPI communication routines have to be changed manually. In addition, the AD type fundamentally changes memory layout assumptions as it has a different extent than the built-in types. Previously legal layout assumptions have, thus, to be reverified. As a remedy, to detect any type-related errors, we developed a memory sanitizer tool, called TypeART, based on the LLVM compiler framework and the MPI correctness checker MUST. It tracks all memory allocations relevant to MPI communication to allow for checking the underlying type and extent of the typeless memory buffer address passed to any MPI routine. The overhead induced by TypeART w.r.t. several target applications is manageable. AD domain-specific profiling - Applying AD in a black-box manner, without consideration of the target code structure, can have a significant impact on both runtime and memory consumption. An AD expert is usually required to apply further AD-related optimizations for the reduction of these induced overheads. Traditional profiling techniques are, however, insufficient as they do not reveal any AD domain-specific metrics. Of interest for AD code optimization are, e.g., specific code patterns, especially on a function level, that can be treated efficiently with AD. To that end, we developed a static profiling tool, called ProAD, based on the LLVM compiler framework. For each function, it generates the computational graph based on the static data flow of the floating-point variables. The framework supports pattern analysis on the computational graph to identify the optimal application of the chain rule. We show the potential of the optimal application of AD with two case studies. In both cases, significant runtime improvements can be achieved when the knowledge of the code structure, provided by our tool, is exploited. For instance, with a stencil code, a speedup factor of about 13 is achieved compared to a naive application of AD and a factor of 1.2 compared to hand-written derivative code

    Sensitivity computations by algorithmic differentiation of a high-­order cfd code based on spectral differences

    Get PDF
    We compute flow sensitivities by differentiating a high-­order computational fluid dynamics code. Our fully discrete approach relies on automatic differentiation (AD) of the original source code. We obtain two transformed codes by using the AD tool Tapenade (INRIA), one for each differentiation mode: tangent and adjoint. Both differentiated codes are tested against each other by computing sensitivities in an unsteady test case. The results from both codes agree to within machine accuracy, and compare well with those approximated by finite differences. We compare execution times and discuss the encountered technical difficulties due to 1) the code parallelism and 2) the memory overhead caused by unsteady problems

    Adjoint computations by algorithmic differentiation of a parallel solver for time-dependent PDEs

    Get PDF
    A computational fluid dynamics code is differentiated using algorithmic differentiation (AD) in both tangent and adjoint modes. The two novelties of the present approach are 1) the adjoint code is obtained by letting the AD tool Tapenade invert the complete layer of message passing interface (MPI) communications, and 2) the adjoint code integrates time-dependent, non-linear and dissipative (hence physically irreversible) PDEs with an explicit time integration loop running for ca. 10610^{6} time steps. The approach relies on using the Adjoinable MPI library to reverse the non-blocking communication patterns in the original code, and by controlling the memory overhead induced by the time-stepping loop with binomial checkpointing. A description of the necessary code modifications is provided along with the validation of the computed derivatives and a performance comparison of the tangent and adjoint codes.Comment: Submitted to Journal of Computational Scienc

    HYBRID PARALLELISATION OF AN ALGORITHMICALLY DIFFERENTIATED ADJOINT SOLVER

    Get PDF
    This research has been supported by the European Commission under the HORIZON 2020 Marie Curie fellowship (grant no. 642959

    Reduction of the Random Access Memory Size in Adjoint Algorithmic Differentiation by Overloading

    Full text link
    Adjoint algorithmic differentiation by operator and function overloading is based on the interpretation of directed acyclic graphs resulting from evaluations of numerical simulation programs. The size of the computer system memory required to store the graph grows proportional to the number of floating-point operations executed by the underlying program. It quickly exceeds the available memory resources. Naive adjoint algorithmic differentiation often becomes infeasible except for relatively simple numerical simulations. Access to the data associated with the graph can be classified as sequential and random. The latter refers to memory access patterns defined by the adjacency relationship between vertices within the graph. Sequentially accessed data can be decomposed into blocks. The blocks can be streamed across the system memory hierarchy thus extending the amount of available memory, for example, to hard discs. Asynchronous i/o can help to mitigate the increased cost due to accesses to slower memory. Much larger problem instances can thus be solved without resorting to technically challenging user intervention such as checkpointing. Randomly accessed data should not have to be decomposed. Its block-wise streaming is likely to yield a substantial overhead in computational cost due to data accesses across blocks. Consequently, the size of the randomly accessed memory required by an adjoint should be kept minimal in order to eliminate the need for decomposition. We propose a combination of dedicated memory for adjoint LL-values with the exploitation of remainder bandwidth as a possible solution. Test results indicate significant savings in random access memory size while preserving overall computational efficiency
    corecore