21 research outputs found

    Compiler Support for Operator Overloading and Algorithmic Differentiation in C++

    Get PDF
    Multiphysics software needs derivatives for, e.g., solving a system of non-linear equations, conducting model verification, or sensitivity studies. In C++, algorithmic differentiation (AD), based on operator overloading (overloading), can be used to calculate derivatives up to machine precision. To that end, the built-in floating-point type is replaced by the user-defined AD type. It overloads all required operators, and calculates the original value and the corresponding derivative based on the chain rule of calculus. While changing the underlying type seems straightforward, several complications arise concerning software and performance engineering. This includes (1) fundamental language restrictions of C++ w.r.t. user-defined types, (2) type correctness of distributed computations with the Message Passing Interface (MPI) library, and (3) identification and mitigation of AD induced overheads. To handle these issues, AD experts may spend a significant amount of time to enhance a code with AD, verify the derivatives and ensure optimal application performance. Hence, in this thesis, we propose a modern compiler-based tooling approach to support and accelerate the AD-enhancement process of C++ target codes. In particular, we make contributions to three aspects of AD. The initial type change - While the change to the AD type in a target code is conceptually straightforward, the type change often leads to a multitude of compiler error messages. This is due to the different treatment of built-in floating-point types and user-defined types by the C++ language standard. Previously legal code constructs in the target code subsequently violate the language standard when the built-in floating-point type is replaced with a user-defined AD type. We identify and classify these problematic code constructs and their root cause is shown. Solutions by localized source transformation are proposed. To automate this rather mechanical process, we develop a static code analyser and source transformation tool, called OO-Lint, based on the Clang compiler framework. It flags instances of these problematic code constructs and applies source transformations to make the code compliant with the requirements of the language standard. To show the overall relevance of complications with user-defined types, OO-Lint is applied to several well-known scientific codes, some of which have already been AD enhanced by others. In all of these applications, except the ones manually treated for AD overloading, problematic code constructs are detected. Type correctness of MPI communication - MPI is the de-facto standard for programming high performance, distributed applications. At the same time, MPI has a complex interface whose usage can be error-prone. For instance, MPI derived data types require manual construction by specifying memory locations of the underlying data. Specifying wrong offsets can lead to subtle bugs that are hard to detect. In the context of AD, special libraries exist that handle the required derivative book-keeping by replacing the MPI communication calls with overloaded variants. However, on top of the AD type change, the MPI communication routines have to be changed manually. In addition, the AD type fundamentally changes memory layout assumptions as it has a different extent than the built-in types. Previously legal layout assumptions have, thus, to be reverified. As a remedy, to detect any type-related errors, we developed a memory sanitizer tool, called TypeART, based on the LLVM compiler framework and the MPI correctness checker MUST. It tracks all memory allocations relevant to MPI communication to allow for checking the underlying type and extent of the typeless memory buffer address passed to any MPI routine. The overhead induced by TypeART w.r.t. several target applications is manageable. AD domain-specific profiling - Applying AD in a black-box manner, without consideration of the target code structure, can have a significant impact on both runtime and memory consumption. An AD expert is usually required to apply further AD-related optimizations for the reduction of these induced overheads. Traditional profiling techniques are, however, insufficient as they do not reveal any AD domain-specific metrics. Of interest for AD code optimization are, e.g., specific code patterns, especially on a function level, that can be treated efficiently with AD. To that end, we developed a static profiling tool, called ProAD, based on the LLVM compiler framework. For each function, it generates the computational graph based on the static data flow of the floating-point variables. The framework supports pattern analysis on the computational graph to identify the optimal application of the chain rule. We show the potential of the optimal application of AD with two case studies. In both cases, significant runtime improvements can be achieved when the knowledge of the code structure, provided by our tool, is exploited. For instance, with a stencil code, a speedup factor of about 13 is achieved compared to a naive application of AD and a factor of 1.2 compared to hand-written derivative code

    From Valid Measurements to Valid Mini-Apps

    Get PDF
    In high-performance computing, performance analysis, tuning, and exploration are relevant throughout the life cycle of an application. State-of-the-art tools provide capable measurement infrastructure, but they lack automation of repetitive tasks, such as iterative measurement-overhead reduction, or tool support for challenging and time-consuming tasks, e.g., mini-app creation. In this thesis, we address this situation with (a) a comparative study on overheads introduced by different tools, (b) the tool PIRA for automatic instrumentation refinement, and (c) a tool-supported approach for mini-app extraction. In particular, we present PIRA for automatic iterative performance measurement refinement. It performs whole-program analysis using both source-code and runtime information to heuristically determine where in the target application measurement hooks should be placed for a low-overhead assessment. At the moment, PIRA offers a runtime heuristic to identify compute-intensive parts, a performance-model heuristic to identify scalability limitations, and a load imbalance detection heuristic. In our experiments, PIRA compared to Score-P’s built-in filtering significantly reduces the runtime overhead in 13 out of 15 benchmark cases and typically introduces a slowdown of < 10 %. To provide PIRA with the required infrastructure, we develop MetaCG — an extendable lightweight whole-program call-graph library for C/C++. The library offers a compiler-agnostic call-graph (CG) representation, a Clang-based tool to construct a target’s CG, and a tool to validate the structure of the MetaCG. In addition to its use in PIRA, we show that whole-program CG analysis reduces the number of allocation to track by the memory tracking sanitizer TypeART by up to a factor of 2,350×. Finally, we combine the presented tools and develop a tool-supported approach to (a) identify, and (b) extract relevant application regions into representative mini-apps. Therefore, we present a novel Clang-based source-to-source translator and a type-safe checkpoint-restart (CPR) interface as a common interface to existing MPI-parallel CPR libraries. We evaluate the approach by extracting a mini-app of only 1,100 lines of code from an 8.5 million lines of code application. The mini-app is subsequently analyzed, and maintains the significant characteristics of the original application’s behavior. It is then used for tool-supported parallelization, which led to a speed-up of 35 %. The software presented in this thesis is available at https://github.com/tudasc

    A usability case study of algorithmic differentiation tools on the ISSM ice sheet model

    No full text
    Algorithmic differentiation (AD) based on operator overloading is often the only feasible approach for applying AD in complex C++ software environments. Challenges pertaining to the introduction of an AD tool based on operator overloading have been studied in the past. However, in order to assess possible performance gains or to verify derivative values, it is advantageous to be able to apply more than one AD tool to a given code. Hence, in this work, we investigate usability issues when exchanging AD tools. Our study is based on the NASA/JPL/UCI Ice Sheet System Model (ISSM) which currently employs the AD tool ADOL-C. We introduce CoDiPack to ISSM, a more recent AD tool offering a similar set of features while promising performance improvements. In addition to the obvious type change for the AD-augmented float type, this transition requires the change to a different adjoint MPI library, adaptation of the MUMPS solver wrapper, and changes to the derivative seeding and extraction routines. We believe that these issues are fairly generic for numerical simulation software, and the issues we report on provide a blueprint for similar undertakings. We also believe that our experiences provide guidance towards the development of AD interfaces that support AD tool interoperability. In addition, we improve upon the memory management of the existing ADOL-C instrumentation, which exhibited considerable runtime problems for higher mesh resolutions. We conduct serial and parallel ISSM model runs on a 2D mass transport benchmark as well as a model of the Pine Island Glacier to verify the derivatives computed by both tools and report on runtime performance and memory usage. In comparison, the CoDiPack AD variant of ISSM runs faster with less memory overhead than the ADOL-C variant and, thus, enables future model runs with an increased number of mesh elements. But the existence of two different AD implementations provides added confidence in the correctness of derivatives, in particular for future AD tool versions

    Ecology-based planning. Italian and French experimentations

    Get PDF
    This paper examines some French and Italian experimentations of green infrastructures’ (GI) construction in relation to their techniques and methodologies. The construction of a multifunctional green infrastructure can lead to the generation of a number of relevant bene fi ts able to face the increasing challenges of climate change and resilience (for example, social, ecological and environmental through the recognition of the concept of ecosystem services) and could ease the achievement of a performance-based approach. This approach, differently from the traditional prescriptive one, helps to attain a better and more fl exible land-use integration. In both countries, GI play an important role in contrasting land take and, for their adaptive and cross-scale nature, they help to generate a res ilient approach to urban plans and projects. Due to their fl exible and site-based nature, GI can be adapted, even if through different methodologies and approaches, both to urban and extra-urban contexts. On one hand, France, through its strong national policy on ecological networks, recognizes them as one of the major planning strategies toward a more sustainable development of territories; on the other hand, Italy has no national policy and Regions still have a hard time integrating them in already existing planning tools. In this perspective, Italian experimentations on GI construction appear to be a simple and sporadic add-on of urban and regional plans

    LDRD Annual Report FY2006

    Full text link
    corecore