3 research outputs found

    Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients

    Applying differentiable programming techniques and machine learning algorithms to foreign programs requires developers to either rewrite their code in a machine learning framework or otherwise provide derivatives of the foreign code. This paper presents Enzyme, a high-performance automatic differentiation (AD) compiler plugin for the LLVM compiler framework capable of synthesizing gradients of statically analyzable programs expressed in the LLVM intermediate representation (IR). Enzyme synthesizes gradients for programs written in any language whose compiler targets LLVM IR, including C, C++, Fortran, Julia, Rust, Swift, MLIR, etc., thereby providing native AD capabilities in these languages. Unlike traditional source-to-source and operator-overloading tools, Enzyme performs AD on optimized IR. On a machine-learning-focused benchmark suite including Microsoft's ADBench, AD on optimized IR achieves a geometric mean speedup of 4.5x over AD on IR before optimization, allowing Enzyme to achieve state-of-the-art performance. Packaging Enzyme for PyTorch and TensorFlow provides convenient access to gradients of foreign code with state-of-the-art performance, enabling foreign code to be directly incorporated into existing machine learning workflows.
    Comment: To be published in NeurIPS 2020
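
    As a concrete illustration of the native AD capabilities described above, here is a minimal sketch of reverse-mode differentiation via Enzyme's Julia bindings (Enzyme.jl); the package name and call signature are an assumed example, not taken from this abstract.

        using Enzyme   # Julia bindings to the Enzyme LLVM plugin

        # An ordinary Julia function; no framework-specific rewrite needed.
        square(x) = x * x

        # Reverse-mode AD: Active marks x as a differentiable scalar input,
        # and the third argument annotates the return value as active.
        # Returns ((6.0,),) -- the gradient d/dx x^2 at x = 3.0.
        grad = autodiff(Reverse, square, Active, Active(3.0))

    Because Enzyme differentiates the optimized LLVM IR of square rather than a framework-specific graph, the same mechanism applies unchanged to code originally written in C, C++, Fortran, or Rust.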

    Rapid software prototyping for heterogeneous and distributed platforms

    The software needs of scientists and engineers are growing, and their programs are becoming more compute-heavy and problem-specific. This has led to an influx of non-expert programmers who need to use and program high-performance computing platforms. With the continued stagnation of single-threaded performance, using hardware accelerators such as GPUs or FPGAs is necessary. Adapting software to these compute platforms is a difficult task, especially for non-expert programmers, leaving applications unable to take advantage of new hardware or requiring extensive rewrites. We propose a programming model that allows non-experts to benefit from high-performance computing, while enabling expert programmers to take full advantage of the underlying hardware. In this model, programs are generically typed, the location of the data is encoded in the type system, and multiple dispatch is used to select functionality based on the type of the data. This enables rapid prototyping, retargeting, and reuse of existing software, while allowing for hardware-specific optimization if required. Our approach allows development to happen in one source language, enabling domain experts and performance engineers to jointly develop a program without the overhead, friction, and challenges associated with developing in multiple programming languages for the same project. We demonstrate the viability and the core principles of this programming model in Julia using realistic examples, showing the potential of this approach for rapid prototyping and its applicability for real-life engineering. We focus on usability for non-expert programmers and demonstrate that the potential of the underlying hardware can be fully exploited.
    Keywords: Julia; Generic programming; Heterogeneous systems; CUDA; Distributed computing
    National Science Foundation (Grant DMS-1312831); National Science Foundation (Grant OAC-1835443)
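
    The core idea — data location encoded in the type system, with multiple dispatch selecting the implementation — can be sketched in a few lines of Julia; the saxpy! function and the use of CUDA.jl below are illustrative assumptions, not code from the paper.

        using CUDA  # provides CuArray, a GPU-resident array type

        # Written once against generic array semantics;
        # broadcasting dispatches on the concrete array type.
        saxpy!(a, x, y) = (y .= a .* x .+ y)

        x, y = rand(Float32, 1024), rand(Float32, 1024)
        saxpy!(2f0, x, y)        # Array arguments: runs on the CPU

        xd, yd = cu(x), cu(y)    # cu() moves the data to the GPU...
        saxpy!(2f0, xd, yd)      # ...so the same call compiles to a CUDA kernel

        # Experts can still specialize a method for one backend when needed:
        # saxpy!(a, x::CuArray, y::CuArray) = <hand-tuned kernel launch>

    Because the kernel is written once against the generic array interface, retargeting to the GPU is a matter of changing where the data lives, not rewriting the algorithm.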