Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients
Applying differentiable programming techniques and machine learning
algorithms to foreign programs requires developers either to rewrite their code
in a machine learning framework or to provide derivatives of the
foreign code. This paper presents Enzyme, a high-performance automatic
differentiation (AD) compiler plugin for the LLVM compiler framework capable of
synthesizing gradients of statically analyzable programs expressed in the LLVM
intermediate representation (IR). Enzyme synthesizes gradients for programs
written in any language whose compiler targets LLVM IR including C, C++,
Fortran, Julia, Rust, Swift, MLIR, etc., thereby providing native AD
capabilities in these languages. Unlike traditional source-to-source and
operator-overloading tools, Enzyme performs AD on optimized IR. On a
machine-learning-focused benchmark suite including Microsoft's ADBench, AD on
optimized IR achieves a geometric-mean speedup of 4.5x over AD on IR before
optimization, allowing Enzyme to achieve state-of-the-art performance. Packaging
Enzyme for PyTorch and TensorFlow provides convenient access to gradients of
foreign code with state-of-the-art performance, enabling foreign code to be
directly incorporated into existing machine learning workflows.
Comment: To be published in NeurIPS 2020.
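To make the abstract concrete, the sketch below shows, in hand-written C++, the kind of reverse-mode gradient code Enzyme synthesizes automatically from a function's optimized LLVM IR. The function f and its gradient are illustrative examples, not from the paper; with Enzyme itself one would instead compile f with the plugin loaded and request its derivative via the `__enzyme_autodiff` entry point rather than writing `grad_f` by hand.

```cpp
#include <cmath>

// f(x, y) = x*y + sin(x): a toy stand-in for "foreign code" written
// in any LLVM-targeting language (C, C++, Fortran, Julia, Rust, ...).
double f(double x, double y) { return x * y + std::sin(x); }

// Hand-written reverse-mode gradient of f -- the code Enzyme would
// synthesize automatically, so the author never writes it.
// Returns df/dx and df/dy through the out-parameters.
void grad_f(double x, double y, double* dx, double* dy) {
    *dx = y + std::cos(x);  // d/dx (x*y + sin x) = y + cos x
    *dy = x;                // d/dy (x*y + sin x) = x
}
```

Because Enzyme differentiates the IR after LLVM's optimizations have run, the synthesized gradient benefits from the same simplifications as the original function, which is the source of the reported 4.5x geometric-mean speedup over differentiating unoptimized IR.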
Rapid software prototyping for heterogeneous and distributed platforms
The software needs of scientists and engineers are growing, and their programs are becoming more compute-heavy and problem-specific. This has led to an influx of non-expert programmers who need to use and program high-performance computing platforms. With the continued stagnation of single-threaded performance, using hardware accelerators such as GPUs or FPGAs is necessary. Adapting software to these compute platforms is a difficult task, especially for non-expert programmers, leading to applications that cannot take advantage of new hardware or that require extensive rewrites.

We propose a programming model that allows non-experts to benefit from high-performance computing while enabling expert programmers to take full advantage of the underlying hardware. In this model, programs are generically typed, the location of the data is encoded in the type system, and multiple dispatch is used to select functionality based on the type of the data. This enables rapid prototyping, retargeting, and reuse of existing software, while allowing for hardware-specific optimization if required. Our approach allows development to happen in one source language, enabling domain experts and performance engineers to jointly develop a program without the overhead, friction, and challenges associated with developing in multiple programming languages for the same project.

We demonstrate the viability and the core principles of this programming model in Julia using realistic examples, showing the potential of this approach for rapid prototyping and its applicability for real-life engineering. We focus on usability for non-expert programmers and demonstrate that the potential of the underlying hardware can be fully exploited.

Keywords: Julia; Generic programming; Heterogeneous systems; CUDA; Distributed computing
Funding: National Science Foundation (Grant DMS-1312831); National Science Foundation (Grant OAC-1835443)
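The core idea — encoding data location in the type system and letting dispatch select the implementation — can be sketched in C++, where function overloading plays the role of Julia's multiple dispatch. All names here (HostArray, DeviceArray, scale) are hypothetical illustrations, not the paper's API, and the "device" path only simulates an accelerator kernel:

```cpp
#include <string>
#include <vector>

// Data location is encoded in the type, as in the proposed model.
struct HostArray   { std::vector<double> data; };
struct DeviceArray { std::vector<double> data; };  // stand-in for GPU memory

// Generic kernel a domain expert writes once, for any array type.
template <typename Array>
void scale_generic(Array& a, double s) {
    for (double& x : a.data) x *= s;
}

// Host arrays fall back to the generic loop.
std::string scale(HostArray& a, double s) {
    scale_generic(a, s);
    return "host";
}

// A performance engineer can later add a hardware-specific overload
// (imagine a CUDA kernel launch here) without touching caller code.
std::string scale(DeviceArray& a, double s) {
    scale_generic(a, s);  // simulated device execution
    return "device";
}
```

A caller simply writes `scale(a, 2.0)`; which implementation runs is decided by the array's type, so retargeting a program to new hardware means introducing a new array type and its overloads rather than rewriting the application.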