AutoGraph: Imperative-style Coding with Graph-based Performance
There is a perceived trade-off between machine learning code that is easy to
write, and machine learning code that is scalable or fast to execute. In
machine learning, imperative-style libraries like Autograd and PyTorch are easy
to write, but suffer from high interpretive overhead and are not easily
deployable in production or mobile settings. Graph-based libraries like
TensorFlow and Theano benefit from whole-program optimization and can be
deployed broadly, but make expressing complex models more cumbersome. We
describe how the use of staged programming in Python, via source code
transformation, offers a midpoint between these two library design patterns,
capturing the benefits of both. A key insight is to delay all type-dependent
decisions until runtime, via dynamic dispatch. We instantiate these principles
in AutoGraph, a software system that improves the programming experience of the
TensorFlow library, and demonstrate usability improvements with no loss in
performance compared to native TensorFlow graphs. We also show that our system
is backend agnostic, and demonstrate targeting an alternate IR with
characteristics not found in TensorFlow graphs.
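The idea of delaying type-dependent decisions until runtime can be illustrated with a minimal sketch. Everything named here (`Staged`, `if_stmt`, `f_converted`) is hypothetical and chosen for illustration; it is not AutoGraph's actual API. The sketch assumes that a source transformation rewrites each `if` into a call to an overloaded function, which then dispatches on the runtime type of the condition.

```python
class Staged:
    """Hypothetical symbolic value: records an expression instead of computing it."""

    def __init__(self, expr):
        self.expr = expr

    def __repr__(self):
        return f"Staged({self.expr!r})"


def if_stmt(cond, then_fn, else_fn):
    """Overloaded conditional (illustrative): run or stage, decided at runtime."""
    if isinstance(cond, Staged):
        # Symbolic condition: evaluate both branches and record a graph-level node.
        return Staged(("cond", cond.expr, then_fn(), else_fn()))
    # Ordinary Python value: behave like a normal `if`.
    return then_fn() if cond else else_fn()


# What a source-to-source pass might emit for:
#     def f(c):
#         if c:
#             return 1
#         else:
#             return -1
def f_converted(c):
    return if_stmt(c, lambda: 1, lambda: -1)
```

Calling `f_converted(True)` runs as plain Python, while `f_converted(Staged("x > 0"))` returns a recorded conditional. The same converted function serves both cases because the choice is made by dispatch rather than at transformation time.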
Efficient CHAD
We show how the basic Combinatory Homomorphic Automatic Differentiation
(CHAD) algorithm can be optimised, using well-known methods, to yield a simple
and generally applicable reverse-mode automatic differentiation (AD) technique
that has the correct computational complexity that we would expect of a reverse
AD algorithm. Specifically, we show that the standard optimisations of sparse
vectors and state-passing style code (as well as defunctionalisation/closure
conversion, for higher-order languages) give us a purely functional algorithm
that is most of the way to the correct complexity, with (functional) mutable
updates taking care of the final log-factors. We provide an Agda formalisation
of our complexity proof. Finally, we discuss how the techniques apply to
differentiating parallel functional programs: the key observations are 1) that
all required mutability is (commutative, associative) accumulation, which lets
us preserve task-parallelism and 2) that we can write down data-parallel
derivatives for most data-parallel array primitives.
Verifying an Effect-Handler-Based Define-By-Run Reverse-Mode AD Library
We apply program verification technology to the problem of specifying and
verifying automatic differentiation (AD) algorithms. We focus on
"define-by-run", a style of AD where the program that must be differentiated
is executed and monitored by the automatic differentiation algorithm. We begin
by asking, "what is an implementation of AD?" and "what does it mean for an
implementation of AD to be correct?'' We answer these questions both at an
informal level, in precise English prose, and at a formal level, using types
and logical assertions. After answering these broad questions, we focus on a
specific implementation of AD, which involves a number of subtle programming
language features, including dynamically allocated mutable state, first-class
functions, and effect handlers. We present a machine-checked proof, expressed
in a modern variant of Separation Logic, of its correctness. We view this
result as an advanced exercise in program verification, with potential future
applications to the verification of more realistic automatic differentiation
systems and of other software components that exploit delimited control
effects.
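As a rough illustration of the define-by-run style, the sketch below executes the program once while recording each arithmetic step, then replays the record backwards. It uses an explicit Python list as a tape rather than effect handlers, so it is not the verified library described above. The names `Var` and `diff` are hypothetical.

```python
class Var:
    """A value monitored during execution; `grad` accumulates its adjoint."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)

        def back():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        self.tape.append(back)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)

        def back():
            # d(out)/d(self) = other.value and d(out)/d(other) = self.value
            self.grad += out.grad * other.value
            other.grad += out.grad * self.value

        self.tape.append(back)
        return out


def diff(f, x):
    """Run f on a monitored input, then replay the tape in reverse order."""
    tape = []
    v = Var(x, tape)
    y = f(v)
    y.grad = 1.0
    for back in reversed(tape):
        back()
    return v.grad
```

For example, `diff(lambda x: x * x + x, 3.0)` differentiates x^2 + x at x = 3. The mutable `grad` fields and the closures stored on the tape correspond to the dynamically allocated state and first-class functions mentioned in the abstract.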
Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the practical use of our technique by providing a performant implementation that differentiates most of Haskell98.
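The pairing of each scalar with a backpropagator function can be sketched as follows. This is a naive, unoptimised illustration written in Python rather than Haskell, and the names (`Dual`, `_merge`, `grad`) are hypothetical. Here a backpropagator is assumed to map an output cotangent to a dictionary from input index to gradient contribution.

```python
def _merge(a, b):
    """Add two sparse gradient dictionaries key by key."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0.0) + v
    return out


class Dual:
    """A scalar paired with a backpropagator function."""

    def __init__(self, value, backprop):
        self.value = value
        self.backprop = backprop

    def __add__(self, other):
        return Dual(
            self.value + other.value,
            lambda d: _merge(self.backprop(d), other.backprop(d)),
        )

    def __mul__(self, other):
        return Dual(
            self.value * other.value,
            lambda d: _merge(
                self.backprop(d * other.value),
                other.backprop(d * self.value),
            ),
        )


def grad(f, xs):
    """Gradient of f at xs, obtained by calling the result's backpropagator once."""
    # Input i contributes its incoming cotangent to slot i of the gradient.
    inputs = [Dual(x, lambda d, i=i: {i: d}) for i, x in enumerate(xs)]
    result = f(*inputs)
    g = result.backprop(1.0)
    return [g.get(i, 0.0) for i in range(len(xs))]
```

For example, `grad(lambda x, y: x * y + x, [3.0, 4.0])` gives the partial derivatives y + 1 and x. In this naive form, a shared subterm's backpropagator is called once per use, which can be costly; this appears to be the kind of inefficiency that optimisations such as linear factoring are meant to address.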
Dual-Numbers Reverse AD, Efficiently
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent derivative, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis was on a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with resource-linear types, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. Furthermore, we observe a connection with classical imperative taping-based reverse AD, as well as Kmett's 'ad' Haskell library, recently analysed by Krawiec et al. We demonstrate the practical use of our technique by providing a performant implementation that differentiates most of Haskell98.
Verifying an Effect-Handler-Based Define-By-Run Reverse-Mode AD Library
We apply program verification technology to the problem of specifying and
verifying automatic differentiation (AD) algorithms. We focus on define-by-run,
a style of AD where the program that must be differentiated is executed and
monitored by the automatic differentiation algorithm. We begin by asking, "what
is an implementation of AD?" and "what does it mean for an implementation of AD
to be correct?" We answer these questions both at an informal level, in precise
English prose, and at a formal level, using types and logical assertions. After
answering these broad questions, we focus on a specific implementation of AD,
which involves a number of subtle programming-language features, including
dynamically allocated mutable state, first-class functions, and effect
handlers. We present a machine-checked proof, expressed in a modern variant of
Separation Logic, of its correctness. We view this result as an advanced
exercise in program verification, with potential future applications to the
verification of more realistic automatic differentiation systems and of other
software components that exploit delimited-control effects.