Forward-Mode Automatic Differentiation in Julia
We present ForwardDiff, a Julia package for forward-mode automatic
differentiation (AD) featuring performance competitive with low-level languages
like C++. Unlike recently developed AD tools in other popular high-level
languages such as Python and MATLAB, ForwardDiff takes advantage of
just-in-time (JIT) compilation to transparently recompile AD-unaware user code,
enabling efficient support for higher-order differentiation and differentiation
using custom number types (including complex numbers). For gradient and
Jacobian calculations, ForwardDiff provides a variant of vector-forward mode
that avoids expensive heap allocation and makes better use of memory bandwidth
than traditional vector mode. In our numerical experiments, we demonstrate that
for nontrivially large dimensions, ForwardDiff's gradient computations can be
faster than a reverse-mode implementation from the Python-based autograd
package. We also illustrate how ForwardDiff is used effectively within JuMP, a
modeling language for optimization. According to our usage statistics, 41
unique repositories on GitHub depend on ForwardDiff, with users from diverse
fields such as astronomy, optimization, finite element analysis, and
statistics.
This document is a four-page extended abstract that has been accepted for
presentation at AD2016, the 7th International Conference on Algorithmic
Differentiation.
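To make the mechanism concrete: forward-mode AD propagates derivatives
alongside values using dual numbers, the principle underlying ForwardDiff.
The sketch below is a minimal Python illustration of that principle only
(the Dual class and derivative helper are ours for exposition); ForwardDiff
itself is written in Julia and adds chunked vector mode, JIT specialization,
and a complete operator set.

# Forward-mode AD via operator overloading on dual numbers: each value
# carries the derivative of itself with respect to one seeded input.
class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.eps * other.val + self.val * other.eps)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with derivative 1; read the derivative off the output.
    return f(Dual(x, 1.0)).eps

print(derivative(lambda x: x * x + 3 * x, 2.0))  # d/dx at x=2: 2x+3 = 7.0

A gradient is obtained by seeding one input per pass, or, as in the
vector-forward mode the abstract describes, several inputs at once with a
vector of perturbation coefficients.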
Dynamic automatic differentiation of GPU broadcast kernels
We show how forward-mode automatic differentiation (AD) can be employed within larger reverse-mode computations to dynamically differentiate broadcast operations in a GPU-friendly manner. Our technique fully exploits the broadcast Jacobian's inherent sparsity structure, and unlike a pure reverse-mode approach, this "mixed-mode" approach does not require a backwards pass over the broadcasted operation's subgraph, obviating the need for several reverse-mode-specific programmability restrictions on user-authored broadcast operations. Most notably, this approach allows broadcast fusion in primal code despite the presence of data-dependent control flow. We discuss an experiment in which a Julia implementation of our technique outperformed pure reverse-mode TensorFlow and Julia implementations for differentiating through broadcast operations within an HM-LSTM cell update calculation.
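To illustrate the "mixed-mode" idea: for an elementwise broadcast
out = f.(x, y), the Jacobian is diagonal, so the elementwise partials can be
recorded during the forward pass and the enclosing reverse pass reduces to
elementwise multiplies, with no backwards pass over f's subgraph. The
Python/NumPy sketch below is our illustration, not the paper's Julia/GPU
implementation: the partials are written by hand where the paper derives
them with forward-mode dual numbers, and np.vectorize stands in for a fused
GPU kernel.

import numpy as np

def f_with_partials(x, y):
    # One element at a time: value of f plus both partials, i.e. one
    # diagonal entry of the broadcast Jacobian. Note the data-dependent
    # branch, which a pure reverse-mode tracer would have to unfuse.
    if x > 0:
        return x * y, y, x        # f = x*y: df/dx = y, df/dy = x
    return x + y, 1.0, 1.0        # f = x+y: df/dx = 1, df/dy = 1

def broadcast_forward(x, y):
    # A single fused elementwise pass records the value and the diagonal
    # Jacobian entries.
    out, dx, dy = np.vectorize(f_with_partials)(x, y)
    return out, (dx, dy)

def broadcast_backward(grad_out, cache):
    # The reverse pass never re-enters f: thanks to the diagonal
    # Jacobian, the pullback is an elementwise multiply.
    dx, dy = cache
    return grad_out * dx, grad_out * dy

x = np.array([-1.0, 2.0, 3.0])
y = np.array([ 4.0, 5.0, 6.0])
out, cache = broadcast_forward(x, y)
gx, gy = broadcast_backward(np.ones_like(out), cache)
print(out, gx, gy)  # [ 3. 10. 18.] [1. 5. 6.] [1. 2. 3.]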
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.
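Since the survey's central point is that backpropagation is reverse-mode AD
applied to a scalar loss, a toy tape may help fix the terminology. The Var
class below is our minimal illustration; real systems record the graph more
efficiently and process it in topological order rather than by naive
recursion.

class Var:
    # Reverse-mode AD: build the computation graph during the forward
    # evaluation, then propagate adjoints backward through it.
    def __init__(self, val, parents=()):
        self.val, self.parents, self.grad = val, parents, 0.0
    def __add__(self, other):
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])
    def __mul__(self, other):
        # Local derivatives of the product w.r.t. each operand
        return Var(self.val * other.val,
                   [(self, other.val), (other, self.val)])
    def backward(self, adjoint=1.0):
        self.grad += adjoint
        for parent, local in self.parents:
            parent.backward(adjoint * local)  # chain rule

x, y = Var(2.0), Var(3.0)
z = x * y + x           # z = xy + x
z.backward()            # seed dz/dz = 1; this is "backpropagation"
print(x.grad, y.grad)   # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0

One forward-plus-backward pass yields the derivative of one output with
respect to all inputs, which is why reverse mode, rather than symbolic or
numerical differentiation, dominates for gradients of scalar losses in
machine learning.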
Competitive Gradient Descent
We introduce a new algorithm for the numerical computation of Nash equilibria
of competitive two-player games. Our method is a natural generalization of
gradient descent to the two-player setting where the update is given by the
Nash equilibrium of a regularized bilinear local approximation of the
underlying game. It avoids oscillatory and divergent behaviors seen in
alternating gradient descent. Using numerical experiments and rigorous
analysis, we provide a detailed comparison to methods based on "optimism"
and "consensus", and show that our method avoids making any unnecessary
changes to the gradient dynamics while achieving exponential (local)
convergence for (locally) convex-concave zero-sum games. Convergence and
stability properties of our method are robust to strong interactions between
the players, without adapting the stepsize, which is not the case with previous
methods. In our numerical experiments on non-convex-concave problems, existing
methods are prone to divergence and instability due to their sensitivity to
interactions among the players, whereas we never observe divergence of our
algorithm. The ability to choose larger stepsizes furthermore allows our
algorithm to achieve faster convergence, as measured by the number of model
evaluations.
Appeared in NeurIPS 2019; this version corrects an error in Theorem 2.2.
Source code for the numerical experiments is available at
http://github.com/f-t-s/CGD, and a high-level overview of this work at
http://f-t-s.github.io/projects/cgd.
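For the zero-sum case, the Nash equilibrium of the regularized bilinear
local game has a closed-form solution, which the update below applies to a
bilinear test problem f(x, y) = x^T A y whose mixed second derivatives are
simply A and A^T. This NumPy sketch is ours, not the authors' code: it forms
and solves small dense systems directly, whereas the authors' implementation
is matrix-free, using Hessian-vector products with an iterative solver.

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x, y = rng.standard_normal(n), rng.standard_normal(n)
eta = 0.2
I = np.eye(n)

# Player x minimizes f(x, y) = x^T A y; player y maximizes it.
# For this f: grad_x f = A y, grad_y f = A^T x, D_xy f = A, D_yx f = A^T.
for _ in range(500):
    gx, gy = A @ y, A.T @ x
    # Each player's step anticipates the other's move: the solve accounts
    # for the opponent's simultaneous regularized best response.
    dx = -eta * np.linalg.solve(I + eta**2 * A @ A.T, gx + eta * A @ gy)
    dy =  eta * np.linalg.solve(I + eta**2 * A.T @ A, gy - eta * A.T @ gx)
    x, y = x + dx, y + dy

print(np.linalg.norm(x), np.linalg.norm(y))
# Both norms decay monotonically toward the unique equilibrium (0, 0);
# simultaneous gradient descent/ascent diverges on this same problem.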