Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings. Comment: 43 pages, 5 figures
Automatic linearity detection
Given a function, or more generally an operator, the question "Is it linear?" seems simple to answer. In many applications of scientific computing it is worth answering this question automatically: some functionality, such as operator exponentiation, is defined only for linear operators, and in other problems time can be saved if the problem being solved is known to be linear. Linearity detection is closely connected to sparsity detection of Hessians, so for large-scale applications memory savings can be made if linearity information is known. However, implementing such automated detection is not as straightforward as one might expect. This paper describes how automatic linearity detection can be implemented in combination with automatic differentiation, both for standard scientific computing software and within the Chebfun software system. The key ingredients of the method are the observation that linear operators have constant derivatives and the propagation of two logical vectors as computations are carried out. The values of these vectors are determined by whether output variables have constant derivatives and constant values with respect to each input variable. Propagating them through an evaluation trace of an operator yields the desired information about the linearity of that operator.
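The flag propagation described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's or Chebfun's implementation: the class `Traced`, the helpers `variables` and `is_linear`, and the conservative product rule are all hypothetical names and choices made for this example. Each traced value carries, per input variable, one flag for "value is constant with respect to this input" and one for "derivative with respect to this input is constant". A function whose derivative flags are all true is reported as linear (strictly speaking affine, since constant offsets also have constant derivatives).

```python
import math


class Traced:
    """Carries two tuples of flags, one entry per input variable (hypothetical sketch).

    const[i]: the value does not depend on input i.
    lin[i]:   the derivative with respect to input i is a constant.
    """

    def __init__(self, const, lin):
        self.const = tuple(const)
        self.lin = tuple(lin)

    @staticmethod
    def _lift(other, n):
        # Plain numbers are constant in every input, with zero derivative.
        if isinstance(other, Traced):
            return other
        return Traced([True] * n, [True] * n)

    def __add__(self, other):
        o = Traced._lift(other, len(self.const))
        return Traced([a and b for a, b in zip(self.const, o.const)],
                      [a and b for a, b in zip(self.lin, o.lin)])

    # Sums and differences propagate the flags in exactly the same way.
    __radd__ = __sub__ = __rsub__ = __add__

    def __mul__(self, other):
        o = Traced._lift(other, len(self.const))
        self_fixed, o_fixed = all(self.const), all(o.const)
        # d(ab)/dx_i = a'_i b + a b'_i. Each term is known to be constant if
        # the derivative factor vanishes, or if it is constant and the other
        # factor is a true constant. This is sufficient, not necessary.
        lin = [(sc or (sl and o_fixed)) and (oc or (ol and self_fixed))
               for sc, sl, oc, ol in zip(self.const, self.lin, o.const, o.lin)]
        const = [a and b for a, b in zip(self.const, o.const)]
        return Traced(const, lin)

    __rmul__ = __mul__


def sin(v):
    """Nonlinear intrinsic: the derivative is constant only where v is constant."""
    if not isinstance(v, Traced):
        return math.sin(v)
    return Traced(v.const, v.const)


def variables(n):
    """One traced input per variable: input k depends only on itself."""
    return [Traced([i != k for i in range(n)], [True] * n) for k in range(n)]


def is_linear(f, n):
    """Run f once on traced inputs and inspect the derivative flags."""
    out = Traced._lift(f(*variables(n)), n)
    return all(out.lin)


linear_case = is_linear(lambda x, y: 3 * x + 2 * y - x, 2)
product_case = is_linear(lambda x, y: x * y, 2)
sine_case = is_linear(lambda x: sin(x) + x, 1)
```

Under these rules `linear_case` should come out true, while `product_case` and `sine_case` should come out false. Because the product rule is conservative, an expression such as `x * y - x * y` would also be flagged as nonlinear even though it is identically zero.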
Differentiable Programming Tensor Networks
Differentiable programming is a fresh programming paradigm which composes
parameterized algorithmic components and trains them using automatic
differentiation (AD). The concept emerges from deep learning but is not
limited to training neural networks. We present theory and practice of
programming tensor network algorithms in a fully differentiable way. By
formulating the tensor network algorithm as a computation graph, one can
compute higher order derivatives of the program accurately and efficiently
using AD. We present essential techniques to differentiate through the tensor
network contractions, including stable AD for tensor decomposition and
efficient backpropagation through fixed point iterations. As a demonstration,
we compute the specific heat of the Ising model directly by taking the second
order derivative of the free energy obtained in the tensor renormalization
group calculation. Next, we perform gradient based variational optimization of
infinite projected entangled pair states for the quantum antiferromagnetic
Heisenberg model and obtain state-of-the-art variational energy and
magnetization with moderate effort. Differentiable programming removes
laborious human efforts in deriving and implementing analytical gradients for
tensor network programs, which opens the door to more innovations in tensor
network algorithms and applications. Comment: Typos corrected, discussion and refs added; revised version accepted
for publication in PRX. Source code available at
https://github.com/wangleiphy/tensorgra
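The idea of obtaining a specific heat as a second derivative of a free energy can be illustrated on a much smaller problem. The sketch below is an assumption-laden toy, not the paper's tensor renormalization group calculation: it swaps in the closed-form free energy per site of the one-dimensional zero-field Ising chain, f(T) = -T ln(2 cosh(J/T)) with k_B = 1, and differentiates it with a hand-written second-order forward-mode type. The names `Jet2`, `free_energy`, and `specific_heat` are hypothetical and exist only for this example.

```python
import math


class Jet2:
    """Value plus first and second derivative with respect to one variable."""

    def __init__(self, v, d1=0.0, d2=0.0):
        self.v, self.d1, self.d2 = v, d1, d2

    @staticmethod
    def _lift(x):
        return x if isinstance(x, Jet2) else Jet2(x)

    def __neg__(self):
        return Jet2(-self.v, -self.d1, -self.d2)

    def __add__(self, other):
        o = Jet2._lift(other)
        return Jet2(self.v + o.v, self.d1 + o.d1, self.d2 + o.d2)

    __radd__ = __add__

    def __mul__(self, other):
        o = Jet2._lift(other)
        # Leibniz rule up to second order: (ab)'' = a''b + 2a'b' + ab''.
        return Jet2(self.v * o.v,
                    self.d1 * o.v + self.v * o.d1,
                    self.d2 * o.v + 2.0 * self.d1 * o.d1 + self.v * o.d2)

    __rmul__ = __mul__

    def __rtruediv__(self, other):
        # other / self, written as other * (1 / self).
        recip = _chain(self, 1.0 / self.v, -1.0 / self.v ** 2, 2.0 / self.v ** 3)
        return Jet2._lift(other) * recip


def _chain(x, g, g1, g2):
    """Second-order chain rule for g(x), given g, g', g'' evaluated at x.v."""
    return Jet2(g, g1 * x.d1, g2 * x.d1 ** 2 + g1 * x.d2)


def log(x):
    return _chain(x, math.log(x.v), 1.0 / x.v, -1.0 / x.v ** 2)


def cosh(x):
    return _chain(x, math.cosh(x.v), math.sinh(x.v), math.cosh(x.v))


def free_energy(T, J=1.0):
    """Free energy per site of the 1D zero-field Ising chain (k_B = 1)."""
    return -T * log(2 * cosh(J / T))


def specific_heat(T, J=1.0):
    """C = -T d^2 f / dT^2, with the derivative taken by the Jet2 arithmetic."""
    f = free_energy(Jet2(T, 1.0, 0.0), J)
    return -T * f.d2


c_auto = specific_heat(1.5)
c_exact = (1.0 / 1.5) ** 2 / math.cosh(1.0 / 1.5) ** 2
```

For this toy model the result can be checked against the analytic specific heat (J/T)^2 / cosh^2(J/T), which is what `c_exact` computes. No hand-derived derivative of `free_energy` appears anywhere in the code, which is the point the abstract makes about removing manual gradient derivations.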
Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part I: template-based generic programming
An approach for incorporating embedded simulation and analysis capabilities
in complex simulation codes through template-based generic programming is
presented. This approach relies on templating and operator overloading within
the C++ language to transform a given calculation into one that can compute a
variety of additional quantities that are necessary for many state-of-the-art
simulation and analysis algorithms. An approach for incorporating these ideas
into complex simulation codes through general graph-based assembly is also
presented. These ideas have been implemented within a set of packages in the
Trilinos framework and are demonstrated on a simple problem from chemical
engineering.
ADF95: Tool for automatic differentiation of a FORTRAN code designed for large numbers of independent variables
ADF95 is a tool to automatically calculate numerical first derivatives for
any mathematical expression as a function of user defined independent
variables. Accuracy of derivatives is achieved within machine precision. ADF95
may be applied to any FORTRAN 77/90/95 conforming code and requires minimal
changes by the user. It provides a new derived data type that holds the value
and derivatives and applies forward differencing by overloading all FORTRAN
operators and intrinsic functions. An efficient indexing technique leads to
reduced memory usage and substantially better performance than other
available tools with operator overloading. This gain is especially pronounced
for sparse systems with a large number of independent variables. A wide class of
numerical simulations, e.g., those employing implicit solvers, can profit from
ADF95. Comment: 24 pages, 2 figures, 4 tables, accepted in Computer Physics
Communications
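The approach summarized above, a data type that holds a value together with its derivatives and overloads operators and intrinsic functions, can be sketched in Python rather than Fortran. This is only an illustration under assumptions: the class `Dual`, the helper `independent`, and the choice of a dictionary keyed by variable index are hypothetical stand-ins, and the dictionary is merely one simple way to avoid storing zero partials. It is not ADF95's actual indexing technique.

```python
import math


class Dual:
    """A value plus its nonzero first partials, keyed by variable index."""

    def __init__(self, value, grad=None):
        self.value = float(value)
        self.grad = dict(grad or {})  # absent index means a zero partial

    @staticmethod
    def _lift(x):
        # Plain numbers become constants with no partials.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual._lift(other)
        grad = dict(self.grad)
        for k, d in o.grad.items():
            grad[k] = grad.get(k, 0.0) + d
        return Dual(self.value + o.value, grad)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual._lift(other)
        # Product rule, touching only the indices either operand depends on.
        grad = {k: d * o.value for k, d in self.grad.items()}
        for k, d in o.grad.items():
            grad[k] = grad.get(k, 0.0) + self.value * d
        return Dual(self.value * o.value, grad)

    __rmul__ = __mul__


def sin(x):
    """Overloaded intrinsic: chain rule applied to every stored partial."""
    x = Dual._lift(x)
    c = math.cos(x.value)
    return Dual(math.sin(x.value), {k: c * d for k, d in x.grad.items()})


def independent(values):
    """Declare independent variables: variable i has partial 1 w.r.t. itself."""
    return [Dual(v, {i: 1.0}) for i, v in enumerate(values)]


xs = independent([1.0, 2.0, 3.0, 4.0])
y = xs[0] * xs[1] + sin(xs[3])
```

Here `y.grad` holds entries only for variables 0, 1, and 3. Variable 2 never enters the expression, so no storage or arithmetic is spent on it, which is the kind of saving that matters when the number of independent variables is large and each result depends on only a few of them.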