Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD were largely unaware of
each other and, in some cases, independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings. Comment: 43 pages, 5 figures
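As a concrete illustration of the survey's point that AD evaluates derivatives of programs rather than of closed-form formulas, here is a minimal forward-mode sketch using dual numbers. The `Dual` class, the `sin` helper, and the example function `f` are hypothetical names chosen for illustration; production AD tools support far more operations and are not implied to work exactly this way.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    val: float  # primal value
    dot: float  # tangent: derivative along the seeded input direction

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # product rule applied to the tangent part
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # chain rule: d sin(u) = cos(u) * du
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def f(x):
    # an ordinary program with a loop; AD differentiates the operations actually executed
    y = x
    for _ in range(3):
        y = y * x + sin(y)
    return y


def derivative(func, x0: float) -> float:
    # seed the input tangent with 1.0 and read the output tangent
    return func(Dual(x0, 1.0)).dot
```

Calling `derivative(f, 0.7)` carries a tangent through every executed operation, so the loop is differentiated exactly up to floating-point rounding, with no symbolic expression and no finite-difference step size. One forward pass yields the derivative along one input direction, which is why forward mode suits functions with few inputs.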
Automatic Differentiation of Algorithms for Machine Learning
Automatic differentiation, the mechanical transformation of numeric computer
programs to calculate derivatives efficiently and accurately, dates to the
origin of the computer age. Reverse-mode automatic differentiation both
antedates and generalizes the method of backwards propagation of errors used in
machine learning. Despite this, practitioners in a variety of fields, including
machine learning, have been little influenced by automatic differentiation, and
make scant use of available tools. Here we review the technique of automatic
differentiation, describe its two main modes, and explain how it can benefit
machine learning practitioners. To reach the widest possible audience our
treatment assumes only elementary differential calculus, and does not assume
any knowledge of linear algebra. Comment: 7 pages, 1 figure
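The abstract contrasts the two main modes of AD and ties reverse mode to backpropagation. A minimal reverse-mode sketch, assuming scalar operations only and using hypothetical names (`Var`, `exp`, `backward`), records each operation's local partial derivatives while the function is evaluated, then propagates adjoints from the output back to every input in a single sweep:

```python
import math


class Var:
    """A scalar node in the computational graph (reverse mode)."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.grad = 0.0         # adjoint d(output)/d(this node), filled by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

    __rmul__ = __mul__


def exp(x: Var) -> Var:
    e = math.exp(x.val)
    return Var(e, ((x, e),))  # d exp(u)/du = exp(u)


def backward(out: Var) -> None:
    # Topological order (recursive, so suited to small graphs only):
    # every node's adjoint is complete before it is pushed to its parents.
    order, seen = [], set()

    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)

    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for p, local in v.parents:
            p.grad += local * v.grad  # accumulate: a node may feed several others
```

For f(x, y) = x*y + exp(x) evaluated at (2, 3), one call to `backward` fills in both df/dx = 3 + e^2 and df/dy = 2. The cost of the backward sweep does not grow with the number of inputs, which is the property backpropagation exploits for a scalar loss with many weights.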
Sensitivity analysis and parameter estimation for distributed hydrological modeling: potential of variational methods
Variational methods are widely used for the analysis and control of computationally intensive, spatially distributed systems. In particular, the adjoint state method enables a very efficient calculation of the derivatives of an objective function (a response function to be analysed or a cost function to be optimised) with respect to model inputs. This contribution argues that the potential of variational methods for distributed, catchment-scale hydrology deserves consideration. A distributed flash flood model, coupling kinematic wave overland flow and Green-Ampt infiltration, is applied to a small catchment of the Thoré basin and used as a relatively simple (synthetic observations) but didactic application case. It is shown that forward and adjoint sensitivity analysis provide local but extensive insight into the relation between the assigned model parameters and the simulated hydrological response. Spatially distributed parameter sensitivities can be obtained for a very modest computational effort (~6 times the computing time of a single model run), and the singular value decomposition (SVD) of the Jacobian matrix provides an interesting perspective on the analysis of the rainfall-runoff relation. For the estimation of model parameters, adjoint-based derivatives were found to be exceedingly efficient in driving a bound-constrained quasi-Newton algorithm. The reference parameter set is retrieved independently of the initial condition of the optimization when the very common dimension reduction strategy (i.e. scalar multipliers) is adopted. Furthermore, the sensitivity analysis results suggest that most of the variability in this high-dimensional parameter space can be captured with a few orthogonal directions. A parametrization based on the leading singular vectors of the SVD was found to be very promising but should be combined with another regularization strategy in order to prevent overfitting.
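The efficiency of the adjoint state method can be seen on a much smaller problem. The sketch below is not the flash flood model from the abstract; it assumes a hypothetical scalar linear reservoir u[n+1] = a*u[n] + p[n] with one parameter per time step and the objective J = 0.5 * sum(u[n]^2). One forward run followed by one backward (adjoint) sweep returns dJ/dp for every parameter, whereas forward sensitivities would need one additional run per parameter.

```python
def forward(p, a):
    """Hypothetical linear reservoir: u[n+1] = a*u[n] + p[n], starting from u[0] = 0."""
    u = [0.0]
    for pn in p:
        u.append(a * u[-1] + pn)
    J = 0.5 * sum(x * x for x in u[1:])  # objective: half the sum of squared states
    return u, J


def adjoint_gradient(p, a):
    """Return J and dJ/dp[n] for all n from one forward run plus one backward sweep."""
    u, J = forward(p, a)
    N = len(p)
    lam = [0.0] * (N + 1)  # lam[n] = total derivative dJ/du[n]
    if N > 0:
        lam[N] = u[N]      # last state only affects J directly
    for n in range(N - 1, 0, -1):
        # direct contribution of u[n] plus its influence through u[n+1] = a*u[n] + ...
        lam[n] = u[n] + a * lam[n + 1]
    # p[n] enters the model only through u[n+1], with unit sensitivity
    return J, [lam[n + 1] for n in range(N)]
```

For example, `adjoint_gradient([1.0, 2.0, 3.0], 0.5)` returns `(12.65625, [3.3125, 4.625, 4.25])`; the last entry is easy to confirm by hand, since p[2] affects only u[3] = 4.25.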
Automatic implementation of material laws: Jacobian calculation in a finite element code with TAPENADE
In an effort to increase the versatility of finite element codes, we explore
the possibility of automatically creating the Jacobian matrix necessary for the
gradient-based solution of nonlinear systems of equations. Particularly, we aim
to assess the feasibility of employing the automatic differentiation tool
TAPENADE for this purpose on a large Fortran codebase that is the result of
many years of continuous development. As a starting point we will describe the
special structure of finite element codes and the implications that this code
design carries for an efficient calculation of the Jacobian matrix. We will
also propose a first approach towards improving the efficiency of such a
method. Finally, we will present a functioning method for the automatic
implementation of the Jacobian calculation in a finite element software, but
will also point out important shortcomings that will have to be addressed in
the future. Comment: 17 pages, 9 figures
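The abstract concerns Jacobians for the gradient-based (Newton-type) solution of nonlinear systems. TAPENADE works by source transformation of Fortran or C code; the sketch below instead uses operator overloading in Python, a different AD technique that computes the same Jacobian entries. It assumes a hypothetical two-equation `residual(u)` standing in for an assembled finite element residual, builds the Jacobian column by column in forward mode, and uses it in a plain Newton iteration.

```python
import math


class Dual:
    """Forward-mode value/tangent pair (hypothetical minimal helper)."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _lift(self, o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, o):
        o = self._lift(o)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, o):
        return self._lift(o) - self

    def __mul__(self, o):
        o = self._lift(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def residual(u):
    # hypothetical nonlinear system standing in for assembled element residuals
    u0, u1 = u
    return [u0 * u0 + u1 * u1 - 4.0, u0 - u1]


def jacobian(res, u):
    # forward mode: seeding input j with tangent 1.0 yields column j of J
    n = len(u)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        seeded = [Dual(u[i], 1.0 if i == j else 0.0) for i in range(n)]
        col = res(seeded)
        for i in range(n):
            J[i][j] = col[i].dot
    return J


def newton(res, u, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        r = [ri.val for ri in res([Dual(x) for x in u])]
        if max(abs(x) for x in r) < tol:
            break
        (a, b), (c, d) = jacobian(res, u)
        det = a * d - b * c
        # solve J du = -r for the 2x2 case with Cramer's rule
        du0 = (-r[0] * d + b * r[1]) / det
        du1 = (-a * r[1] + c * r[0]) / det
        u = [u[0] + du0, u[1] + du1]
    return u
```

`newton(residual, [1.0, 0.5])` converges to u0 = u1 = sqrt(2). In a real finite element code the Jacobian is sparse, and the column-by-column cost is usually reduced by seeding several structurally independent columns at once (graph coloring); this sketch does not attempt that.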
An adjoint for likelihood maximization
The process of likelihood maximization can be found in many different areas of computational modelling. However, the construction of such models via likelihood maximization requires the solution of a difficult multi-modal optimization problem involving an expensive O(n^3) factorization. The optimization techniques used to solve this problem may require many such factorizations, which can become a significant bottleneck. This article derives an adjoint formulation of the likelihood employed in the construction of a kriging model via reverse algorithmic differentiation. This adjoint is found to calculate the likelihood and all of its derivatives more efficiently than the standard analytical method and can therefore be utilised within a simple local search or within a hybrid global optimization to accelerate convergence, thereby reducing the cost of the likelihood optimization.
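To indicate where reverse-mode savings can come from, consider a simplified Gaussian log-likelihood with zero mean and unit process variance. This is an assumption for illustration only, not the concentrated kriging likelihood treated in the article. With correlation matrix R(θ) and observations y, differentiating first with respect to R gives a single adjoint matrix R̄ that is shared by all hyperparameters θ_k:

```latex
\begin{aligned}
\ell(\theta) &= -\tfrac{1}{2}\ln\lvert R(\theta)\rvert - \tfrac{1}{2}\, y^{\mathsf T} R(\theta)^{-1} y,
\qquad \alpha = R^{-1} y, \\
\bar R &\equiv \frac{\partial \ell}{\partial R} = \tfrac{1}{2}\left(\alpha\alpha^{\mathsf T} - R^{-1}\right), \\
\frac{\partial \ell}{\partial \theta_k} &= \sum_{i,j} \bar R_{ij}\,\frac{\partial R_{ij}}{\partial \theta_k}
= \operatorname{tr}\!\left(\bar R\,\frac{\partial R}{\partial \theta_k}\right).
\end{aligned}
```

In this sketch one O(n^3) factorization of R supplies both R^{-1} and α, after which each hyperparameter derivative is an O(n^2) elementwise sum. For the commonly used Gaussian correlation R_ij = exp(-Σ_k θ_k (x_ik - x_jk)^2), the remaining factor is ∂R_ij/∂θ_k = -(x_ik - x_jk)^2 R_ij.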
Fluctuation-Response Relations for Multi-Time Correlations
We show that time-correlation functions of arbitrary order for any random
variable in a statistical dynamical system can be calculated as higher-order
response functions of the mean history of the variable. The response is to a
"control term" added as a modification to the master equation for statistical
distributions. The proof of the relations is based upon a variational
characterization of the generating functional of the time-correlations. The
same fluctuation-response relations are preserved within moment-closures for
the statistical dynamical system, when these are constructed via the
variational Rayleigh-Ritz procedure. For the 2-time correlations of the
moment-variables themselves, the fluctuation-response relation is equivalent to
an "Onsager regression hypothesis" for the small fluctuations. For
correlations of higher order, there is a new effect in addition to such linear
propagation of fluctuations present instantaneously: the dynamical generation
of correlations by nonlinear interaction of fluctuations. In general, we
discuss some physical and mathematical aspects of the Ansätze
required for an accurate calculation of the time correlations. We also comment
briefly upon the computational use of these relations, which is well-suited for
automatic differentiation tools. An example will be given of a simple closure
for turbulent energy decay, which illustrates the numerical application of the
relations. Comment: 28 pages, 1 figure, submitted to Phys. Rev.