    Automatic differentiation in machine learning: a survey

    Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.
    Comment: 43 pages, 5 figures
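
    As a minimal, self-contained illustration of the idea described in this abstract (evaluating derivatives of a numeric program by propagating derivative values through its elementary operations, i.e. forward-mode AD), the following toy dual-number sketch may help. It is our own example, not code from the survey.

        # Toy forward-mode AD with dual numbers; an illustrative sketch only,
        # not code from the survey above.
        import math

        class Dual:
            """A value paired with its derivative, propagated through arithmetic."""
            def __init__(self, value, deriv=0.0):
                self.value = value
                self.deriv = deriv

            def __add__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.value + other.value, self.deriv + other.deriv)
            __radd__ = __add__

            def __mul__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.value * other.value,
                            self.deriv * other.value + self.value * other.deriv)
            __rmul__ = __mul__

        def sin(x):
            # Chain rule for one elementary function.
            return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)

        def f(x):
            # An ordinary numeric "program" built from supported operations.
            return x * x + 3.0 * sin(x)

        # Seeding the input derivative with 1.0 yields f(2.0) and f'(2.0) exactly
        # (up to floating point), with no symbolic expression swell and no
        # finite-difference truncation error.
        y = f(Dual(2.0, 1.0))
        print(y.value, y.deriv)   # f(2.0) = 4 + 3*sin(2), f'(2.0) = 4 + 3*cos(2)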

    Automatic Differentiation of Algorithms for Machine Learning

    Automatic differentiation---the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately---dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and generalizes the method of backwards propagation of errors used in machine learning. Despite this, practitioners in a variety of fields, including machine learning, have been little influenced by automatic differentiation, and make scant use of available tools. Here we review the technique of automatic differentiation, describe its two main modes, and explain how it can benefit machine learning practitioners. To reach the widest possible audience our treatment assumes only elementary differential calculus, and does not assume any knowledge of linear algebra.Comment: 7 pages, 1 figur
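
    To make the reverse mode concrete, the sketch below uses the JAX library (our choice of tool, not one prescribed by the paper) to obtain the full gradient of a scalar loss in a single backward sweep; the tiny linear model and data are placeholders.

        # Reverse-mode AD (the generalization of backpropagation) via JAX.
        # The model, data, and library choice are illustrative assumptions.
        import jax
        import jax.numpy as jnp

        def loss(w, x, y):
            # An ordinary numeric program: squared error of a linear model.
            pred = x @ w
            return jnp.sum((pred - y) ** 2)

        x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
        y = jnp.array([1.0, 2.0])
        w = jnp.array([0.1, -0.2])

        # One reverse sweep yields d(loss)/dw for every component of w, at a
        # cost of a small constant times one evaluation of the loss,
        # independent of the number of parameters.
        grad_loss = jax.grad(loss)    # differentiates w.r.t. the first argument
        print(loss(w, x, y), grad_loss(w, x, y))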

    Sensitivity analysis and parameter estimation for distributed hydrological modeling: potential of variational methods

    Variational methods are widely used for the analysis and control of computationally intensive spatially distributed systems. In particular, the adjoint state method enables a very efficient calculation of the derivatives of an objective function (response function to be analysed or cost function to be optimised) with respect to model inputs. In this contribution, the potential of variational methods for distributed catchment-scale hydrology is demonstrated. A distributed flash flood model, coupling kinematic wave overland flow and Green-Ampt infiltration, is applied to a small catchment of the Thoré basin and used as a relatively simple (synthetic observations) but didactic application case. It is shown that forward and adjoint sensitivity analysis provide local but extensive insight into the relation between the assigned model parameters and the simulated hydrological response. Spatially distributed parameter sensitivities can be obtained for a very modest calculation effort (~6 times the computing time of a single model run), and the singular value decomposition (SVD) of the Jacobian matrix provides an interesting perspective for the analysis of the rainfall-runoff relation. For the estimation of model parameters, adjoint-based derivatives were found to be exceedingly efficient in driving a bound-constrained quasi-Newton algorithm. The reference parameter set is retrieved independently of the optimization initial condition when the very common dimension reduction strategy (i.e. scalar multipliers) is adopted. Furthermore, the sensitivity analysis results suggest that most of the variability in this high-dimensional parameter space can be captured with a few orthogonal directions. A parametrization based on the leading singular vectors of the SVD was found to be very promising, but it should be combined with another regularization strategy in order to prevent overfitting.
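
    A hedged sketch of the estimation workflow described above: a cost function whose gradient is obtained by reverse-mode AD (standing in for the paper's adjoint model) drives SciPy's bound-constrained quasi-Newton optimizer L-BFGS-B. The quadratic toy forward model, the observations, and the bounds are placeholders of our own, not the flash flood model or the Thoré catchment setup.

        # Adjoint-style gradients driving a bound-constrained quasi-Newton
        # optimizer. JAX reverse-mode AD stands in for the hand-derived adjoint;
        # the forward model is a toy placeholder, not a hydrological model.
        import jax
        import jax.numpy as jnp
        import numpy as np
        from scipy.optimize import minimize

        observations = jnp.array([1.0, 2.0, 3.0])

        def simulate(params):
            # Placeholder forward model; imagine a rainfall-runoff simulation here.
            return params ** 2

        def cost(params):
            return 0.5 * jnp.sum((simulate(params) - observations) ** 2)

        value_and_grad = jax.value_and_grad(cost)

        def fun(p):
            v, g = value_and_grad(jnp.asarray(p))
            return float(v), np.asarray(g)   # SciPy expects plain floats/arrays

        x0 = np.array([0.5, 0.5, 0.5])
        bounds = [(0.0, 10.0)] * 3           # assumed admissible parameter ranges
        result = minimize(fun, x0, jac=True, method="L-BFGS-B", bounds=bounds)
        print(result.x)                      # recovers sqrt(observations) here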

    Automatic implementation of material laws: Jacobian calculation in a finite element code with TAPENADE

    In an effort to increase the versatility of finite element codes, we explore the possibility of automatically creating the Jacobian matrix necessary for the gradient-based solution of nonlinear systems of equations. In particular, we aim to assess the feasibility of employing the automatic differentiation tool TAPENADE for this purpose on a large Fortran codebase that is the result of many years of continuous development. As a starting point we will describe the special structure of finite element codes and the implications that this code design carries for an efficient calculation of the Jacobian matrix. We will also propose a first approach towards improving the efficiency of such a method. Finally, we will present a functioning method for the automatic implementation of the Jacobian calculation in finite element software, but will also point out important shortcomings that will have to be addressed in the future.
    Comment: 17 pages, 9 figures
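
    The following sketch (in Python with JAX rather than Fortran with TAPENADE, and with a two-equation toy residual standing in for an assembled finite element system) illustrates the underlying idea: obtain the Jacobian of a nonlinear residual by automatic differentiation and use it in a Newton solve.

        # Automatically differentiated Jacobian inside a Newton iteration.
        # Toy stand-in for an assembled finite element residual; not TAPENADE.
        import jax
        import jax.numpy as jnp

        def residual(u):
            # Nonlinear system R(u) = 0; in the paper this would contain the
            # element assembly and the material law.
            return jnp.array([u[0] ** 2 + u[1] - 3.0,
                              u[0] + u[1] ** 3 - 5.0])

        jacobian = jax.jacfwd(residual)   # forward mode suits few inputs here

        u = jnp.array([1.0, 1.0])
        for _ in range(20):
            J = jacobian(u)
            u = u - jnp.linalg.solve(J, residual(u))

        print(u, residual(u))             # residual should be ~0 at convergence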

    An adjoint for likelihood maximization

    The process of likelihood maximization can be found in many different areas of computational modelling. However, the construction of such models via likelihood maximization requires the solution of a difficult multi-modal optimization problem involving an expensive O(n^3) factorization. The optimization techniques used to solve this problem may require many such factorizations and can result in a significant bottleneck. This article derives an adjoint formulation of the likelihood employed in the construction of a kriging model via reverse algorithmic differentiation. This adjoint is found to calculate the likelihood and all of its derivatives more efficiently than the standard analytical method, and can therefore be utilised within a simple local search or within a hybrid global optimization to accelerate convergence and reduce the cost of the likelihood optimization.
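
    A minimal sketch of the idea follows: a kriging (Gaussian process) negative log-likelihood dominated by an O(n^3) Cholesky factorization, with all hyperparameter derivatives obtained in one reverse sweep. JAX reverse-mode AD stands in for the paper's hand-derived adjoint, and the squared-exponential kernel, data, and parameterization are our own assumptions.

        # Kriging / Gaussian process negative log-likelihood and its gradient
        # via reverse-mode AD; an illustrative sketch, not the paper's code.
        import jax
        import jax.numpy as jnp
        from jax.scipy.linalg import cho_solve

        X = jnp.linspace(0.0, 1.0, 20).reshape(-1, 1)
        y = jnp.sin(4.0 * X[:, 0])

        def neg_log_likelihood(log_params):
            lengthscale = jnp.exp(log_params[0])
            noise = jnp.exp(log_params[1])
            sqdist = (X - X.T) ** 2
            K = jnp.exp(-0.5 * sqdist / lengthscale ** 2) + noise * jnp.eye(X.shape[0])
            L = jnp.linalg.cholesky(K)          # the O(n^3) factorization
            alpha = cho_solve((L, True), y)
            return 0.5 * y @ alpha + jnp.sum(jnp.log(jnp.diag(L)))

        # One reverse sweep returns the likelihood and all of its derivatives,
        # ready to drive a local or hybrid global optimizer.
        nll, grads = jax.value_and_grad(neg_log_likelihood)(
            jnp.array([jnp.log(0.2), jnp.log(1e-2)]))
        print(nll, grads)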

    Fluctuation-Response Relations for Multi-Time Correlations

    We show that time-correlation functions of arbitrary order for any random variable in a statistical dynamical system can be calculated as higher-order response functions of the mean history of the variable. The response is to a "control term" added as a modification to the master equation for statistical distributions. The proof of the relations is based upon a variational characterization of the generating functional of the time-correlations. The same fluctuation-response relations are preserved within moment-closures for the statistical dynamical system, when these are constructed via the variational Rayleigh-Ritz procedure. For the 2-time correlations of the moment-variables themselves, the fluctuation-response relation is equivalent to an "Onsager regression hypothesis" for the small fluctuations. For correlations of higher order, there is a new effect in addition to such linear propagation of fluctuations present instantaneously: the dynamical generation of correlations by nonlinear interaction of fluctuations. In general, we discuss some physical and mathematical aspects of the Ansätze required for an accurate calculation of the time correlations. We also comment briefly upon the computational use of these relations, which is well-suited for automatic differentiation tools. An example will be given of a simple closure for turbulent energy decay, which illustrates the numerical application of the relations.
    Comment: 28 pages, 1 figure, submitted to Phys. Rev.
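
    As a hedged numerical sketch of that final remark (not of the paper's closures), the fragment below differentiates the mean history of a toy linear relaxation model with respect to a discretized control term, so that the resulting Jacobian plays the role of a first-order response function; the model, time step, and use of JAX are all our own assumptions.

        # First-order response of a mean history to a control term, obtained with
        # an automatic differentiation tool (JAX). The relaxation model is a toy
        # illustration of our own, not one of the closures studied in the paper.
        import jax
        import jax.numpy as jnp

        gamma, dt, nsteps = 0.5, 0.1, 50

        def mean_history(h):
            # Forward-Euler mean history of dm/dt = -gamma*m + h(t), m(0) = 1.
            m = 1.0
            states = []
            for t in range(nsteps):
                m = m + dt * (-gamma * m + h[t])
                states.append(m)
            return jnp.stack(states)

        h0 = jnp.zeros(nsteps)
        # Response matrix R[t, s] = d m(t) / d h(s) at zero control; higher-order
        # responses could be obtained by further (nested) differentiation.
        R = jax.jacrev(mean_history)(h0)
        print(R.shape)   # (nsteps, nsteps), lower triangular by causality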