78,795 research outputs found

    Automatic differentiation in machine learning: a survey

    Get PDF
    Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.Comment: 43 pages, 5 figure

    Automatic linearity detection

    Get PDF
    Given a function, or more generally an operator, the question "Is it linear?" seems simple to answer. In many applications of scientific computing it might be worth determining the answer to this question in an automated way; some functionality, such as operator exponentiation, is only defined for linear operators, and in other problems, time saving is available if it is known that the problem being solved is linear. Linearity detection is closely connected to sparsity detection of Hessians, so for large-scale applications, memory savings can be made if linearity information is known. However, implementing such an automated detection is not as straightforward as one might expect. This paper describes how automatic linearity detection can be implemented in combination with automatic differentiation, both for standard scientific computing software, and within the Chebfun software system. The key ingredients for the method are the observation that linear operators have constant derivatives, and the propagation of two logical vectors, â„“\ell and cc, as computations are carried out. The values of â„“\ell and cc are determined by whether output variables have constant derivatives and constant values with respect to each input variable. The propagation of their values through an evaluation trace of an operator yields the desired information about the linearity of that operator

    Differentiable Programming Tensor Networks

    Full text link
    Differentiable programming is a fresh programming paradigm which composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerges from deep learning but is not only limited to training neural networks. We present theory and practice of programming tensor network algorithms in a fully differentiable way. By formulating the tensor network algorithm as a computation graph, one can compute higher order derivatives of the program accurately and efficiently using AD. We present essential techniques to differentiate through the tensor networks contractions, including stable AD for tensor decomposition and efficient backpropagation through fixed point iterations. As a demonstration, we compute the specific heat of the Ising model directly by taking the second order derivative of the free energy obtained in the tensor renormalization group calculation. Next, we perform gradient based variational optimization of infinite projected entangled pair states for quantum antiferromagnetic Heisenberg model and obtain start-of-the-art variational energy and magnetization with moderate efforts. Differentiable programming removes laborious human efforts in deriving and implementing analytical gradients for tensor network programs, which opens the door to more innovations in tensor network algorithms and applications.Comment: Typos corrected, discussion and refs added; revised version accepted for publication in PRX. Source code available at https://github.com/wangleiphy/tensorgra

    Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part I: template-based generic programming

    Full text link
    An approach for incorporating embedded simulation and analysis capabilities in complex simulation codes through template-based generic programming is presented. This approach relies on templating and operator overloading within the C++ language to transform a given calculation into one that can compute a variety of additional quantities that are necessary for many state-of-the-art simulation and analysis algorithms. An approach for incorporating these ideas into complex simulation codes through general graph-based assembly is also presented. These ideas have been implemented within a set of packages in the Trilinos framework and are demonstrated on a simple problem from chemical engineering

    ADF95: Tool for automatic differentiation of a FORTRAN code designed for large numbers of independent variables

    Full text link
    ADF95 is a tool to automatically calculate numerical first derivatives for any mathematical expression as a function of user defined independent variables. Accuracy of derivatives is achieved within machine precision. ADF95 may be applied to any FORTRAN 77/90/95 conforming code and requires minimal changes by the user. It provides a new derived data type that holds the value and derivatives and applies forward differencing by overloading all FORTRAN operators and intrinsic functions. An efficient indexing technique leads to a reduced memory usage and a substantially increased performance gain over other available tools with operator overloading. This gain is especially pronounced for sparse systems with large number of independent variables. A wide class of numerical simulations, e.g., those employing implicit solvers, can profit from ADF95.Comment: 24 pages, 2 figures, 4 tables, accepted in Computer Physics Communication
    • …
    corecore