
    Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks

    The standard probabilistic perspective on machine learning gives rise to empirical risk-minimization tasks that are frequently solved by stochastic gradient descent (SGD) and variants thereof. We present a formulation of these tasks as classical inverse or filtering problems and, furthermore, we propose an efficient, gradient-free algorithm for finding a solution to these problems using ensemble Kalman inversion (EKI). Applications of our approach include offline and online supervised learning with deep neural networks, as well as graph-based semi-supervised learning. The essence of the EKI procedure is an ensemble-based approximate gradient descent in which derivatives are replaced by differences from within the ensemble. We suggest several modifications to the basic method, derived from empirically successful heuristics developed in the context of SGD. Numerical results demonstrate wide applicability and robustness of the proposed algorithm. (41 pages, 14 figures.)
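    The ensemble update described above is easy to prototype. Below is a minimal NumPy sketch of a basic EKI iteration, written only from the abstract's description rather than the authors' code: the forward map G, the toy linear-regression data, the ensemble size, and the noise level gamma are all illustrative placeholders.

```python
import numpy as np

def eki_step(U, y, G, rng, gamma=1e-2):
    """One ensemble Kalman inversion update.
    U: (J, d) ensemble of parameter vectors; y: (m,) data; G: forward map."""
    J, m = U.shape[0], y.shape[0]
    GU = np.stack([G(u) for u in U])              # (J, m) ensemble predictions
    dU = U - U.mean(axis=0)
    dG = GU - GU.mean(axis=0)
    C_ug = dU.T @ dG / J                          # parameter/output cross-covariance
    C_gg = dG.T @ dG / J                          # output covariance
    K = C_ug @ np.linalg.inv(C_gg + gamma * np.eye(m))   # Kalman-style gain
    # Perturbed observations: derivatives are replaced by differences computed
    # from within the ensemble, so no gradients of G are ever needed.
    Y = y + rng.normal(scale=np.sqrt(gamma), size=(J, m))
    return U + (Y - GU) @ K.T

# Toy usage: recover the weights of a linear model without any gradients.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(20, 2))
y = X @ w_true
U = rng.normal(size=(50, 2))                      # 50 candidate weight vectors
for _ in range(30):
    U = eki_step(U, y, X.dot, rng)
print(U.mean(axis=0))                             # should be close to w_true
```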

    Small steps and giant leaps: Minimal Newton solvers for Deep Learning

    We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which have a computational cost comparable to two standard forward passes and are easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix at every iteration, either exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration. This estimate has the same size as, and is similar to, the momentum variable commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.
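    A rough PyTorch sketch of the idea as described in the abstract: a single momentum-like vector z estimates the gradient pre-multiplied by the inverse Hessian and is refreshed once per iteration via a Hessian-vector product. The paper computes this with forward-mode automatic differentiation and tunes its step sizes automatically; the sketch below instead uses double backward and fixed, made-up values of rho, beta, and lr, and is demonstrated on a toy ill-conditioned quadratic rather than the paper's test problems.

```python
import torch

def curveball_like_step(params, loss_fn, state, rho=0.9, beta=0.01, lr=1.0):
    """One update: refresh z (an estimate of -H^{-1} g) and step the parameters."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    if "z" not in state:
        state["z"] = [torch.zeros_like(p) for p in params]
    # Hessian-vector product H z via a second backward pass (the paper uses
    # forward-mode autodiff instead; the cost is comparable in spirit).
    hz = torch.autograd.grad(grads, params, grad_outputs=state["z"])
    with torch.no_grad():
        for i, (p, g, hzi) in enumerate(zip(params, grads, hz)):
            z = rho * state["z"][i] - beta * (hzi + g)   # momentum-like refresh of z
            p.add_(z, alpha=lr)                          # parameter update, as in SGD momentum
            state["z"][i] = z
    return float(loss)

# Toy usage on an ill-conditioned quadratic (the paper's own validation uses the
# noisy Rosenbrock function and degenerate two-layer linear networks instead).
w = torch.tensor([1.5, 2.0], requires_grad=True)
quadratic = lambda: 0.5 * (100.0 * w[0] ** 2 + w[1] ** 2)
state = {}
for _ in range(200):
    curveball_like_step([w], quadratic, state)
print(w.detach())    # should be close to the minimizer at the origin
```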

    Recurrent backpropagation and the dynamical approach to adaptive neural computation

    Error backpropagation in feedforward neural network models is a popular learning algorithm that has its roots in nonlinear estimation and optimization. It is used routinely to calculate error gradients in nonlinear systems with hundreds of thousands of parameters. However, the classical architecture for backpropagation has severe restrictions. The extension of backpropagation to networks with recurrent connections will be reviewed. It is now possible to efficiently compute the error gradients for networks that have temporal dynamics, which opens up applications to a host of problems in system identification and control.
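    As a concrete, present-day illustration of computing error gradients through temporal dynamics (not taken from the review itself), the short PyTorch snippet below unrolls a tiny recurrent network and lets reverse-mode autodiff perform backpropagation through time; all sizes and data are arbitrary placeholders.

```python
import torch

torch.manual_seed(0)
T, d_in, d_h = 5, 3, 4
W_in = (0.1 * torch.randn(d_h, d_in)).requires_grad_()   # input weights
W_h = (0.1 * torch.randn(d_h, d_h)).requires_grad_()     # recurrent weights
w_out = (0.1 * torch.randn(d_h)).requires_grad_()        # readout weights

x = torch.randn(T, d_in)       # an arbitrary input sequence
target = torch.tensor(1.0)     # an arbitrary scalar target

h = torch.zeros(d_h)
for t in range(T):             # forward pass: unroll the temporal dynamics
    h = torch.tanh(W_in @ x[t] + W_h @ h)
loss = (w_out @ h - target) ** 2

loss.backward()                # reverse-mode autodiff backpropagates through time
print(W_h.grad)                # error gradient w.r.t. the recurrent weights
```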

    Characterizing Evaporation Ducts Within the Marine Atmospheric Boundary Layer Using Artificial Neural Networks

    We apply a multilayer perceptron machine learning (ML) regression approach to infer electromagnetic (EM) duct heights within the marine atmospheric boundary layer (MABL) using sparsely sampled EM propagation data obtained within a bistatic context. This paper explains the rationale behind the selection of the ML network architecture, along with other model hyperparameters, in an effort to demystify the process of arriving at a useful ML model. The resulting speed of our ML predictions of EM duct heights, using sparse data measurements within the MABL, indicates the suitability of the proposed method for real-time applications. (13 pages, 7 figures.)
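    The modelling setup is a standard feed-forward regression. The sketch below shows the general shape of such an MLP regression using scikit-learn; the synthetic propagation features and duct-height targets are placeholders and are not the bistatic measurement data or the network architecture used in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 20))                                        # stand-in propagation features
y = 5.0 + 25.0 * X[:, :3].mean(axis=1) + 0.5 * rng.normal(size=500)   # stand-in duct heights (m)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))                        # quality on unseen samples
```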