Never look back - A modified EnKF method and its application to the training of neural networks without back propagation
In this work, we present a new derivative-free optimization method and investigate its use for training neural networks. Our method is motivated by the Ensemble Kalman Filter (EnKF), which has been used successfully for solving optimization problems that involve large-scale, highly nonlinear dynamical systems. A key benefit of the EnKF method is that it requires only the evaluation of the forward propagation but not its derivatives. Hence, in the context of neural networks, it alleviates the need for back propagation and reduces the memory consumption dramatically. However, the method is not a pure "black-box" global optimization heuristic, as it efficiently utilizes the structure of typical learning problems. Promising first results of the EnKF for training deep neural networks have been presented recently by Kovachki and Stuart. We propose an important modification of the EnKF that enables us to prove convergence of our method to the minimizer of a strongly convex function. Our method also bears similarity to implicit filtering, and we demonstrate its potential for minimizing highly oscillatory functions using a simple example. Further, we provide numerical examples that demonstrate the potential of our method for training deep neural networks.
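As context for how such a derivative-free update works, here is a minimal sketch of a generic ensemble Kalman inversion step for a least-squares objective. It illustrates the general EnKF optimization idea, not the specific modification proposed in the paper; the function names and the noise level gamma are illustrative:

```python
import numpy as np

def eki_step(U, forward, y, gamma=1e-2):
    """One generic ensemble Kalman inversion step for min_u ||forward(u) - y||^2.

    U is a (J, d) ensemble of J parameter vectors; only evaluations of
    `forward` are used, so no derivatives (and hence no back propagation)
    are required. gamma plays the role of the observation-noise level.
    """
    W = np.array([forward(u) for u in U])            # (J, m) forward evaluations
    dU = U - U.mean(axis=0)                          # parameter anomalies
    dW = W - W.mean(axis=0)                          # output anomalies
    C_uw = dU.T @ dW / (len(U) - 1)                  # cross-covariance, (d, m)
    C_ww = dW.T @ dW / (len(U) - 1)                  # output covariance, (m, m)
    K = np.linalg.solve(C_ww + gamma * np.eye(len(y)), C_uw.T).T  # Kalman-type gain
    return U + (y - W) @ K.T                         # move members toward the data

# Toy usage: recover a parameter from noiseless linear measurements, gradient-free.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
x_true = rng.normal(size=3)
U = rng.normal(size=(20, 3))                         # initial ensemble
for _ in range(50):
    U = eki_step(U, lambda u: A @ u, A @ x_true)
print(np.linalg.norm(A @ U.mean(axis=0) - A @ x_true))  # residual shrinks
```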
Ensemble Kalman filter for neural network based one-shot inversion
We study the use of novel techniques arising in machine learning for inverse problems. Our approach replaces the complex forward model by a neural network, which is trained simultaneously in a one-shot sense while estimating the unknown parameters from data; that is, the neural network is trained only for the particular unknown parameter under consideration. By establishing a link to the Bayesian approach to inverse problems, an algorithmic framework is developed which ensures the feasibility of the parameter estimate with respect to the forward model. We propose an efficient, derivative-free optimization method based on variants of ensemble Kalman inversion. Numerical experiments show that the ensemble Kalman filter for neural network based one-shot inversion is a promising direction for combining optimization and machine learning techniques for inverse problems.
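One hypothetical way to realize the one-shot idea, reusing eki_step from the sketch above: stack the unknown parameter u and the weights theta of a small surrogate network into a single state and update both with the same derivative-free step, feeding in the measurement together with a few evaluations of the true forward model that keep the surrogate feasible. The augmentation below is a simplified illustration, not the algorithmic framework developed in the paper:

```python
import numpy as np

d, h, m = 3, 8, 5                  # parameter dim, hidden width, data dim (illustrative)
n_theta = h * d + h + m * h + m    # number of surrogate weights

def surrogate(theta, u):
    """One-hidden-layer network standing in for the forward model."""
    W1 = theta[:h * d].reshape(h, d)
    b1 = theta[h * d:h * d + h]
    W2 = theta[h * d + h:h * d + h + m * h].reshape(m, h)
    b2 = theta[-m:]
    return W2 @ np.tanh(W1 @ u + b1) + b2

def one_shot_map(z, G, u_train):
    """Forward map for the augmented state z = (u, theta).

    The first block should match the measurement y; the remaining blocks
    (surrogate minus true model on training points) should match zero,
    which enforces feasibility of the surrogate near the estimate.
    """
    u, theta = z[:d], z[d:]
    blocks = [surrogate(theta, u)]
    blocks += [surrogate(theta, v) - G(v) for v in u_train]
    return np.concatenate(blocks)

# One would then iterate, on an ensemble Z of shape (J, d + n_theta):
#   y_aug = np.concatenate([y, np.zeros(m * len(u_train))])
#   Z = eki_step(Z, lambda z: one_shot_map(z, G, u_train), y_aug)
```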
A Stabilization of a Continuous Limit of the Ensemble Kalman Filter
The ensemble Kalman filter belongs to the class of iterative particle filtering methods and can be used for solving control-to-observable inverse problems. In recent years, several continuous limits in the number of iterations and particles have been performed in order to study properties of the method. In particular, a one-dimensional linear stability analysis reveals a possible instability of the solution provided by the continuous-time limit of the ensemble Kalman filter for inverse problems. In this work we address this issue by introducing a stabilization of the dynamics which leads to a method with globally asymptotically stable solutions. We illustrate the performance of the stabilized version of the ensemble Kalman filter on test inverse problems from the literature, comparing it with the classical formulation of the method.
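To make the object of study concrete: the continuous-time limit referred to here replaces the discrete ensemble update by an ODE for the ensemble members, which can be integrated with explicit Euler steps. The sketch below adds a Tikhonov-type damping term as an illustrative stand-in for a stabilization; the specific stabilized dynamics proposed in the paper are not reproduced here:

```python
import numpy as np

def enkf_flow_step(U, forward, y, dt=1e-2, alpha=0.0):
    """Explicit Euler step of the continuous-time EnKF dynamics for inverse
    problems, du_j/dt = -C_uw (forward(u_j) - y), with optional damping.

    alpha > 0 adds a relaxation -alpha * u_j toward a zero prior mean, a
    simple Tikhonov-type damping used here only for illustration.
    """
    W = np.array([forward(u) for u in U])            # (J, m) forward evaluations
    dU = U - U.mean(axis=0)
    dW = W - W.mean(axis=0)
    C_uw = dU.T @ dW / (len(U) - 1)                  # parameter-output cross-covariance
    drift = -(W - y) @ C_uw.T - alpha * U            # Kalman drift plus damping
    return U + dt * drift
```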
Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks
The standard probabilistic perspective on machine learning gives rise to empirical risk-minimization tasks that are frequently solved by stochastic gradient descent (SGD) and variants thereof. We present a formulation of these tasks as classical inverse or filtering problems and, furthermore, we propose an efficient, gradient-free algorithm for finding a solution to these problems using ensemble Kalman inversion (EKI). The method is inherently parallelizable and is applicable to problems with non-differentiable loss functions, for which back-propagation is not possible. Applications of our approach include offline and online supervised learning with deep neural networks, as well as graph-based semi-supervised learning. The essence of the EKI procedure is an ensemble-based approximate gradient descent in which derivatives are replaced by differences from within the ensemble. We suggest several modifications to the basic method, derived from empirically successful heuristics developed in the context of SGD. Numerical results demonstrate wide applicability and robustness of the proposed algorithm.
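Mini-batching is one concrete example of the SGD-inspired heuristics alluded to above: each EKI step uses only a random subset of the training data. A minimal sketch under a least-squares loss follows; the function names and the noise level gamma are illustrative assumptions:

```python
import numpy as np

def eki_minibatch_step(U, predict, X, Y, batch, rng, gamma=1e-2):
    """One EKI step on a random mini-batch, mirroring the SGD heuristic.

    predict(u, Xb) returns the model outputs on the batch Xb for parameter
    vector u; only forward evaluations are needed, so the model may be
    non-differentiable. U is a (J, d) ensemble of parameter vectors.
    """
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], Y[idx].ravel()                  # subsampled data
    W = np.array([predict(u, Xb).ravel() for u in U])
    dU = U - U.mean(axis=0)
    dW = W - W.mean(axis=0)
    C_uw = dU.T @ dW / (len(U) - 1)
    C_ww = dW.T @ dW / (len(U) - 1)
    K = np.linalg.solve(C_ww + gamma * np.eye(len(yb)), C_uw.T).T
    return U + (yb - W) @ K.T                        # ensemble-difference "gradient" step
```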