Never look back - A modified EnKF method and its application to the training of neural networks without back propagation
In this work, we present a new derivative-free optimization method and
investigate its use for training neural networks. Our method is motivated by
the Ensemble Kalman Filter (EnKF), which has been used successfully for solving
optimization problems that involve large-scale, highly nonlinear dynamical
systems. A key benefit of the EnKF method is that it requires only the
evaluation of the forward propagation but not its derivatives. Hence, in the
context of neural networks, it alleviates the need for back propagation and
reduces the memory consumption dramatically. However, the method is not a pure
"black-box" global optimization heuristic as it efficiently utilizes the
structure of typical learning problems. Promising first results of the EnKF for
training deep neural networks have been presented recently by Kovachki and
Stuart. We propose an important modification of the EnKF that enables us to
prove convergence of our method to the minimizer of a strongly convex function.
Our method also bears similarity to implicit filtering, and we demonstrate its
potential for minimizing highly oscillatory functions using a simple example.
Further, we provide numerical examples that demonstrate the potential of our
method for training deep neural networks. Comment: 10 pages, 2 figures.
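The common computational primitive in this line of work is an ensemble Kalman update that needs only forward evaluations of the model. As a rough sketch of that idea — a generic ensemble Kalman inversion step on a toy linear least-squares problem, not the modified EnKF proposed in the paper above; the forward map `G`, the data `y`, and all sizes are illustrative assumptions:

```python
import numpy as np

# Sketch of a derivative-free ensemble Kalman update for minimizing
# ||G(u) - y||^2. The forward map G, data y, and ensemble sizes below
# are illustrative assumptions, not taken from any paper listed here.

rng = np.random.default_rng(0)

A = np.array([[2.0, 0.0], [0.0, 3.0]])  # toy linear forward operator
y = np.array([4.0, 9.0])                # data; exact minimizer is u* = (2, 3)

def G(u):
    return A @ u                        # only forward evaluations are needed

n_particles, dim = 50, 2
U = rng.normal(0.0, 2.0, size=(n_particles, dim))  # initial ensemble
gamma = 1e-6 * np.eye(2)                           # small regularization

for _ in range(20):
    W = np.array([G(u) for u in U])                # no derivatives of G used
    u_bar, w_bar = U.mean(axis=0), W.mean(axis=0)
    C_uw = (U - u_bar).T @ (W - w_bar) / n_particles  # cross-covariance
    C_ww = (W - w_bar).T @ (W - w_bar) / n_particles  # output covariance
    K = C_uw @ np.linalg.inv(C_ww + gamma)            # Kalman-type gain
    U = U + (y - W) @ K.T                             # ensemble update

print(U.mean(axis=0))  # ensemble mean is driven toward the minimizer
```

Note that the loop never touches derivatives of `G`: all gradient-like information enters through the empirical covariances of the ensemble, which is exactly what makes the approach attractive when backpropagation is unavailable or memory-bound.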
Ensemble Kalman filter for neural network based one-shot inversion
We study the use of novel techniques arising in machine learning for inverse
problems. Our approach replaces the complex forward model by a neural network,
which is trained simultaneously in a one-shot sense when estimating the unknown
parameters from data, i.e. the neural network is trained only for the unknown
parameter. By establishing a link to the Bayesian approach to inverse problems,
an algorithmic framework is developed which ensures the feasibility of the
parameter estimate with respect to the forward model. We propose an efficient,
derivative-free optimization method based on variants of the ensemble Kalman
inversion. Numerical experiments show that the ensemble Kalman filter for
neural network based one-shot inversion is a promising direction combining
optimization and machine learning techniques for inverse problems.
Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in ReLU Networks
We consider the problem of performing Bayesian inference for logistic
regression using appropriate extensions of the ensemble Kalman filter. We
propose two interacting particle systems that sample from an approximate
posterior and prove quantitative convergence rates of these interacting
particle systems to their mean-field limit as the number of particles tends
to infinity. Furthermore, we apply these techniques and examine their
effectiveness as methods of Bayesian approximation for quantifying predictive
uncertainty in ReLU networks.
EnKSGD: A Class Of Preconditioned Black Box Optimization And Inversion Algorithms
In this paper, we introduce the Ensemble Kalman-Stein Gradient Descent
(EnKSGD) class of algorithms. The EnKSGD class of algorithms builds on the
ensemble Kalman filter (EnKF) line of work, applying techniques from sequential
data assimilation to unconstrained optimization and parameter estimation
problems. The essential idea is to exploit the EnKF as a black box (i.e.
derivative-free, zeroth order) optimization tool if iterated to convergence. In
this paper, we return to the foundations of the EnKF as a sequential data
assimilation technique, including its continuous-time and mean-field limits,
with the goal of developing faster optimization algorithms suited to noisy
black box optimization and inverse problems. The resulting EnKSGD class of
algorithms can be designed to both maintain the desirable property of
affine-invariance, and employ the well-known backtracking line search.
Furthermore, EnKSGD algorithms are designed to avoid the subspace
restriction and variance collapse properties of previous iterated EnKF
approaches to optimization, as both can be undesirable in an optimization
context. EnKSGD also generalizes beyond the L2 loss, and is
thus applicable to a wider class of problems than the standard EnKF. Numerical
experiments with both linear and nonlinear least squares problems, as well as
maximum likelihood estimation, demonstrate the faster convergence of EnKSGD
relative to alternative EnKF approaches to optimization. Comment: 20 pages, 3 figures.
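Purely as an illustration of the two ingredients named above — a derivative-free (zeroth-order) descent direction combined with a backtracking line search — one might write the following; the objective `f`, the ensemble of perturbation directions, and all constants are assumptions for the sketch, not the EnKSGD algorithm itself:

```python
import numpy as np

# Illustrative only: a zeroth-order gradient estimate built from an ensemble
# of random perturbations, combined with Armijo backtracking. Everything
# here is an assumption for the sketch, not the EnKSGD method.

rng = np.random.default_rng(0)

def f(u):
    return 0.5 * np.sum((u - np.array([1.0, -2.0])) ** 2)  # toy objective

u = np.zeros(2)
for _ in range(50):
    # derivative-free gradient estimate: average directional differences
    sigma, n_dirs = 1e-3, 20
    E = rng.normal(0.0, 1.0, size=(n_dirs, 2))
    g = np.mean([(f(u + sigma * e) - f(u)) / sigma * e for e in E], axis=0)

    # well-known Armijo backtracking line search along -g
    t, c = 1.0, 1e-4
    while f(u - t * g) > f(u) - c * t * (g @ g) and t > 1e-10:
        t *= 0.5
    u = u - t * g

print(u)  # approaches the minimizer (1, -2)
```

The line search only needs function values, so the whole loop stays derivative-free; the backtracking condition guarantees monotone decrease of `f` whenever the estimated direction is a descent direction.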
Recent Trends on Nonlinear Filtering for Inverse Problems
Among the class of nonlinear particle filtering methods, the Ensemble Kalman
Filter (EnKF) has gained recent attention for its use in solving inverse
problems. We review the original method and discuss recent developments in
particular in view of the limit of infinitely many particles and extensions
towards stability analysis and multi-objective optimization. We illustrate
the performance of the method by using test inverse problems from the
literature.
Joint state and parameter estimation to address model error in convective scale numerical weather prediction systems
Numerical weather prediction models need initial conditions to produce weather forecasts. These initial conditions are computed through a process called data assimilation, where previously computed model states are updated using newly obtained observations of the atmosphere. The data assimilation system (KENDA) employed at the German Weather Service for regional forecasts is based on the Ensemble Kalman Filter (EnKF), which was designed under the assumption of a perfect model in a stochastic sense and Gaussian error statistics. As neither of these assumptions is valid for operational convection permitting weather prediction models, improvement can be gained by developing methods and algorithms for which these assumptions can be relaxed.
In this thesis we investigate the feasibility of addressing model error by perturbing and estimating uncertain static model parameters using data assimilation techniques. In particular we use the augmented state approach, where parameters are updated by observations via their correlation with observed state variables. This online approach offers a flexible, yet consistent way to better fit model variables affected by the chosen parameters to observations, while ensuring feasible model states. A key challenge is to design the probability distribution of the parameters, which should reflect the uncertainty of the targeted model error.
We show in an operational setup that the representation of clouds in COSMO-DE is improved if the two dimensional roughness length parameter is estimated with the augmented state approach. Here, the targeted model error is the roughness length itself and the surface fluxes, which influence the initiation of convection. The probability density function of the roughness length, and by extension the model error corresponding to the surface fluxes, is assumed Gaussian with a certain covariance matrix. The results are highly sensitive to the choice of covariance matrix, and strongly suggest the importance of assimilating surface wind measurements.
In addition we evaluate two recently developed modifications of the EnKF that either explicitly incorporate constraints such as mass conservation and positivity of precipitation by solving constrained minimization problems (QPEns), or introduce higher order moments such as skewness (QF) to deal with non-Gaussian error statistics. We show in an idealized setup that the estimation of parameters benefits from the QF (even for moderate ensemble sizes) and that the QPEns generally outperforms the EnKF significantly. To reduce the high computational costs of the QPEns we propose a new algorithm that exploits properties of the minimization problems that need to be solved. We also explore a different approach where we train a neural network to reproduce the initial conditions generated by the QPEns from those generated by the EnKF.
Besides the encouraging finding that even in a near-operational setup model error is significantly reduced by estimating appropriate model parameters, we provide various suggestions for further research that can lead to additional improvements.
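The augmented state approach described above can be sketched in a few lines: append the uncertain parameter to the state vector and let a standard stochastic EnKF analysis update both through their sample correlation. The scalar model, noise levels, and ensemble size below are illustrative assumptions, not the KENDA / COSMO-DE configuration:

```python
import numpy as np

# Minimal sketch of the augmented state approach: the uncertain model
# parameter a is appended to the state x, and a stochastic EnKF analysis
# updates both through their sample correlation with the observed state.
# Model, noise levels, and ensemble size are assumptions for the sketch.

rng = np.random.default_rng(1)

a_true = 0.9                        # "unknown" parameter to be recovered
n_steps, n_ens = 200, 100
obs_var = 0.01                      # observation error variance

# augmented ensemble: column 0 = state x, column 1 = parameter a
ens = np.column_stack([rng.normal(0.0, 1.0, n_ens),
                       rng.normal(0.5, 0.3, n_ens)])  # prior: a ~ N(0.5, 0.3^2)

x_truth = 1.0
for _ in range(n_steps):
    # forecast: each member propagates x with its own parameter value
    x_truth = a_true * x_truth + rng.normal(0.0, 0.3)
    ens[:, 0] = ens[:, 1] * ens[:, 0] + rng.normal(0.0, 0.3, n_ens)

    y = x_truth + rng.normal(0.0, np.sqrt(obs_var))   # observe x only

    # stochastic EnKF analysis on the augmented vector (x, a)
    anom = ens - ens.mean(axis=0)
    P = anom.T @ anom / (n_ens - 1)        # 2x2 augmented covariance
    H = np.array([[1.0, 0.0]])             # observation operator: picks x
    S = H @ P @ H.T + obs_var
    K = P @ H.T / S                        # gain updates x AND a jointly
    perturbed_y = y + rng.normal(0.0, np.sqrt(obs_var), n_ens)
    ens = ens + (perturbed_y - ens[:, 0])[:, None] * K.T

print(ens[:, 1].mean())  # parameter estimate is pulled toward a_true
```

The parameter is never observed directly: it moves only because the off-diagonal entry of `P` links it to the observed state, which is the mechanism the thesis exploits, and which makes the design of the parameter's prior distribution so important.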
A Stabilization of a Continuous Limit of the Ensemble Kalman Filter
The ensemble Kalman filter belongs to the class of iterative particle
filtering methods and can be used for solving control-to-observable inverse
problems. In recent years several continuous limits in the number of
iterations and particles have been performed in order to study properties of
the method. In particular, a one-dimensional linear stability analysis
reveals a possible instability of the solution provided by the
continuous-time limit of the ensemble Kalman filter for inverse problems. In
this work we address this issue by introducing a stabilization of the
dynamics which leads to a method with globally asymptotically stable
solutions. We illustrate the performance of the stabilized version of the
ensemble Kalman filter by using test inverse problems from the literature and
comparing it with the classical formulation of the method.
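For context, the continuous-time limit referred to above is usually written (in the form popularized by Schillings and Stuart) as the coupled particle dynamics below; this is the standard unstabilized formulation from the literature, with G the forward map, Γ the observational noise covariance, and J the ensemble size:

```latex
\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}
  = C^{uG}(u)\,\Gamma^{-1}\bigl(y - G(u^{(j)})\bigr),
\qquad
C^{uG}(u) = \frac{1}{J}\sum_{k=1}^{J}
  \bigl(u^{(k)} - \bar{u}\bigr)\otimes\bigl(G(u^{(k)}) - \bar{G}\bigr),
```

where \(\bar{u}\) and \(\bar{G}\) denote the ensemble means of the particles and of their forward evaluations; the stabilization proposed in the paper modifies these dynamics so that the resulting solutions are globally asymptotically stable.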