
    Never look back - A modified EnKF method and its application to the training of neural networks without back propagation

    Get PDF
    In this work, we present a new derivative-free optimization method and investigate its use for training neural networks. Our method is motivated by the Ensemble Kalman Filter (EnKF), which has been used successfully for solving optimization problems that involve large-scale, highly nonlinear dynamical systems. A key benefit of the EnKF method is that it requires only the evaluation of the forward propagation but not its derivatives. Hence, in the context of neural networks, it alleviates the need for back propagation and reduces the memory consumption dramatically. However, the method is not a pure "black-box" global optimization heuristic, as it efficiently utilizes the structure of typical learning problems. Promising first results of the EnKF for training deep neural networks have been presented recently by Kovachki and Stuart. We propose an important modification of the EnKF that enables us to prove convergence of our method to the minimizer of a strongly convex function. Our method also bears similarity with implicit filtering, and we demonstrate its potential for minimizing highly oscillatory functions using a simple example. Further, we provide numerical examples that demonstrate the potential of our method for training deep neural networks. (Comment: 10 pages, 2 figures)
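    The derivative-free update that this line of work builds on can be sketched as a generic ensemble Kalman inversion step. The sketch below is illustrative only (it is not the authors' modified method); `forward` stands in for a network's forward pass, and all names and the regularization value are assumptions for the example.

```python
import numpy as np

def eki_step(U, y, forward, gamma=1e-2):
    """One generic ensemble Kalman inversion step (illustrative sketch).

    U       : (J, d) ensemble of parameter vectors
    y       : (m,) data / target vector
    forward : maps a (d,) parameter vector to an (m,) prediction
    gamma   : observation-noise variance used to regularize the update
    """
    J = U.shape[0]
    G = np.array([forward(u) for u in U])    # only forward evaluations, no derivatives
    dU = U - U.mean(axis=0)                  # centered parameter ensemble
    dG = G - G.mean(axis=0)                  # centered prediction ensemble
    C_ug = dU.T @ dG / J                     # (d, m) cross-covariance
    C_gg = dG.T @ dG / J                     # (m, m) prediction covariance
    K = C_ug @ np.linalg.inv(C_gg + gamma * np.eye(len(y)))
    return U + (y - G) @ K.T                 # Kalman-type correction of every particle

# Usage: recover the parameters of a linear map from data, gradient-free.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
u_true = np.array([1.0, -2.0])
y = A @ u_true
U = rng.normal(size=(20, 2))                 # J = 20 particles
U0 = U.copy()
for _ in range(50):
    U = eki_step(U, y, lambda u: A @ u)
```

    Note that the update only moves particles within the span of the initial ensemble, which is one of the structural properties the papers in this list analyze and modify.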

    Ensemble Kalman filter for neural network based one-shot inversion

    Full text link
    We study the use of novel techniques arising in machine learning for inverse problems. Our approach replaces the complex forward model by a neural network, which is trained simultaneously in a one-shot sense when estimating the unknown parameters from data, i.e. the neural network is trained only for the unknown parameter. By establishing a link to the Bayesian approach to inverse problems, an algorithmic framework is developed which ensures the feasibility of the parameter estimate w.r.t. the forward model. We propose an efficient, derivative-free optimization method based on variants of ensemble Kalman inversion. Numerical experiments show that the ensemble Kalman filter for neural network based one-shot inversion is a promising direction combining optimization and machine learning techniques for inverse problems

    Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in ReLU Networks

    Full text link
    We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior and prove quantitative convergence rates of these particle systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in ReLU networks

    EnKSGD: A Class Of Preconditioned Black Box Optimization And Inversion Algorithms

    Full text link
    In this paper, we introduce the Ensemble Kalman-Stein Gradient Descent (EnKSGD) class of algorithms. The EnKSGD class of algorithms builds on the ensemble Kalman filter (EnKF) line of work, applying techniques from sequential data assimilation to unconstrained optimization and parameter estimation problems. The essential idea is to exploit the EnKF as a black box (i.e. derivative-free, zeroth order) optimization tool if iterated to convergence. In this paper, we return to the foundations of the EnKF as a sequential data assimilation technique, including its continuous-time and mean-field limits, with the goal of developing faster optimization algorithms suited to noisy black box optimization and inverse problems. The resulting EnKSGD class of algorithms can be designed to both maintain the desirable property of affine-invariance, and employ the well-known backtracking line search. Furthermore, EnKSGD algorithms are designed to not necessitate the subspace restriction property and variance collapse property of previous iterated EnKF approaches to optimization, as both these properties can be undesirable in an optimization context. EnKSGD also generalizes beyond the L^2 loss, and is thus applicable to a wider class of problems than the standard EnKF. Numerical experiments with both linear and nonlinear least squares problems, as well as maximum likelihood estimation, demonstrate the faster convergence of EnKSGD relative to alternative EnKF approaches to optimization. (Comment: 20 pages, 3 figures)

    Recent Trends on Nonlinear Filtering for Inverse Problems

    Get PDF
    Among the class of nonlinear particle filtering methods, the Ensemble Kalman Filter (EnKF) has gained recent attention for its use in solving inverse problems. We review the original method and discuss recent developments, in particular the limit of infinitely many particles and extensions towards stability analysis and multi-objective optimization. We illustrate the performance of the method using test inverse problems from the literature

    Joint state and parameter estimation to address model error in convective scale numerical weather prediction systems

    Get PDF
    Numerical weather prediction models need initial conditions to produce weather forecasts. These initial conditions are computed through a process called data assimilation, where previously computed model states are updated using newly obtained observations of the atmosphere. The data assimilation system (KENDA) employed at the German Weather Service for regional forecasts is based on the Ensemble Kalman Filter (EnKF), which was designed under the assumption of a perfect model in a stochastic sense and Gaussian error statistics. As neither of these assumptions is valid for operational convection permitting weather prediction models, improvement can be gained by developing methods and algorithms for which these assumptions can be relaxed. In this thesis we investigate the feasibility of addressing model error by perturbing and estimating uncertain static model parameters using data assimilation techniques. In particular we use the augmented state approach, where parameters are updated by observations via their correlation with observed state variables. This online approach offers a flexible, yet consistent way to better fit model variables affected by the chosen parameters to observations, while ensuring feasible model states. A key challenge is to design the probability distribution of the parameters, which should reflect the uncertainty of the targeted model error. We show in an operational setup that the representation of clouds in COSMO-DE is improved if the two dimensional roughness length parameter is estimated with the augmented state approach. Here, the targeted model error is the roughness length itself and the surface fluxes, which influence the initiation of convection. The probability density function of the roughness length, and by extension the model error corresponding to the surface fluxes, is assumed Gaussian with a certain covariance matrix. 
The results are highly sensitive to the choice of covariance matrix, and strongly suggest the importance of assimilating surface wind measurements. In addition we evaluate two recently developed modifications of the EnKF that either explicitly incorporate constraints such as mass conservation and positivity of precipitation by solving constrained minimization problems (QPEns), or introduce higher order moments such as skewness (QF) to deal with non-Gaussian error statistics. We show in an idealized setup that the estimation of parameters benefits from the QF (even for moderate ensemble sizes) and that the QPEns generally outperforms the EnKF significantly. To reduce the high computational costs of the QPEns we propose a new algorithm that exploits properties of the minimization problems that need to be solved. We also explore a different approach where we train a neural network to reproduce the initial conditions generated by the QPEns from those generated by the EnKF. Besides the encouraging finding that even in a near-operational setup model error is significantly reduced by estimating appropriate model parameters, we provide various suggestions for further research that can lead to further improvements
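    The augmented state approach described above can be sketched on a toy linear model. This is not the KENDA/COSMO-DE system; the scalar model, prior, and noise levels are all assumptions for illustration. The uncertain parameter is appended to the state vector, so the EnKF analysis corrects it through its ensemble correlation with the observed state variable.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 40                                    # ensemble size
theta_true = 0.7                          # "true" value of the uncertain parameter
x_true = 0.0

def model(x, theta):
    # Toy forecast model with an uncertain additive forcing parameter theta;
    # a stand-in for a weather model with an uncertain static parameter.
    return 0.9 * x + theta

# Augmented ensemble z = [x, theta]: the parameter rides along with the state.
Z = np.column_stack([rng.normal(0.0, 1.0, J),   # state ensemble
                     rng.normal(0.0, 0.5, J)])  # Gaussian parameter prior
H = np.array([[1.0, 0.0]])                # only the state component is observed
r = 0.1                                   # observation-error variance

for _ in range(30):
    # Forecast: propagate the state; the static parameter persists unchanged.
    Z[:, 0] = model(Z[:, 0], Z[:, 1])
    x_true = model(x_true, theta_true)
    y = x_true + rng.normal(0.0, np.sqrt(r))
    # Analysis: a stochastic EnKF update of the *augmented* vector, so theta
    # is nudged via its sample covariance with the observed state.
    dZ = Z - Z.mean(axis=0)
    P = dZ.T @ dZ / (J - 1)
    K = (P @ H.T / (H @ P @ H.T + r)).ravel()    # (2,) Kalman gain
    for j in range(J):
        y_pert = y + rng.normal(0.0, np.sqrt(r)) # perturbed observation
        Z[j] += K * (y_pert - Z[j, 0])
```

    After the assimilation cycles, the ensemble mean of the parameter component moves from the prior mean toward the true value, and the parameter spread contracts, which is the behavior the augmented state approach relies on.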

    A Stabilization of a Continuous Limit of the Ensemble Kalman Filter

    Full text link
    The ensemble Kalman filter belongs to the class of iterative particle filtering methods and can be used for solving control-to-observable inverse problems. In recent years several continuous limits in the number of iterations and particles have been studied in order to analyze properties of the method. In particular, a one-dimensional linear stability analysis reveals a possible instability of the solution provided by the continuous-time limit of the ensemble Kalman filter for inverse problems. In this work we address this issue by introducing a stabilization of the dynamics which leads to a method with globally asymptotically stable solutions. We illustrate the performance of the stabilized version of the ensemble Kalman filter by using test inverse problems from the literature and comparing it with the classical formulation of the method
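    For context, the continuous-time limit whose stability is at issue reads, in the notation commonly used for ensemble Kalman inversion (forward map $G$, data $y$, observation-noise covariance $\Gamma$; standard notation, not necessarily this paper's, and without the paper's stabilization term):

```latex
\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}
  = -\,C^{uG}(u)\,\Gamma^{-1}\bigl(G(u^{(j)}) - y\bigr),
\qquad
C^{uG}(u) = \frac{1}{J}\sum_{k=1}^{J}
  \bigl(u^{(k)} - \bar{u}\bigr)\otimes\bigl(G(u^{(k)}) - \bar{G}\bigr),
```

    where $\bar{u}$ and $\bar{G}$ denote the ensemble means of the particles and of their forward evaluations.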