
    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high-dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high-dimensional problems of practical interest. Such saddle points are surrounded by high-error plateaus that can dramatically slow down learning and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
    Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]
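
    The core idea of the saddle-free Newton step can be illustrated in a few lines: rescale the gradient by the inverse of |H|, the Hessian with its eigenvalues replaced by their absolute values, so that negative-curvature directions push the iterate away from a saddle instead of pulling it in. The sketch below is a minimal dense-eigendecomposition version with an illustrative damping constant; the function names are this sketch's own, and a dense Hessian like this would not scale to the deep networks treated in the paper.

```python
import numpy as np

def saddle_free_newton_step(grad, hess, damping=1e-3):
    """Return the update -|H|^{-1} g, where |H| uses the absolute values
    of the Hessian's eigenvalues (plus damping for numerical stability)."""
    eigvals, eigvecs = np.linalg.eigh(hess)      # H = V diag(lambda) V^T
    abs_eigvals = np.abs(eigvals) + damping      # |H|: flip negative curvature
    g_eig = eigvecs.T @ grad                     # gradient in the eigenbasis
    return -eigvecs @ (g_eig / abs_eigvals)      # -|H|^{-1} g

# Worked 2-D example on the saddle f(x, y) = x^2 - y^2: a plain Newton
# step jumps straight to the saddle at the origin, while the saddle-free
# step moves toward the minimum in x but *away* from the saddle along
# the negative-curvature direction y.
theta = np.array([0.5, 0.1])
grad = np.array([2.0 * theta[0], -2.0 * theta[1]])   # gradient of f at theta
hess = np.diag([2.0, -2.0])                          # Hessian of f (constant)
print(theta + saddle_free_newton_step(grad, hess))   # approx [0.0, 0.2]
```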

    The Short Path Algorithm Applied to a Toy Model

    We numerically investigate the performance of the short path optimization algorithm on a toy problem, with the potential chosen to depend only on the total Hamming weight to allow simulation of larger systems. We consider classes of potentials with multiple minima which cause the adiabatic algorithm to experience difficulties with small gaps. The numerical investigation allows us to consider a broader range of parameters than was studied in previous rigorous work on the short path algorithm, and to show that the algorithm can continue to lead to speedups for more general objective functions than those considered before. We find in many cases a polynomial speedup over Grover search. We present a heuristic analytic treatment of choices of these parameters and of the scaling of phase transitions in this model.
    Comment: 11 pages, 9 figures; v2 final version published in Quantum
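
    The reason the Hamming-weight restriction makes larger systems tractable is permutation symmetry: a uniform transverse field plus a potential V(w) that depends only on the Hamming weight w acts within the (n+1)-dimensional symmetric (Dicke) subspace, so its low-lying spectrum can be computed from an (n+1) x (n+1) matrix rather than a 2^n x 2^n one. The sketch below builds such a matrix and computes its spectral gap for an illustrative double-well potential; the Hamiltonian form, field strength, and potential are assumptions of this sketch, not the short path Hamiltonian from the paper.

```python
import numpy as np

def symmetric_subspace_hamiltonian(n, potential, field):
    """Transverse field plus Hamming-weight potential, restricted to
    the Dicke states |w>, w = 0..n."""
    H = np.zeros((n + 1, n + 1))
    w = np.arange(n + 1)
    H[w, w] = potential(w)  # diagonal part: V(w)
    # Off-diagonal part: -field * sum_i X_i, whose symmetric-subspace
    # matrix element is <w+1| sum_i X_i |w> = sqrt((w+1)(n-w)).
    for k in range(n):
        H[k, k + 1] = H[k + 1, k] = -field * np.sqrt((k + 1) * (n - k))
    return H

def spectral_gap(H):
    """Energy difference between the two lowest eigenvalues."""
    evals = np.linalg.eigvalsh(H)
    return evals[1] - evals[0]

# Illustrative double-well potential in the Hamming weight, with minima
# near w = 0.3 n and w = 0.8 n; this is the kind of multi-minimum
# landscape the abstract associates with small adiabatic gaps.
n = 200
double_well = lambda w: ((w / n - 0.3) * (w / n - 0.8)) ** 2 * n
print(spectral_gap(symmetric_subspace_hamiltonian(n, double_well, field=0.05)))
```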

    Weak in the NEES?: Auto-tuning Kalman Filters with Bayesian Optimization

    Kalman filters are routinely used for many data fusion applications including navigation, tracking, and simultaneous localization and mapping problems. However, significant time and effort are frequently required to tune various Kalman filter model parameters, e.g. process noise covariance, pre-whitening filter models for non-white noise, etc. Conventional optimization techniques for tuning can get stuck in poor local minima and can be expensive to implement with real sensor data. To address these issues, a new "black box" Bayesian optimization strategy is developed for automatically tuning Kalman filters. In this approach, performance is characterized by one of two stochastic objective functions: the normalized estimation error squared (NEES) when ground truth state models are available, or the normalized innovation error squared (NIS) when only sensor data is available. By intelligently sampling the parameter space to both learn and exploit a nonparametric Gaussian process surrogate function for the NEES/NIS costs, Bayesian optimization can efficiently identify multiple local minima and provide uncertainty quantification on its results.
    Comment: Final version presented at FUSION 2018 Conference, Cambridge, UK, July 2018 (submitted June 1, 2018)
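
    As a rough illustration of the tuning loop described above, the sketch below treats a scalar filter's process-noise variance as the unknown parameter, scores each candidate by how far its time-averaged NIS departs from the consistency value of 1 (the expected value for a one-dimensional measurement), and searches the space with a Gaussian-process surrogate. The use of scikit-optimize's gp_minimize and the specific log-NIS cost are choices of this sketch, not the paper's formulation.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

rng = np.random.default_rng(0)

# Simulate a scalar random-walk target with true process noise q_true,
# observed directly with measurement noise r.
q_true, r, T = 0.5, 1.0, 500
x = np.cumsum(rng.normal(0.0, np.sqrt(q_true), T))
z = x + rng.normal(0.0, np.sqrt(r), T)

def mean_nis(q):
    """Run a scalar Kalman filter with process noise q; return mean NIS."""
    x_est, p = 0.0, 1.0
    nis = []
    for zk in z:
        p = p + q                      # predict
        nu, s = zk - x_est, p + r      # innovation and its variance
        nis.append(nu ** 2 / s)        # NIS for this step
        k = p / s                      # Kalman gain
        x_est, p = x_est + k * nu, (1.0 - k) * p
    return float(np.mean(nis))

def objective(params):
    # A consistent scalar filter has E[NIS] = 1; penalize deviation of
    # the time-averaged NIS from that target (illustrative cost only).
    (q,) = params
    return abs(np.log(mean_nis(q)))

result = gp_minimize(objective, [Real(1e-3, 10.0, prior="log-uniform")],
                     n_calls=25, random_state=0)
print("tuned q:", result.x[0], "true q:", q_true)
```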