Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
A central challenge to many fields of science and engineering involves
minimizing non-convex error functions over continuous, high dimensional spaces.
Gradient descent or quasi-Newton methods are almost ubiquitously used to
perform such minimizations, and it is often thought that a main source of
difficulty for these local methods in finding the global minimum is the
proliferation of local minima with much higher error than the global minimum.
Here we argue, based on results from statistical physics, random matrix theory,
neural network theory, and empirical evidence, that a deeper and more profound
difficulty originates from the proliferation of saddle points, not local
minima, especially in high dimensional problems of practical interest. Such
saddle points are surrounded by high error plateaus that can dramatically slow
down learning, and give the illusory impression of the existence of a local
minimum. Motivated by these arguments, we propose a new approach to
second-order optimization, the saddle-free Newton method, that can rapidly
escape high dimensional saddle points, unlike gradient descent and quasi-Newton
methods. We apply this algorithm to deep or recurrent neural network training,
and provide numerical evidence for its superior optimization performance.
Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG].
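A minimal numerical sketch of the core update, assuming a dense NumPy implementation (the paper itself works in a Krylov subspace for large models): the saddle-free Newton step rescales the gradient by the inverse of |H|, the Hessian with each eigenvalue replaced by its absolute value, so that negative-curvature directions are repelled from rather than attracted to. The function name, damping constant, and toy objective below are illustrative choices, not the authors' code.

    import numpy as np

    def saddle_free_newton_step(grad, hess, damping=1e-3):
        # Replace each Hessian eigenvalue with its absolute value, then take a
        # Newton-like step -|H|^{-1} grad; near a saddle this moves away along
        # negative-curvature directions instead of toward the saddle point.
        eigvals, eigvecs = np.linalg.eigh(hess)      # hess assumed symmetric
        abs_eigvals = np.abs(eigvals) + damping      # damping guards tiny eigenvalues
        return -eigvecs @ ((eigvecs.T @ grad) / abs_eigvals)

    # Toy demo on f(x, y) = x**2 - y**2 + y**4 / 4, which has a saddle at the
    # origin and minima at (0, +/- sqrt(2)).
    theta = np.array([1e-3, 1e-3])
    for _ in range(50):
        x, y = theta
        grad = np.array([2 * x, -2 * y + y ** 3])
        hess = np.array([[2.0, 0.0], [0.0, -2.0 + 3 * y ** 2]])
        theta = theta + saddle_free_newton_step(grad, hess)
    print(theta)  # ends near (0, 1.414) rather than stalling at the saddle

Vanilla Newton on the same toy problem converges to the saddle at the origin, since it treats every critical point as an attractor; the absolute-value rescaling is what turns saddles into repellers.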
The Short Path Algorithm Applied to a Toy Model
We numerically investigate the performance of the short path optimization
algorithm on a toy problem, with the potential chosen to depend only on the
total Hamming weight to allow simulation of larger systems. We consider classes
of potentials with multiple minima which cause the adiabatic algorithm to
experience difficulties with small gaps. The numerical investigation allows us
to consider a broader range of parameters than was studied in previous rigorous
work on the short path algorithm, and to show that the algorithm can continue
to lead to speedups for more general objective functions than those considered
before. We find in many cases a polynomial speedup over Grover search. We
present a heuristic analytic treatment of choices of these parameters and of
scaling of phase transitions in this model.
Comment: 11 pages, 9 figures; v2 final version published in Quantum.
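A minimal sketch of the reduction that makes these simulations feasible, assuming Python/NumPy: when the potential depends only on the total Hamming weight w, the dynamics stay inside the (n+1)-dimensional permutation-symmetric subspace of the 2^n-dimensional Hilbert space, where the transverse-field operator sum_i X_i has matrix elements sqrt((w+1)(n-w)) between weight-w and weight-(w+1) states. The Hamiltonian form and the double-well potential below are illustrative placeholders, not the short-path Hamiltonian or the parameter ranges studied in the paper.

    import numpy as np

    def symmetric_sector_hamiltonian(n, potential, transverse_strength=1.0):
        # (n+1)x(n+1) Hamiltonian in the total-Hamming-weight basis |w>, w = 0..n:
        # a diagonal potential plus a tridiagonal transverse-field term whose
        # matrix elements between |w> and |w+1> are sqrt((w+1)(n-w)).
        H = np.diag([float(potential(w)) for w in range(n + 1)])
        for w in range(n):
            off = transverse_strength * np.sqrt((w + 1) * (n - w))
            H[w, w + 1] -= off
            H[w + 1, w] -= off
        return H

    # Illustrative double-well potential in Hamming weight (not the paper's choice),
    # the kind of landscape that produces small gaps for the adiabatic algorithm.
    n = 60
    V = lambda w: ((w - 0.2 * n) ** 2) * ((w - 0.8 * n) ** 2) / n ** 2
    H = symmetric_sector_hamiltonian(n, V)
    evals = np.linalg.eigvalsh(H)
    print("spectral gap:", evals[1] - evals[0])

Scanning such gaps as a function of the model parameters is how one locates the phase transitions referred to in the abstract, at a cost polynomial in n rather than exponential.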
Weak in the NEES?: Auto-tuning Kalman Filters with Bayesian Optimization
Kalman filters are routinely used for many data fusion applications including
navigation, tracking, and simultaneous localization and mapping problems.
However, significant time and effort are frequently required to tune various
Kalman filter model parameters, e.g., process noise covariance or pre-whitening
filter models for non-white noise. Conventional optimization techniques
for tuning can get stuck in poor local minima and can be expensive to implement
with real sensor data. To address these issues, a new "black box" Bayesian
optimization strategy is developed for automatically tuning Kalman filters. In
this approach, performance is characterized by one of two stochastic objective
functions: the normalized estimation error squared (NEES) when ground-truth state
models are available, or the normalized innovation error squared (NIS) when
only sensor data is available. By intelligently sampling the parameter space to
both learn and exploit a nonparametric Gaussian process surrogate function for
the NEES/NIS costs, Bayesian optimization can efficiently identify multiple
local minima and provide uncertainty quantification on its results.
Comment: Final version presented at FUSION 2018 Conference, Cambridge, UK, July 2018 (submitted June 1, 2018).
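A minimal sketch of the NIS branch of this idea, assuming scikit-optimize's gp_minimize as an off-the-shelf stand-in for the paper's Bayesian optimization loop; the scalar random-walk filter, noise values, search range, and cost definition below are illustrative placeholders rather than the authors' setup.

    import numpy as np
    from skopt import gp_minimize  # pip install scikit-optimize

    rng = np.random.default_rng(0)

    # Simulated scalar random-walk truth and noisy measurements (stand-ins for real data).
    TRUE_Q, R, T = 0.5, 1.0, 200
    x_true = np.cumsum(rng.normal(0.0, np.sqrt(TRUE_Q), T))
    z = x_true + rng.normal(0.0, np.sqrt(R), T)

    def mean_nis(log_q):
        # Run a scalar Kalman filter with process noise exp(log_q) and return the
        # average normalized innovation squared (NIS); a consistent filter gives a
        # value near the measurement dimension (here 1).
        q = np.exp(log_q)
        x, P, nis = 0.0, 10.0, []
        for zk in z:
            P = P + q              # predict (random-walk dynamics)
            S = P + R              # innovation covariance
            nu = zk - x            # innovation
            nis.append(nu * nu / S)
            K = P / S              # Kalman gain and measurement update
            x = x + K * nu
            P = (1.0 - K) * P
        return float(np.mean(nis))

    # Black-box objective: penalize departure of the NIS statistic from its
    # chi-square mean, and let the GP surrogate pick where to evaluate next.
    objective = lambda params: abs(mean_nis(params[0]) - 1.0)
    res = gp_minimize(objective, [(-5.0, 3.0)], n_calls=30, random_state=0)
    print("tuned process noise:", np.exp(res.x[0]))

The NEES variant is the same loop with the innovation statistic replaced by the state-error statistic against ground truth; the GP surrogate over the parameter space is what lets the tuner map out multiple consistent minima instead of committing to the first one it finds.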