Combining and scaling descent and negative curvature directions
The original publication is available at www.springerlink.com.
The aim of this paper is to study different approaches for combining and scaling, in an efficient manner, descent information for the solution of unconstrained optimization problems. We consider the situation in which several directions are available at a given iteration, and we analyze how to combine these directions so as to obtain a method that is more efficient and more robust than the standard Newton approach. In particular, we focus on the scaling process that should be carried out before the directions are combined. We derive theoretical results on the conditions necessary to ensure the convergence of combination procedures that follow schemes similar to our proposals. Finally, we report computational experiments comparing these proposals with a modified Newton's method and with other procedures in the literature for combining information.
Catarina P. Avelino was partially supported by Portuguese FCT postdoctoral grant
SFRH/BPD/20453/2004 and by the Research Unit CM-UTAD of University of Trás-os-Montes e Alto
Douro. Javier M. Moguerza and Alberto Olivares were partially supported by Spanish grant MEC
MTM2006-14961-C05-05.
Francisco J. Prieto was partially supported by grant MTM2007-63140 of the Spanish Ministry of
Education.
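As an illustration of the kind of combination this abstract describes, here is a minimal sketch in Python/NumPy of one generic way to scale and add a modified Newton direction and a direction of negative curvature. The function name, the eigenvalue clipping, and the sqrt(-lambda_min) weighting are assumptions chosen for the example, not the scheme proposed in the paper; the paper's contribution concerns precisely how this scaling should be done.

```python
import numpy as np

def combined_direction(g, H, eps=1e-8):
    """Illustrative only: combine a modified Newton direction with a
    scaled direction of negative curvature (not the paper's scheme)."""
    lam, V = np.linalg.eigh(H)  # eigenvalues in ascending order

    # Modified Newton direction: replace eigenvalues by |lambda| (clipped
    # away from zero) so the resulting direction is a descent direction.
    lam_pos = np.maximum(np.abs(lam), eps)
    d_newton = -V @ ((V.T @ g) / lam_pos)

    # Negative curvature direction: eigenvector of the most negative
    # eigenvalue, oriented so it does not ascend along the gradient.
    d_curv = np.zeros_like(g)
    if lam[0] < -eps:
        d_curv = V[:, 0].copy()
        if g @ d_curv > 0:
            d_curv = -d_curv
        # One textbook scaling choice: weight by the curvature magnitude
        # so the two directions are of comparable size near the iterate.
        d_curv *= np.sqrt(-lam[0])

    return d_newton + d_curv
```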
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
A central challenge in many fields of science and engineering is minimizing non-convex error functions over continuous, high-dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high-dimensional problems of practical interest.
Such saddle points are surrounded by high-error plateaus that can dramatically slow down learning and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG].
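The core idea of the saddle-free Newton method is to rescale the gradient by the absolute values of the Hessian's eigenvalues, so that directions of negative curvature are descended rather than followed toward the saddle. Below is a minimal dense sketch in NumPy; the paper itself works in a low-dimensional Krylov subspace, since exact eigendecompositions are infeasible for large networks, and the damping constant here is an assumption.

```python
import numpy as np

def saddle_free_newton_step(g, H, eps=1e-6):
    """Sketch of a saddle-free Newton step: precondition the gradient
    with |H|^{-1}, where |H| shares the Hessian's eigenvectors but uses
    the absolute values of its eigenvalues."""
    lam, V = np.linalg.eigh(H)
    abs_lam = np.maximum(np.abs(lam), eps)  # assumed damping near zero curvature
    return -V @ ((V.T @ g) / abs_lam)
```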
Nonconvex optimization using negative curvature within a modified linesearch
This paper describes a new algorithm for the solution of nonconvex unconstrained optimization problems that converges to points satisfying second-order necessary optimality conditions. The algorithm is based on a procedure that, given two descent directions (a Newton-type direction and a direction of negative curvature), selects at each iteration the linesearch model best adapted to the properties of these directions. The paper also presents results of numerical experiments that illustrate its practical efficiency.
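One classical way to use a Newton-type direction and a negative curvature direction within a single linesearch is a curvilinear backtracking search in the style of Moré and Sorensen; the sketch below illustrates that idea under simplifying assumptions (a crude sufficient-decrease test, hypothetical names) and is not the adaptive model-selection procedure proposed in the paper.

```python
def curvilinear_search(f, x, d_newton, d_curv, alpha=1.0, beta=0.5,
                       c=1e-4, max_iter=30):
    """Backtracking over the arc x(a) = x + a**2 * d_newton + a * d_curv,
    one classical way to exploit negative curvature in a linesearch.
    Illustrative only; the paper selects its linesearch model adaptively."""
    fx = f(x)
    for _ in range(max_iter):
        trial = x + alpha**2 * d_newton + alpha * d_curv
        # Simplified sufficient-decrease test; proper versions compare
        # against first- and second-order terms of the local model.
        if f(trial) <= fx - c * alpha**2:
            return trial
        alpha *= beta
    return x  # fall back to the current iterate if no step is accepted
```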