Combining and scaling descent and negative curvature directions
The original publication is available at www.springerlink.com.
The aim of this paper is to study different approaches for combining and scaling, in an efficient manner, descent information for the solution of unconstrained optimization problems. We consider the situation in which several directions are available at a given iteration, and we analyze how to combine these directions so as to obtain a method that is more efficient and more robust than the standard Newton approach. In particular, we focus on the scaling process that should be carried out before the directions are combined. We derive theoretical results on the conditions necessary to ensure the convergence of combination procedures that follow schemes similar to our proposals. Finally, we report computational experiments comparing these proposals with a modified Newton's method and with other procedures in the literature for combining information.
Catarina P. Avelino was partially supported by Portuguese FCT postdoctoral grant
SFRH/BPD/20453/2004 and by the Research Unit CM-UTAD of University of Trás-os-Montes e Alto
Douro. Javier M. Moguerza and Alberto Olivares were partially supported by Spanish grant MEC
MTM2006-14961-C05-05.
Francisco J. Prieto was partially supported by grant MTM2007-63140 of the Spanish Ministry of
Education.
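As an illustration of the kind of combination this abstract describes, here is a minimal sketch in Python/NumPy of one generic way to scale and add a modified Newton direction and a direction of negative curvature. The function name, the eigenvalue clipping, and the sqrt(-lambda_min) weighting are assumptions chosen for the example, not the scheme proposed in the paper; the paper's contribution concerns precisely how this scaling should be done.

```python
import numpy as np

def combined_direction(g, H, eps=1e-8):
    """Illustrative only: combine a modified Newton direction with a
    scaled direction of negative curvature (not the paper's scheme)."""
    lam, V = np.linalg.eigh(H)  # eigenvalues in ascending order

    # Modified Newton direction: replace eigenvalues by |lambda| (clipped
    # away from zero) so the resulting direction is a descent direction.
    lam_pos = np.maximum(np.abs(lam), eps)
    d_newton = -V @ ((V.T @ g) / lam_pos)

    # Negative curvature direction: eigenvector of the most negative
    # eigenvalue, oriented so it does not ascend along the gradient.
    d_curv = np.zeros_like(g)
    if lam[0] < -eps:
        d_curv = V[:, 0].copy()
        if g @ d_curv > 0:
            d_curv = -d_curv
        # One textbook scaling choice: weight by the curvature magnitude
        # so the two directions are of comparable size near the iterate.
        d_curv *= np.sqrt(-lam[0])

    return d_newton + d_curv
```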
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
A central challenge in many fields of science and engineering is minimizing non-convex error functions over continuous, high-dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high-dimensional problems of practical interest.
Such saddle points are surrounded by high-error plateaus that can dramatically slow down learning and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.
Comment: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG].
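The core idea of the saddle-free Newton method is to rescale the gradient by the absolute values of the Hessian's eigenvalues, so that directions of negative curvature are descended rather than followed toward the saddle. Below is a minimal dense sketch in NumPy; the paper itself works in a low-dimensional Krylov subspace, since exact eigendecompositions are infeasible for large networks, and the damping constant here is an assumption.

```python
import numpy as np

def saddle_free_newton_step(g, H, eps=1e-6):
    """Sketch of a saddle-free Newton step: precondition the gradient
    with |H|^{-1}, where |H| shares the Hessian's eigenvectors but uses
    the absolute values of its eigenvalues."""
    lam, V = np.linalg.eigh(H)
    abs_lam = np.maximum(np.abs(lam), eps)  # assumed damping near zero curvature
    return -V @ ((V.T @ g) / abs_lam)
```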
Nonconvex optimization using negative curvature within a modified linesearch
This paper describes a new algorithm for the solution of nonconvex unconstrained optimization problems that converges to points satisfying second-order necessary optimality conditions. The algorithm is based on a procedure that, given two descent directions (a Newton-type direction and a direction of negative curvature), selects at each iteration the linesearch model best adapted to the properties of these directions. The paper also presents results of numerical experiments that illustrate its practical efficiency.
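One classical way to use a Newton-type direction and a negative curvature direction within a single linesearch is a curvilinear backtracking search in the style of Moré and Sorensen; the sketch below illustrates that idea under simplifying assumptions (a crude sufficient-decrease test, hypothetical names) and is not the adaptive model-selection procedure proposed in the paper.

```python
def curvilinear_search(f, x, d_newton, d_curv, alpha=1.0, beta=0.5,
                       c=1e-4, max_iter=30):
    """Backtracking over the arc x(a) = x + a**2 * d_newton + a * d_curv,
    one classical way to exploit negative curvature in a linesearch.
    Illustrative only; the paper selects its linesearch model adaptively."""
    fx = f(x)
    for _ in range(max_iter):
        trial = x + alpha**2 * d_newton + alpha * d_curv
        # Simplified sufficient-decrease test; proper versions compare
        # against first- and second-order terms of the local model.
        if f(trial) <= fx - c * alpha**2:
            return trial
        alpha *= beta
    return x  # fall back to the current iterate if no step is accepted
```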