Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points
This work shows that applying Gradient Descent (GD) with a fixed step size to
minimize a (possibly nonconvex) quadratic function is equivalent to running the
Power Method (PM) on the gradients. The connection between GD with a fixed step
size and the PM, both with and without fixed momentum, is thus established.
Consequently, valuable eigen-information is available via GD.
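To make the equivalence concrete: for a quadratic $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$, the GD iterates $x_{k+1} = x_k - \alpha \nabla f(x_k)$ produce gradients satisfying $g_{k+1} = (I - \alpha A)g_k$, which is exactly a power iteration on $I - \alpha A$. The following sketch (illustrative dimensions, step size, and random seed; not code from the paper) checks this numerically and reads off the leftmost eigen-pair from the gradients:

```python
import numpy as np

# Sketch: GD with a fixed step size on f(x) = 0.5*x'Ax - b'x.
# The gradients satisfy g_{k+1} = (I - alpha*A) g_k, i.e. a power iteration.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric, possibly indefinite
b = rng.standard_normal(n)
alpha = 0.1                                           # illustrative fixed step size

x = rng.standard_normal(n)
g = A @ x - b                                         # gradient at x
for _ in range(50):
    g_pm = (np.eye(n) - alpha * A) @ g                # power-method step on the gradient
    x = x - alpha * g                                 # gradient-descent step
    g = A @ x - b                                     # gradient at the new iterate
    assert np.allclose(g, g_pm)                       # the two recursions coincide

# For a suitably small step size, the normalized gradient aligns with the
# dominant eigenvector of I - alpha*A, which corresponds to the leftmost
# eigenvalue of A; its Rayleigh quotient approximates that eigenvalue.
v = g / np.linalg.norm(g)
print(v @ A @ v, np.linalg.eigvalsh(A)[0])
```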
Recent examples show that GD with a fixed step size, applied to locally
quadratic nonconvex functions, can take exponential time to escape saddle
points (Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Aarti Singh, and
Barnabas Poczos: "Gradient descent can take exponential time to escape saddle
points"; S. Paternain, A. Mokhtari, and A. Ribeiro: "A newton-based method for
nonconvex optimization with fast evasion of saddle points"). Here, those
examples are revisited and it is shown that eigenvalue information was missing,
so that the examples may not provide a complete picture of the potential
practical behaviour of GD. Thus, ongoing investigation of the behaviour of GD
on nonconvex functions, possibly with an \emph{adaptive} or \emph{variable}
step size, is warranted.
It is shown that, in the special case of a quadratic in $\mathbb{R}^2$, if an
eigenvalue is known, then GD with a fixed step size will converge in two
iterations, and a complete eigen-decomposition is available.
By considering the dynamics of the gradients and iterates, new step size
strategies are proposed to improve the practical performance of GD. Several
numerical examples are presented, which demonstrate the advantages of
exploiting the GD--PM connection.
Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the sheer non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross-fertilization of ideas.
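As a toy illustration of casting inversion as optimization (a hypothetical example; the forward operator G, data d, regularization weight, and sizes are chosen only for demonstration and are not from the survey), a Tikhonov-regularized linear inverse problem can be solved with plain gradient descent:

```python
import numpy as np

# Toy sketch: recover x from noisy data d = G x + noise by minimizing
# 0.5*||G x - d||^2 + 0.5*lam*||x||^2 with gradient descent.
rng = np.random.default_rng(1)
G = rng.standard_normal((50, 20))                    # forward operator
x_true = rng.standard_normal(20)
d = G @ x_true + 0.01 * rng.standard_normal(50)      # noisy observations
lam = 1e-2                                           # regularization weight

x = np.zeros(20)
step = 1.0 / (np.linalg.norm(G, 2) ** 2 + lam)       # 1/L for the smooth objective
for _ in range(500):
    grad = G.T @ (G @ x - d) + lam * x               # gradient of the objective
    x -= step * grad

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative error
```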
A Generic Approach for Escaping Saddle Points
A central challenge to using first-order methods for optimizing nonconvex
problems is the presence of saddle points. First-order methods often get stuck
at saddle points, greatly deteriorating their performance. Typically, to escape
from saddles one has to use second-order methods. However, most works on
second-order methods rely extensively on expensive Hessian-based computations,
making them impractical in large-scale settings. To tackle this challenge, we
introduce a generic framework that minimizes Hessian-based computations while
at the same time provably converging to second-order critical points. Our
framework carefully alternates between a first-order and a second-order
subroutine, using the latter only close to saddle points, and yields
convergence results competitive with the state-of-the-art. Empirical results
suggest that our strategy also enjoys good practical performance.
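A minimal sketch of this alternating idea (not the paper's algorithm; the function names, tolerances, step sizes, and the toy objective below are assumptions for illustration only) is:

```python
import numpy as np

# Sketch of the generic alternation: cheap first-order (GD) steps, with a
# second-order subroutine invoked only when the gradient is small, i.e. near
# a possible saddle. Illustrative only, not the paper's method.
def first_second_order_sketch(grad_f, hess_f, x0, step=1e-2, tol=1e-4, iters=2000):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        if np.linalg.norm(g) > tol:
            x -= step * g                      # first-order subroutine: GD step
        else:
            w, V = np.linalg.eigh(hess_f(x))   # second-order subroutine
            if w[0] >= -tol:                   # approximate second-order critical point
                return x
            v = V[:, 0]                        # direction of most negative curvature
            if v @ g > 0:                      # orient it as a non-ascent direction
                v = -v
            x += step * v
    return x

# Toy usage: escape the saddle of f(x, y) = x^2 - y^2 + y^4 at the origin
# and converge to one of its minima at y = +/- 1/sqrt(2).
grad_f = lambda x: np.array([2.0 * x[0], -2.0 * x[1] + 4.0 * x[1] ** 3])
hess_f = lambda x: np.diag([2.0, -2.0 + 12.0 * x[1] ** 2])
print(first_second_order_sketch(grad_f, hess_f, np.array([1e-9, 0.0])))
```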
Simba: A Scalable Bilevel Preconditioned Gradient Method for Fast Evasion of Flat Areas and Saddle Points
The convergence behaviour of first-order methods can be severely slowed down
when applied to high-dimensional non-convex functions due to the presence of
saddle points. If, additionally, the saddles are surrounded by large plateaus,
it is highly likely that first-order methods will converge to sub-optimal
solutions. In machine learning applications, sub-optimal solutions mean poor
generalization performance. They are also related to the issue of
hyper-parameter tuning, since, in the pursuit of solutions that yield lower
errors, a tremendous amount of time is spent on selecting the
hyper-parameters appropriately. A natural way to tackle the limitations of
first-order methods is to employ the Hessian information. However, methods that
incorporate the Hessian do not scale or, if they do, they are very slow for
modern applications. Here, we propose Simba, a scalable preconditioned gradient
method, to address the main limitations of first-order methods. The method
is very simple to implement. It maintains a single preconditioning matrix that
is constructed as the outer product of the moving average of the gradients. To
significantly reduce the computational cost of forming and inverting the
preconditioner, we draw links with multilevel optimization methods. These
links enable us to construct preconditioners in a randomized manner. Our
numerical experiments verify the scalability of Simba as well as its efficacy
near saddles and flat areas. Further, we demonstrate that Simba offers
satisfactory generalization performance on standard benchmark residual
networks. We also analyze Simba and show its linear convergence rate for
strongly convex functions.
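A heavily simplified sketch of a preconditioner of this flavour (this only illustrates the outer-product idea; it is not the authors' Simba, the damping, step size, and averaging factor are assumed, and the randomized multilevel construction is omitted) could look as follows:

```python
import numpy as np

# Illustration only (not the authors' Simba): precondition GD with
# P = delta*I + m m^T, where m is a moving average of the gradients, and
# apply P^{-1} cheaply via the Sherman-Morrison formula.
def outer_product_precond_gd(grad_f, x0, step=0.05, beta=0.9, delta=1.0, iters=2000):
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(iters):
        g = grad_f(x)
        m = beta * m + (1.0 - beta) * g                        # gradient moving average
        # P^{-1} g = g/delta - m (m'g) / (delta*(delta + m'm))   (Sherman-Morrison)
        pg = g / delta - m * (m @ g) / (delta * (delta + m @ m))
        x -= step * pg                                         # preconditioned step
    return x

# Toy usage on a poorly scaled convex quadratic with minimizer at the origin.
A = np.diag([10.0, 1.0])
print(outer_product_precond_gd(lambda x: A @ x, np.array([1.0, 1.0])))
```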