On limited-memory quasi-Newton methods for minimizing a quadratic function
The main focus in this paper is exact linesearch methods for minimizing a
quadratic function whose Hessian is positive definite. We give two classes of
limited-memory quasi-Newton Hessian approximations that generate search
directions parallel to those of the method of preconditioned conjugate
gradients, and hence give finite termination on quadratic optimization
problems. The Hessian approximations are described by a novel compact
representation which provides a dynamical framework. We also discuss possible
extensions of these classes and show their behavior on randomly generated
quadratic optimization problems. Numerically, the methods behave similarly to
L-BFGS. Including information from the first iteration in the limited-memory
Hessian approximation, and in L-BFGS, significantly reduces the effects of
round-off errors on the considered problems. In addition, we give our compact
representation of the Hessian approximations in the full Broyden class for the
general unconstrained optimization problem. This representation consists of
explicit matrices, with gradients as the only vector components.
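For orientation, compact representations of this kind build on the classical BFGS form of Byrd, Nocedal and Schnabel; a standard statement of that form (shown here for context, not the paper's novel variant) collects the curvature pairs $s_i = x_{i+1} - x_i$ and $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$ as columns of $S_k$ and $Y_k$:

% Classical compact representation of the BFGS Hessian approximation
% (Byrd-Nocedal-Schnabel); the paper's novel representation is a variant
% of this template.
B_k \;=\; B_0 \;-\;
\begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}
\begin{bmatrix} S_k^{\top} B_0 S_k & L_k \\ L_k^{\top} & -D_k \end{bmatrix}^{-1}
\begin{bmatrix} S_k^{\top} B_0 \\ Y_k^{\top} \end{bmatrix},
\qquad
(L_k)_{ij} \;=\;
\begin{cases}
s_{i-1}^{\top} y_{j-1} & \text{if } i > j,\\
0 & \text{otherwise},
\end{cases}
\qquad
D_k \;=\; \operatorname{diag}\!\big(s_0^{\top} y_0, \dots, s_{k-1}^{\top} y_{k-1}\big).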
Regularization of Limited Memory Quasi-Newton Methods for Large-Scale Nonconvex Minimization
This paper deals with regularized Newton methods, a flexible class of
unconstrained optimization algorithms that is competitive with line search and
trust region methods and potentially combines attractive elements of both. The
particular focus is on combining regularization with limited-memory
quasi-Newton methods by exploiting the special structure of limited-memory
algorithms. Global convergence of the regularization methods is shown under
mild assumptions, and the details of regularized limited-memory quasi-Newton
updates are discussed, including their compact representations.
Numerical results using all large-scale test problems from the CUTEst
collection indicate that our regularized version of L-BFGS is competitive with
state-of-the-art line search and trust-region L-BFGS algorithms and previous
attempts at combining L-BFGS with regularization, while potentially
outperforming some of them, especially when nonmonotonicity is involved.
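As a rough illustration of the regularization mechanism (not the paper's specific algorithm), one iteration can be sketched as solving a shifted quasi-Newton system and adapting the regularization parameter from an actual-versus-predicted reduction test. In the sketch below, a dense Hessian approximation B stands in for the limited-memory compact representation the paper exploits, and eta, inc, dec are illustrative constants.

import numpy as np

def regularized_qn_step(f, grad, B, x, sigma,
                        eta=1e-4, inc=2.0, dec=0.5):
    # One regularized quasi-Newton iteration: solve the shifted system
    # (B + sigma*I) p = -grad(x), then accept or reject the step based
    # on the ratio of actual to predicted reduction, adapting sigma.
    g = grad(x)
    n = x.size
    p = np.linalg.solve(B + sigma * np.eye(n), -g)
    predicted = -(g @ p + 0.5 * p @ (B @ p))   # model decrease
    actual = f(x) - f(x + p)
    if predicted > 0 and actual >= eta * predicted:
        return x + p, max(dec * sigma, 1e-12)  # success: relax regularization
    return x, inc * sigma                      # failure: regularize more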
Acceleration and new analysis of convex optimization algorithms
Recent years have witnessed a resurgence of the Frank-Wolfe (FW) algorithm, also known as the conditional gradient method, in sparse optimization and large-scale machine learning problems with smooth convex objectives. Compared to projected or proximal gradient methods, such a projection-free method saves the computational cost of orthogonal projections onto the constraint set. Meanwhile, FW also gives solutions with sparse structure. Despite these promising properties, FW does not enjoy the optimal convergence rates achieved by projection-based accelerated methods. We conduct a detailed survey of recent attempts to accelerate FW in different settings and highlight where the difficulty lies when aiming for global linear rates in theory.
On the other hand, the FW algorithm is affine-covariant and enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine-invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as directional smoothness) and derive an affine-invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine-invariant backtracking line search. Interestingly, we show that typical backtracking line-search techniques using smoothness of the objective function surprisingly converge to an affine-invariant step size, despite using affine-dependent norms in the computation of step sizes. This indicates that we do not necessarily need to know the structure of the sets in advance to enjoy the affine-invariant accelerated rate. Additionally, we provide a promising direction to accelerate FW over strongly convex sets using duality-gap techniques and a new notion of smoothness.
In another line of research, we study algorithms beyond first-order methods. Quasi-Newton techniques approximate the Newton step by estimating the Hessian using the so-called secant equations. Some of these methods compute the Hessian using several secant equations but produce non-symmetric updates. Other quasi-Newton schemes, such as BFGS, enforce symmetry but cannot satisfy more than one secant equation. We propose a new type of symmetric quasi-Newton update using several secant equations in a least-squares sense. Our approach generalizes and unifies the design of quasi-Newton updates and satisfies provable robustness guarantees.
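Since several of the results above concern the basic FW iteration, a minimal sketch may help fix notation. The linear minimization oracle lmo and the l1-ball example below are illustrative assumptions, and the classical 2/(k+2) step size stands in for the affine-invariant backtracking line search studied in the thesis.

import numpy as np

def frank_wolfe(grad, lmo, x0, max_iter=1000, tol=1e-8):
    # Vanilla Frank-Wolfe (conditional gradient): at each step, call the
    # linear minimization oracle over the constraint set instead of an
    # orthogonal projection, then move toward the returned vertex.
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        s = lmo(g)
        if g @ (x - s) <= tol:   # FW duality gap: certificate of near-optimality
            break
        x = x + (2.0 / (k + 2.0)) * (s - x)
    return x

def lmo_l1(g, radius=1.0):
    # Example oracle for the l1 ball: the minimizing vertex places all
    # mass on the coordinate with the largest gradient magnitude.
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -radius * np.sign(g[i])
    return s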
Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate
Non-asymptotic convergence analysis of quasi-Newton methods has gained
attention with a landmark result establishing an explicit superlinear rate of
$\mathcal{O}((1/\sqrt{t})^{t})$. The methods that obtain this rate, however, exhibit a
well-known drawback: they require the storage of the previous Hessian
approximation matrix or instead storing all past curvature information to form
the current Hessian inverse approximation. Limited-memory variants of
quasi-Newton methods such as the celebrated L-BFGS alleviate this issue by
leveraging a limited window of past curvature information to construct the
Hessian inverse approximation. As a result, their per-iteration complexity and
storage requirement is $\mathcal{O}(\tau d)$, where $\tau$ is the size of the window
and $d$ is the problem dimension, reducing the $\mathcal{O}(d^2)$ computational cost and
memory requirement of standard quasi-Newton methods. However, to the best of
our knowledge, there is no result showing a non-asymptotic superlinear
convergence rate for any limited-memory quasi-Newton method. In this work, we
close this gap by presenting a limited-memory greedy BFGS (LG-BFGS) method that
achieves an explicit non-asymptotic superlinear rate. We incorporate
displacement aggregation, i.e., decorrelating projection, in post-processing
gradient variations, together with a basis vector selection scheme on variable
variations, which greedily maximizes a progress measure of the Hessian estimate
to the true Hessian. Their combination allows past curvature information to
remain in a sparse subspace while yielding a valid representation of the full
history. Interestingly, our established non-asymptotic superlinear convergence
rate demonstrates a trade-off between the convergence speed and memory
requirement, which to our knowledge, is the first of its kind. Numerical
results corroborate our theoretical findings and demonstrate the effectiveness
of our method.
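For context, the O(τd) cost quoted above is realized by the standard two-loop recursion underlying L-BFGS-type methods; the sketch below shows that recursion only, not the displacement aggregation and greedy basis selection that distinguish LG-BFGS.

import numpy as np

def lbfgs_direction(g, S, Y, gamma=1.0):
    # Standard L-BFGS two-loop recursion: returns -H g, where H is the
    # inverse Hessian approximation built from the last tau curvature
    # pairs s_i (iterate displacements) and y_i (gradient variations),
    # with initial matrix H_0 = gamma * I. Cost and storage are O(tau*d).
    q = g.copy()
    stack = []
    for s, y in zip(reversed(S), reversed(Y)):   # newest pair first
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        stack.append((rho, alpha, s, y))
        q -= alpha * y
    r = gamma * q
    for rho, alpha, s, y in reversed(stack):     # oldest pair first
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r   # search direction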
Shifted limited-memory variable metric methods for large-scale unconstrained optimization
A new family of numerically efficient full-memory variable metric or quasi-Newton methods for unconstrained minimization is given, which offers a simple way to derive related limited-memory methods. Global convergence of the methods can be established for convex, sufficiently smooth functions. Numerical experience in comparison with standard methods is encouraging.
Limited-Memory BFGS with Displacement Aggregation
A displacement aggregation strategy is proposed for the curvature pairs
stored in a limited-memory BFGS method such that the resulting (inverse)
Hessian approximations are equal to those that would be derived from a
full-memory BFGS method. This means that, if a sufficiently large number of
pairs are stored, then an optimization algorithm employing the limited-memory
method can achieve the same theoretical convergence properties as when
full-memory (inverse) Hessian approximations are stored and employed, such as a
local superlinear rate of convergence under assumptions that are common for
attaining such guarantees. To the best of our knowledge, this is the first work
in which a local superlinear convergence rate guarantee is offered by a
quasi-Newton scheme that does not either store all curvature pairs throughout
the entire run of the optimization algorithm or store an explicit (inverse)
Hessian approximation.
Shape-Changing Trust-Region Methods Using Multipoint Symmetric Secant Matrices
In this work, we consider methods for large-scale and nonconvex unconstrained
optimization. We propose a new trust-region method whose subproblem is defined
using a so-called "shape-changing" norm together with densely-initialized
multipoint symmetric secant (MSS) matrices to approximate the Hessian.
Shape-changing norms and dense initializations have been successfully used in
the context of traditional quasi-Newton methods, but have yet to be explored in
the case of MSS methods. Numerical results suggest that trust-region methods
that use densely-initialized MSS matrices together with shape-changing norms
outperform MSS approximations used with other trust-region methods.
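For orientation, one of the shape-changing norms proposed in the earlier quasi-Newton trust-region literature (Burdakov et al.) takes the following form, stated here as an illustration of the idea rather than the exact norm used in this paper:

% Shape-changing infinity norm: P = [P_par  P_perp] is orthogonal, with
% P_par spanning the (low-dimensional) eigenspace of the quasi-Newton
% matrix; the infinity norm on that subspace decouples the trust-region
% subproblem into scalar problems with closed-form solutions.
\|p\|_{P,\infty} \;=\; \max\big( \|P_{\parallel}^{\top} p\|_{\infty},\; \|P_{\perp}^{\top} p\|_{2} \big)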