480 research outputs found

    On limited-memory quasi-Newton methods for minimizing a quadratic function

    Full text link
    The main focus in this paper is exact linesearch methods for minimizing a quadratic function whose Hessian is positive definite. We give two classes of limited-memory quasi-Newton Hessian approximations that generate search directions parallel to those of the method of preconditioned conjugate gradients, and hence give finite termination on quadratic optimization problems. The Hessian approximations are described by a novel compact representation which provides a dynamical framework. We also discuss possible extensions of these classes and show their behavior on randomly generated quadratic optimization problems. The methods behave numerically similar to L-BFGS. Inclusion of information from the first iteration in the limited-memory Hessian approximation and L-BFGS significantly reduces the effects of round-off errors on the considered problems. In addition, we give our compact representation of the Hessian approximations in the full Broyden class for the general unconstrained optimization problem. This representation consists of explicit matrices and gradients only as vector components

    Regularization of Limited Memory Quasi-Newton Methods for Large-Scale Nonconvex Minimization

    Full text link
    This paper deals with regularized Newton methods, a flexible class of unconstrained optimization algorithms that is competitive with line search and trust region methods and potentially combines attractive elements of both. The particular focus is on combining regularization with limited memory quasi-Newton methods by exploiting the special structure of limited memory algorithms. Global convergence of regularization methods is shown under mild assumptions and the details of regularized limited memory quasi-Newton updates are discussed including their compact representations. Numerical results using all large-scale test problems from the CUTEst collection indicate that our regularized version of L-BFGS is competitive with state-of-the-art line search and trust-region L-BFGS algorithms and previous attempts at combining L-BFGS with regularization, while potentially outperforming some of them, especially when nonmonotonicity is involved.Comment: 23 pages, 4 figure

    Acceleration and new analysis of convex optimization algorithms

    Full text link
    Ces dernières années ont vu une résurgence de l’algorithme de Frank-Wolfe (FW) (également connu sous le nom de méthodes de gradient conditionnel) dans l’optimisation clairsemée et les problèmes d’apprentissage automatique à grande échelle avec des objectifs convexes lisses. Par rapport aux méthodes de gradient projeté ou proximal, une telle méthode sans projection permet d’économiser le coût de calcul des projections orthogonales sur l’ensemble de contraintes. Parallèlement, FW propose également des solutions à structure clairsemée. Malgré ces propriétés prometteuses, FW ne bénéficie pas des taux de convergence optimaux obtenus par les méthodes accélérées basées sur la projection. Nous menons une enquête dé- taillée sur les essais récents pour accélérer FW dans différents contextes et soulignons où se situe la difficulté lorsque l’on vise des taux linéaires globaux en théorie. En outre, nous fournissons une direction prometteuse pour accélérer FW sur des ensembles fortement convexes en utilisant des techniques d’intervalle de dualité et une nouvelle notion de régularité. D’autre part, l’algorithme FW est une covariante affine et bénéficie de taux de convergence accélérés lorsque l’ensemble de contraintes est fortement convexe. Cependant, ces résultats reposent sur des hypothèses dépendantes de la norme, entraînant généralement des bornes invariantes non affines, en contradiction avec la propriété de covariante affine de FW. Dans ce travail, nous introduisons de nouvelles hypothèses structurelles sur le problème (comme la régularité directionnelle) et dérivons une analyse affine invariante et indépendante de la norme de Frank-Wolfe. Sur la base de notre analyse, nous proposons une recherche par ligne affine invariante. Fait intéressant, nous montrons que les recherches en ligne classiques utilisant la régularité de la fonction objectif convergent étonnamment vers une taille de pas invariante affine, malgré l’utilisation de normes dépendantes de l’affine dans le calcul des tailles de pas. Cela indique que nous n’avons pas nécessairement besoin de connaître à l’avance la structure des ensembles pour profiter du taux accéléré affine-invariant. Dans un autre axe de recherche, nous étudions les algorithmes au-delà des méthodes du premier ordre. Les techniques Quasi-Newton approchent le pas de Newton en estimant le Hessien en utilisant les équations dites sécantes. Certaines de ces méthodes calculent le Hessien en utilisant plusieurs équations sécantes mais produisent des mises à jour non symétriques. D’autres schémas quasi-Newton, tels que BFGS, imposent la symétrie mais ne peuvent pas satisfaire plus d’une équation sécante. Nous proposons un nouveau type de mise à jour symétrique quasi-Newton utilisant plusieurs équations sécantes au sens des moindres carrés. Notre approche généralise et unifie la conception de mises à jour quasi-Newton et satisfait des garanties de robustesse prouvables.Recent years have witnessed a resurgence of the Frank-Wolfe (FW) algorithm, also known as conditional gradient methods, in sparse optimization and large-scale machine learning problems with smooth convex objectives. Compared to projected or proximal gradient methods, such projection-free method saves the computational cost of orthogonal projections onto the constraint set. Meanwhile, FW also gives solutions with sparse structure. Despite of these promising properties, FW does not enjoy the optimal convergence rates achieved by projection-based accelerated methods. On the other hand, FW algorithm is affine-covariant, and enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine invariant bounds, in contradiction with FW’s affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as the directional smoothness) and derive an affine in- variant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we pro- pose an affine invariant backtracking line-search. Interestingly, we show that typical back-tracking line-search techniques using smoothness of the objective function surprisingly converge to an affine invariant stepsize, despite using affine-dependent norms in the computation of stepsizes. This indicates that we do not necessarily need to know the structure of sets in advance to enjoy the affine-invariant accelerated rate. Additionally, we provide a promising direction to accelerate FW over strongly convex sets using duality gap techniques and a new version of smoothness. In another line of research, we study algorithms beyond first-order methods. Quasi-Newton techniques approximate the Newton step by estimating the Hessian using the so-called secant equations. Some of these methods compute the Hessian using several secant equations but produce non-symmetric updates. Other quasi- Newton schemes, such as BFGS, enforce symmetry but cannot satisfy more than one secant equation. We propose a new type of quasi-Newton symmetric update using several secant equations in a least-squares sense. Our approach generalizes and unifies the design of quasi-Newton updates and satisfies provable robustness guarantees

    Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

    Full text link
    Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit superlinear rate of O((1/t)t)((1/\sqrt{t})^t). The methods that obtain this rate, however, exhibit a well-known drawback: they require the storage of the previous Hessian approximation matrix or instead storing all past curvature information to form the current Hessian inverse approximation. Limited-memory variants of quasi-Newton methods such as the celebrated L-BFGS alleviate this issue by leveraging a limited window of past curvature information to construct the Hessian inverse approximation. As a result, their per iteration complexity and storage requirement is O(τd)(\tau d) where τ≤d\tau \le d is the size of the window and dd is the problem dimension reducing the O(d2)(d^2) computational cost and memory requirement of standard quasi-Newton methods. However, to the best of our knowledge, there is no result showing a non-asymptotic superlinear convergence rate for any limited-memory quasi-Newton method. In this work, we close this gap by presenting a limited-memory greedy BFGS (LG-BFGS) method that achieves an explicit non-asymptotic superlinear rate. We incorporate displacement aggregation, i.e., decorrelating projection, in post-processing gradient variations, together with a basis vector selection scheme on variable variations, which greedily maximizes a progress measure of the Hessian estimate to the true Hessian. Their combination allows past curvature information to remain in a sparse subspace while yielding a valid representation of the full history. Interestingly, our established non-asymptotic superlinear convergence rate demonstrates a trade-off between the convergence speed and memory requirement, which to our knowledge, is the first of its kind. Numerical results corroborate our theoretical findings and demonstrate the effectiveness of our method

    Shifted limited-memory variable metric methods for large-scale unconstrained optimization

    Get PDF
    AbstractA new family of numerically efficient full-memory variable metric or quasi-Newton methods for unconstrained minimization is given, which give simple possibility to derive related limited-memory methods. Global convergence of the methods can be established for convex sufficiently smooth functions. Numerical experience by comparison with standard methods is encouraging

    Limited-Memory BFGS with Displacement Aggregation

    Get PDF
    A displacement aggregation strategy is proposed for the curvature pairs stored in a limited-memory BFGS method such that the resulting (inverse) Hessian approximations are equal to those that would be derived from a full-memory BFGS method. This means that, if a sufficiently large number of pairs are stored, then an optimization algorithm employing the limited-memory method can achieve the same theoretical convergence properties as when full-memory (inverse) Hessian approximations are stored and employed, such as a local superlinear rate of convergence under assumptions that are common for attaining such guarantees. To the best of our knowledge, this is the first work in which a local superlinear convergence rate guarantee is offered by a quasi-Newton scheme that does not either store all curvature pairs throughout the entire run of the optimization algorithm or store an explicit (inverse) Hessian approximation.Comment: 24 pages, 3 figure

    Shape-Changing Trust-Region Methods Using Multipoint Symmetric Secant Matrices

    Full text link
    In this work, we consider methods for large-scale and nonconvex unconstrained optimization. We propose a new trust-region method whose subproblem is defined using a so-called "shape-changing" norm together with densely-initialized multipoint symmetric secant (MSS) matrices to approximate the Hessian. Shape-changing norms and dense initializations have been successfully used in the context of traditional quasi-Newton methods, but have yet to be explored in the case of MSS methods. Numerical results suggest that trust-region methods that use densely-initialized MSS matrices together with shape-changing norms outperform MSS with other trust-region methods
    • …
    corecore