Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely-used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with sparse dependencies
between variables; and (iv) consider optimal active manifold identification,
which yields bounds on the "active set complexity" of BCD methods and leads
to superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
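
As a point of reference for the greedy rules discussed above, the following is a
minimal sketch (not the paper's method) of single-coordinate descent with the
classic Gauss-Southwell selection rule, assuming a smooth objective, access to the
full gradient, and a single Lipschitz constant L shared by all coordinates; the
paper's contributions refine this scheme with block variants and faster selection
rules.

import numpy as np

def gauss_southwell_cd(grad, x0, L, n_iters=100):
    # Coordinate descent with the Gauss-Southwell (greedy) rule:
    # at each iteration, update the coordinate with the largest
    # absolute partial derivative, using a 1/L step size.
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)                  # full gradient of the smooth objective
        i = np.argmax(np.abs(g))     # greedy (Gauss-Southwell) coordinate choice
        x[i] -= g[i] / L             # gradient step on the selected coordinate
    return x

# Illustrative use on least squares 0.5*||A x - b||^2, one of the problems
# in the paper's experiments (data here is arbitrary).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)       # a valid Lipschitz bound for every coordinate
x_hat = gauss_southwell_cd(grad, np.zeros(2), L, n_iters=500)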
Limited-Memory BFGS with Displacement Aggregation
A displacement aggregation strategy is proposed for the curvature pairs
stored in a limited-memory BFGS method such that the resulting (inverse)
Hessian approximations are equal to those that would be derived from a
full-memory BFGS method. This means that, if a sufficiently large number of
pairs are stored, then an optimization algorithm employing the limited-memory
method can achieve the same theoretical convergence properties as when
full-memory (inverse) Hessian approximations are stored and employed, such as a
local superlinear rate of convergence under assumptions that are common for
attaining such guarantees. To the best of our knowledge, this is the first work
in which a local superlinear convergence rate guarantee is offered by a
quasi-Newton scheme that does not either store all curvature pairs throughout
the entire run of the optimization algorithm or store an explicit (inverse)
Hessian approximation.
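
The displacement aggregation strategy itself is specific to the paper; as
background, here is a sketch of the standard L-BFGS two-loop recursion, which
shows how the stored curvature pairs (s_k, y_k) implicitly define the
inverse-Hessian approximation that the aggregation strategy is designed to match.
The initial scaling gamma is a common heuristic, not something prescribed by the
paper.

import numpy as np

def lbfgs_two_loop(g, s_list, y_list):
    # Standard L-BFGS two-loop recursion: returns H_k @ g, where H_k is the
    # inverse-Hessian approximation implied by the stored curvature pairs
    # (s_i, y_i); the quasi-Newton step is then -lbfgs_two_loop(grad, ...).
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q = q - a * y
    # Initial approximation H0 = gamma * I, using the usual scaling heuristic.
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r = r + (a - beta) * s
    return r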
On the Local and Global Convergence of a Reduced Quasi-Newton Method
For optimization problems in R^n with m nonlinear equality constraints, we study the local convergence of reduced quasi-Newton methods, in which the updated matrix is of order n-m. In particular, we give necessary and sufficient conditions for q-superlinear convergence (in one step). We introduce a device to globalize the local algorithm, which consists in determining a step along an arc so as to decrease an exact penalty function, and we give conditions under which the step asymptotically equals one.
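
For context on the terminology (a standard formulation of the setting, not taken verbatim from the paper): the problem is

\[
\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \qquad c : \mathbb{R}^n \to \mathbb{R}^m,
\]

and a reduced quasi-Newton method maintains an approximation of the reduced Hessian \( Z(x)^\top \nabla^2_{xx} L(x,\lambda)\, Z(x) \), an \((n-m) \times (n-m)\) matrix, where the columns of \( Z(x) \) form a basis of the null space of the constraint Jacobian. Q-superlinear convergence of the iterates \( x_k \) to a solution \( x^* \) means

\[
\lim_{k \to \infty} \frac{\lVert x_{k+1} - x^* \rVert}{\lVert x_k - x^* \rVert} = 0.
\]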