Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely-used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with sparse dependencies
between variables; and (iv) consider optimal active manifold identification,
which yields bounds on the "active set complexity" of BCD methods and leads
to superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
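
As a point of reference for the greedy rules discussed above, the following is a
minimal sketch (not the paper's method) of single-coordinate descent with the
classic Gauss-Southwell selection rule, assuming a smooth objective, access to the
full gradient, and a single Lipschitz constant L shared by all coordinates; the
paper's contributions refine this scheme with block variants and faster selection
rules.

import numpy as np

def gauss_southwell_cd(grad, x0, L, n_iters=100):
    # Coordinate descent with the Gauss-Southwell (greedy) rule:
    # at each iteration, update the coordinate with the largest
    # absolute partial derivative, using a 1/L step size.
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)                  # full gradient of the smooth objective
        i = np.argmax(np.abs(g))     # greedy (Gauss-Southwell) coordinate choice
        x[i] -= g[i] / L             # gradient step on the selected coordinate
    return x

# Illustrative use on least squares 0.5*||A x - b||^2, one of the problems
# in the paper's experiments (data here is arbitrary).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)       # a valid Lipschitz bound for every coordinate
x_hat = gauss_southwell_cd(grad, np.zeros(2), L, n_iters=500)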
Limited-Memory BFGS with Displacement Aggregation
A displacement aggregation strategy is proposed for the curvature pairs
stored in a limited-memory BFGS method such that the resulting (inverse)
Hessian approximations are equal to those that would be derived from a
full-memory BFGS method. This means that, if a sufficiently large number of
pairs are stored, then an optimization algorithm employing the limited-memory
method can achieve the same theoretical convergence properties as when
full-memory (inverse) Hessian approximations are stored and employed, such as a
local superlinear rate of convergence under assumptions that are common for
attaining such guarantees. To the best of our knowledge, this is the first work
in which a local superlinear convergence rate guarantee is offered by a
quasi-Newton scheme that does not either store all curvature pairs throughout
the entire run of the optimization algorithm or store an explicit (inverse)
Hessian approximation.
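
The displacement aggregation strategy itself is specific to the paper; as
background, here is a sketch of the standard L-BFGS two-loop recursion, which
shows how the stored curvature pairs (s_k, y_k) implicitly define the
inverse-Hessian approximation that the aggregation strategy is designed to match.
The initial scaling gamma is a common heuristic, not something prescribed by the
paper.

import numpy as np

def lbfgs_two_loop(g, s_list, y_list):
    # Standard L-BFGS two-loop recursion: returns H_k @ g, where H_k is the
    # inverse-Hessian approximation implied by the stored curvature pairs
    # (s_i, y_i); the quasi-Newton step is then -lbfgs_two_loop(grad, ...).
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q = q - a * y
    # Initial approximation H0 = gamma * I, using the usual scaling heuristic.
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r = r + (a - beta) * s
    return r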
On the Local and Global Convergence of a Reduced Quasi-Newton Method
For optimization problems in R^n with m nonlinear equality constraints, we study the local convergence of reduced quasi-Newton methods, in which the updated matrix is of order n-m. In particular, we give necessary and sufficient conditions for q-superlinear convergence (in one step). We introduce a device to globalize the local algorithm, which consists in determining a step along an arc so as to decrease an exact penalty function, and we give conditions under which the step asymptotically equals one.
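
For context on the terminology (a standard formulation of the setting, not taken verbatim from the paper): the problem is

\[
\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c(x) = 0, \qquad c : \mathbb{R}^n \to \mathbb{R}^m,
\]

and a reduced quasi-Newton method maintains an approximation of the reduced Hessian \( Z(x)^\top \nabla^2_{xx} L(x,\lambda)\, Z(x) \), an \((n-m) \times (n-m)\) matrix, where the columns of \( Z(x) \) form a basis of the null space of the constraint Jacobian. Q-superlinear convergence of the iterates \( x_k \) to a solution \( x^* \) means

\[
\lim_{k \to \infty} \frac{\lVert x_{k+1} - x^* \rVert}{\lVert x_k - x^* \rVert} = 0.
\]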