On the Finite Time Convergence of Cyclic Coordinate Descent Methods
Cyclic coordinate descent is a classic optimization method that has witnessed
a resurgence of interest in machine learning. Reasons for this include its
simplicity, speed and stability, as well as its competitive performance on
regularized smooth optimization problems. Surprisingly, very little is
known about its finite time convergence behavior on these problems. Most
existing results either just prove convergence or provide asymptotic rates. We
fill this gap in the literature by proving O(1/k) convergence rates (where k
is the iteration counter) for two variants of cyclic coordinate descent
under an isotonicity assumption. Our analysis proceeds by comparing the
objective values attained by the two variants with each other, as well as with
the gradient descent algorithm. We show that the iterates generated by the
cyclic coordinate descent methods remain better than those of gradient descent
uniformly over time. Comment: 20 pages
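As a rough illustration of the method discussed in this abstract, the following is a minimal sketch of cyclic coordinate descent with exact coordinate minimization. The lasso objective, the function names, and the number of sweeps are assumptions chosen for the example; they are not taken from the paper, and the sketch does not check the isotonicity assumption.

```python
import numpy as np

def soft_threshold(z, t):
    # Closed-form minimizer of 0.5 * (u - z)^2 + t * |u| over u.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_objective(A, b, x, lam):
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()

def cyclic_cd_lasso(A, b, lam, sweeps=50):
    """Cyclic coordinate descent on 0.5 * ||Ax - b||^2 + lam * ||x||_1 (illustrative)."""
    n = A.shape[1]
    x = np.zeros(n)
    r = b - A @ x                      # residual, kept up to date after each update
    col_sq = (A ** 2).sum(axis=0)      # squared column norms
    history = [lasso_objective(A, b, x, lam)]
    for _ in range(sweeps):
        for j in range(n):             # fixed cyclic order 0, 1, ..., n-1
            if col_sq[j] == 0.0:
                continue
            rho = A[:, j] @ r + col_sq[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            r += A[:, j] * (x[j] - new_xj)
            x[j] = new_xj
        history.append(lasso_objective(A, b, x, lam))
    return x, history
```

Because each coordinate update is an exact one-dimensional minimization, the recorded objective values should not increase from sweep to sweep, which makes `history` a convenient place to inspect finite-time behavior.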
Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with a sparse dependency
between variables; and (iv) consider optimal active manifold identification,
which leads to bounds on the "active set complexity" of BCD methods and leads
to superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
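For reference, the classical Gauss-Southwell rule that this abstract uses as a baseline can be sketched as follows, with single-coordinate blocks on a strongly convex quadratic. The quadratic test problem and the function name are assumptions for illustration; the paper's new greedy rules are not reproduced here.

```python
import numpy as np

def gauss_southwell_quadratic(H, c, iters=2000):
    """Greedy coordinate descent on f(x) = 0.5 * x^T H x - c^T x.

    H is assumed symmetric positive definite. At each step the coordinate
    with the largest absolute gradient entry is minimized exactly.
    """
    x = np.zeros(len(c))
    g = H @ x - c                       # full gradient, updated incrementally
    for _ in range(iters):
        i = int(np.argmax(np.abs(g)))   # Gauss-Southwell selection
        step = g[i] / H[i, i]           # exact minimization along coordinate i
        x[i] -= step
        g -= step * H[:, i]             # rank-one gradient update (H symmetric)
    return x
```

Keeping the gradient up to date with a single column of `H` per step is what makes the greedy rule affordable in this setting; for general objectives the cost of tracking the gradient is the main practical concern.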
An almost cyclic 2-coordinate descent method for singly linearly constrained problems
A block decomposition method is proposed for minimizing a (possibly
non-convex) continuously differentiable function subject to one linear equality
constraint and simple bounds on the variables. The proposed method iteratively
selects a pair of coordinates according to an almost cyclic strategy that does
not use first-order information, allowing us not to compute the whole gradient
of the objective function during the algorithm. Using first-order search
directions to update each pair of coordinates, global convergence to stationary
points is established for different choices of the stepsize under an
appropriate assumption on the level set. In particular, both inexact and exact
line search strategies are analyzed. Further, a linear convergence rate is proved
under standard additional assumptions. Numerical results are finally provided
to show the effectiveness of the proposed method. Comment: Computational Optimization and Applications
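To make the two-coordinate idea concrete, here is a simplified sketch for a problem with one equality constraint (the sum of the variables is fixed) and box bounds. It uses a separable quadratic objective, plain consecutive pairs instead of the paper's almost cyclic rule, and an exact line search clipped to the bounds; all of these choices, and the function name, are assumptions made for the example.

```python
import numpy as np

def pairwise_cd(t, lower, upper, x0, sweeps=100):
    """Minimize 0.5 * ||x - t||^2 s.t. sum(x) = sum(x0), lower <= x <= upper.

    x0 is assumed feasible. Each update moves mass between coordinates i and j
    (x[i] += s, x[j] -= s), which keeps the equality constraint satisfied.
    """
    x = np.asarray(x0, dtype=float).copy()
    n = len(x)
    objective = lambda v: 0.5 * np.sum((v - t) ** 2)
    history = [objective(x)]
    for _ in range(sweeps):
        for i in range(n):
            j = (i + 1) % n                       # consecutive pair (simplification)
            # Unconstrained exact step along e_i - e_j for this quadratic.
            s = 0.5 * ((t[i] - x[i]) - (t[j] - x[j]))
            # Clip the step so both coordinates stay inside their bounds.
            s_min = max(lower[i] - x[i], x[j] - upper[j])
            s_max = min(upper[i] - x[i], x[j] - lower[j])
            s = min(max(s, s_min), s_max)
            x[i] += s
            x[j] -= s
        history.append(objective(x))
    return x, history
```

Only the two partial derivatives of the selected pair are needed per update, which reflects the abstract's point about not computing the whole gradient.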
Fixed-point and coordinate descent algorithms for regularized kernel methods
In this paper, we study two general classes of optimization algorithms for
kernel methods with convex loss function and quadratic norm regularization, and
analyze their convergence. The first approach, based on fixed-point iterations,
is simple to implement and analyze, and can be easily parallelized. The second,
based on coordinate descent, exploits the structure of additively separable
loss functions to compute solutions of line searches in closed form. Instances
of these general classes of algorithms are already incorporated into
state-of-the-art machine learning software for large-scale problems. We start from a
solution characterization of the regularized problem, obtained using
sub-differential calculus and resolvents of monotone operators, that holds for
general convex loss functions regardless of differentiability. The two
methodologies described in the paper can be regarded as instances of non-linear
Jacobi and Gauss-Seidel algorithms, and are both well-suited to solve
large-scale problems.
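The Gauss-Seidel view of coordinate descent mentioned above can be sketched for the special case of squared loss, where the regularized kernel problem reduces to the linear system (K + lam I) c = y. The squared loss, the RBF kernel, and the function names are assumptions for this example; the paper covers general convex losses.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel on the rows of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def gauss_seidel_krr(K, y, lam, sweeps=200):
    """Gauss-Seidel sweeps for (K + lam * I) c = y (kernel ridge regression)."""
    M = K + lam * np.eye(len(y))
    c = np.zeros(len(y))
    for _ in range(sweeps):
        for i in range(len(y)):
            # Solve equation i for c[i], using the latest values of the others.
            c[i] += (y[i] - M[i] @ c) / M[i, i]
    return c
```

A Jacobi-style variant would compute all updates from the previous sweep's coefficients, which is easier to parallelize but, unlike Gauss-Seidel on a symmetric positive definite system, is not guaranteed to converge without further conditions.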
Inexact Block Coordinate Descent Algorithms for Nonsmooth Nonconvex Optimization
In this paper, we propose an inexact block coordinate descent algorithm for
large-scale nonsmooth nonconvex optimization problems. At each iteration, a
particular block variable is selected and updated by inexactly solving the
original optimization problem with respect to that block variable. More
precisely, a local approximation of the original optimization problem is
solved. The proposed algorithm has several attractive features, namely, i) high
flexibility, as the approximation function only needs to be strictly convex and
it does not have to be a global upper bound of the original function; ii) fast
convergence, as the approximation function can be designed to exploit the
problem structure at hand and the stepsize is calculated by the line search;
iii) low complexity, as the approximation subproblems are much easier to solve
and the line search scheme is carried out over a properly constructed
differentiable function; iv) guaranteed convergence of a subsequence to a
stationary point, even when the objective function does not have a Lipschitz
continuous gradient. Interestingly, when the approximation subproblem is solved
by a descent algorithm, convergence of a subsequence to a stationary point is
still guaranteed even if the approximation subproblem is solved inexactly by
terminating the descent algorithm after a finite number of iterations. These
features make the proposed algorithm suitable for large-scale problems where
the dimension exceeds the memory and/or the processing capability of the
existing hardware. These features are also illustrated by several applications
in signal processing and machine learning, for instance, network anomaly
detection and phase retrieval.
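The block update described in this abstract (solve a strictly convex local approximation, then choose the stepsize by a line search over a differentiable function) might look roughly like the sketch below. The phase-retrieval-style smooth term, the L1 penalty, the proximal-linear approximation with curvature `c`, and the Armijo constants are all assumptions made for illustration, not the authors' exact algorithm.

```python
import numpy as np

def smooth_part(A, y, x):
    # Nonconvex smooth term: 0.25 * sum(((A x)^2 - y)^2).
    z = A @ x
    return 0.25 * np.sum((z ** 2 - y) ** 2)

def smooth_grad(A, y, x):
    z = A @ x
    return A.T @ ((z ** 2 - y) * z)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bcd_line_search(A, y, lam, x0, blocks, c=1.0, alpha=0.1, beta=0.5, sweeps=50):
    """Block updates for smooth_part + lam * ||x||_1 with a backtracking line search."""
    x = np.asarray(x0, dtype=float).copy()
    total = lambda v: smooth_part(A, y, v) + lam * np.abs(v).sum()
    history = [total(x)]
    for _ in range(sweeps):
        for blk in blocks:
            g = smooth_grad(A, y, x)[blk]
            # Minimizer of the strictly convex approximation on this block.
            bx = soft_threshold(x[blk] - g / c, lam / c)
            d = np.zeros_like(x)
            d[blk] = bx - x[blk]
            # Change in the L1 term at the full step; by convexity it scales
            # at most linearly in the stepsize, so the search stays smooth.
            dg = lam * (np.abs(bx).sum() - np.abs(x[blk]).sum())
            descent = g @ d[blk] + dg
            if descent >= 0.0:
                continue                      # block is already stationary
            f0 = smooth_part(A, y, x)
            gamma = 1.0
            for _trial in range(60):          # cap on backtracking trials
                if smooth_part(A, y, x + gamma * d) - f0 + gamma * dg <= alpha * gamma * descent:
                    x = x + gamma * d
                    break
                gamma *= beta
        history.append(total(x))
    return x, history
```

Note that the approximation here is not required to upper-bound the objective: the line search, rather than a Lipschitz constant, is what enforces descent in this sketch.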
A Distributed Asynchronous Method of Multipliers for Constrained Nonconvex Optimization
This paper presents a fully asynchronous and distributed approach for
tackling optimization problems in which both the objective function and the
constraints may be nonconvex. In the considered network setting each node is
active upon triggering of a local timer and has access only to a portion of the
objective function and to a subset of the constraints. In the proposed
technique, based on the method of multipliers, each node performs, when it
wakes up, either a descent step on a local augmented Lagrangian or an ascent
step on the local multiplier vector. Nodes realize when to switch from the
descent step to the ascent one through an asynchronous distributed logic-AND,
which detects when all the nodes have reached a predefined tolerance in the
minimization of the augmented Lagrangian. It is shown that the resulting
distributed algorithm is equivalent to a block coordinate descent for the
minimization of the global augmented Lagrangian. This allows one to extend the
properties of the centralized method of multipliers to the considered
distributed framework. Two application examples are presented to validate the
proposed approach: a distributed source localization problem and the parameter
estimation of a neural network. Comment: arXiv admin note: substantial text overlap with arXiv:1803.0648
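The interplay between local descent steps, the tolerance check, and multiplier ascent can be imitated in a heavily simplified, single-process sketch. Here every "node" owns one scalar variable, the problem is a convex quadratic with one coupling constraint, a single shared multiplier is used, and the logic-AND is evaluated centrally; these are all assumptions for illustration and differ from the distributed, nonconvex setting of the paper.

```python
import numpy as np

def async_method_of_multipliers(t, b, rho=1.0, tol=1e-10, wakeups=20000, seed=0):
    """Minimize 0.5 * ||x - t||^2 subject to sum(x) = b.

    Random wake-ups stand in for local timers. A node that wakes up minimizes
    the augmented Lagrangian over its own variable; once every node's local
    gradient is within tol (the logic-AND), a multiplier ascent step is taken.
    """
    rng = np.random.default_rng(seed)
    n = len(t)
    x = np.zeros(n)
    lam = 0.0
    for _ in range(wakeups):
        # Gradient of the augmented Lagrangian with respect to each x[i].
        grad = x - t + lam + rho * (x.sum() - b)
        if np.all(np.abs(grad) <= tol):
            lam += rho * (x.sum() - b)            # ascent on the multiplier
        else:
            i = rng.integers(n)                   # node i wakes up
            others = x.sum() - x[i] - b
            # Exact minimization of the augmented Lagrangian over x[i].
            x[i] = (t[i] - lam - rho * others) / (1.0 + rho)
    return x, lam
```

The descent phase is exactly a randomized block coordinate descent on the augmented Lagrangian, which mirrors the equivalence stated in the abstract.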