Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function
In this paper we develop a randomized block-coordinate descent method for
minimizing the sum of a smooth and a simple nonsmooth block-separable convex
function and prove that it obtains an $\epsilon$-accurate solution with
probability at least $1-\rho$ in at most $O(\tfrac{n}{\epsilon} \log \tfrac{1}{\rho})$ iterations, where $n$ is the number of blocks. For strongly
convex functions the method converges linearly. This extends recent results of
Nesterov [Efficiency of coordinate descent methods on huge-scale optimization
problems, CORE Discussion Paper #2010/2], which cover the smooth case, to
composite minimization, while at the same time improving the complexity by the
factor of 4 and removing $\epsilon$ from the logarithmic term. More
importantly, in contrast with the aforementioned work in which the author
achieves the results by applying the method to a regularized version of the
objective function with an unknown scaling factor, we show that this is not
necessary, thus achieving true iteration complexity bounds. In the smooth case
we also allow for arbitrary probability vectors and non-Euclidean norms.
Finally, we demonstrate numerically that the algorithm is able to solve
huge-scale $\ell_1$-regularized least squares and support vector machine
problems with a billion variables.
Comment: 33 pages, 7 figures, 10 tables
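As an illustration of the kind of method this abstract describes, here is a minimal sketch of randomized coordinate descent on a tiny $\ell_1$-regularized least squares problem. It is not the paper's implementation: it assumes blocks of size one, uniform sampling, and dense Python lists, and the names `soft_threshold`, `lasso_objective`, and `randomized_cd_lasso` are hypothetical.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_objective(A, b, x, lam):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1 for A given as a list of rows."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def randomized_cd_lasso(A, b, lam, iters=2000, seed=0):
    """Assumed setup: one coordinate per block, sampled uniformly at random."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    # Coordinate-wise Lipschitz constants of the smooth part: ||A_j||^2.
    lipschitz = [sum(v * v for v in col) for col in cols]
    x = [0.0] * n
    residual = [-bi for bi in b]  # equals Ax - b at x = 0
    for _ in range(iters):
        j = rng.randrange(n)
        if lipschitz[j] == 0.0:
            continue  # an all-zero column never changes the smooth part
        grad_j = sum(c * r for c, r in zip(cols[j], residual))
        new_xj = soft_threshold(x[j] - grad_j / lipschitz[j], lam / lipschitz[j])
        delta = new_xj - x[j]
        if delta != 0.0:
            # Keep the residual in sync so each step costs O(m), not O(mn).
            for i in range(m):
                residual[i] += delta * cols[j][i]
            x[j] = new_xj
    return x
```

Calling `randomized_cd_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)` runs the sketch on a two-variable problem whose minimizer is the soft-thresholded right-hand side.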
Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
Consider the problem of minimizing the sum of a smooth (possibly non-convex)
and a convex (possibly nonsmooth) function involving a large number of
variables. A popular approach to solve this problem is the block coordinate
descent (BCD) method whereby at each iteration only one variable block is
updated while the remaining variables are held fixed. With recent advances
in multi-core parallel processing technology, it is
desirable to parallelize the BCD method by allowing multiple blocks to be
updated simultaneously at each iteration of the algorithm. In this work, we
propose an inexact parallel BCD approach where at each iteration, a subset of
the variables is updated in parallel by minimizing convex approximations of the
original objective function. We investigate the convergence of this parallel
BCD method for both randomized and cyclic variable selection rules. We analyze
the asymptotic and non-asymptotic convergence behavior of the algorithm for
both convex and non-convex objective functions. The numerical experiments
suggest that for a special case of the Lasso minimization problem, the cyclic block
selection rule can outperform the randomized rule.
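The difference between the cyclic and randomized selection rules can be sketched on a toy Lasso problem. The code below is not the algorithm from the paper: it assumes a simple Jacobi-style scheme in which every selected coordinate computes its proximal update from the same iterate and then moves a fixed fraction `step` toward it. The function name, the fixed step size, and the sequential simulation of "parallel" workers are all assumptions, and a fixed step is not guaranteed to converge for general data.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def parallel_bcd_lasso(A, b, lam, block_size=1, step=0.5, iters=200,
                       rule="cyclic", seed=0):
    """Hypothetical Jacobi-style block update for 0.5*||Ax-b||^2 + lam*||x||_1."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    lipschitz = [sum(v * v for v in col) for col in cols]
    x = [0.0] * n
    start = 0
    for _ in range(iters):
        if rule == "cyclic":
            block = [(start + k) % n for k in range(block_size)]
            start = (start + block_size) % n
        else:
            # Randomized rule; requires block_size <= n.
            block = rng.sample(range(n), block_size)
        # All selected coordinates read the same residual (simulated parallelism).
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        targets = {}
        for j in block:
            if lipschitz[j] == 0.0:
                continue
            grad_j = sum(c * r for c, r in zip(cols[j], residual))
            targets[j] = soft_threshold(x[j] - grad_j / lipschitz[j],
                                        lam / lipschitz[j])
        # Damped move toward each per-coordinate minimizer.
        for j, target in targets.items():
            x[j] += step * (target - x[j])
    return x
```

Passing `rule="cyclic"` or `rule="random"` switches the selection rule while leaving the per-coordinate update unchanged, which makes it easy to compare the two on the same data.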
An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization
We consider the problem of minimizing the sum of two convex functions: one is
smooth and given by a gradient oracle, and the other is separable over blocks
of coordinates and has a simple known structure over each block. We develop an
accelerated randomized proximal coordinate gradient (APCG) method for
minimizing such convex composite functions. For strongly convex functions, our
method achieves faster linear convergence rates than existing randomized
proximal coordinate gradient methods. Without strong convexity, our method
enjoys accelerated sublinear convergence rates. We show how to apply the APCG
method to solve the regularized empirical risk minimization (ERM) problem, and
devise efficient implementations that avoid full-dimensional vector operations.
For ill-conditioned ERM problems, our method achieves better convergence rates
than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.
Robust Block Coordinate Descent
In this paper we present a novel randomized block coordinate descent method
for the minimization of a convex composite objective function. The method uses
(approximate) partial second-order (curvature) information, so that the
algorithm performance is more robust when applied to highly nonseparable or
ill-conditioned problems. We call the method Robust Coordinate Descent (RCD). At
each iteration of RCD, a block of coordinates is sampled randomly, a quadratic
model is formed about that block and the model is minimized
approximately/inexactly to determine the search direction. An inexpensive line
search is then employed to ensure a monotonic decrease in the objective
function and acceptance of large step sizes. We prove global convergence of the
RCD algorithm, and we also present several results on the local convergence of
RCD for strongly convex functions. Finally, we present numerical results on
large-scale problems to demonstrate the practical performance of the method.
Comment: 23 pages, 6 figures
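The sample / model / line-search loop described in this abstract can be sketched in a heavily simplified form. This is not the RCD algorithm itself: the sketch assumes a smooth strongly convex quadratic $f(x) = \tfrac{1}{2} x^T Q x - c^T x$ with no nonsmooth term, approximates the block curvature by the diagonal of $Q$, and uses a standard backtracking (Armijo) line search. All function names and constants are hypothetical.

```python
import random


def quad_value(Q, c, x):
    """f(x) = 0.5 * x^T Q x - c^T x for a symmetric positive definite Q."""
    n = len(x)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(ci * xi for ci, xi in zip(c, x))


def block_descent_with_line_search(Q, c, block_size=1, iters=500, seed=0):
    """Random block, diagonal quadratic model, backtracking line search."""
    rng = random.Random(seed)
    n = len(c)
    x = [0.0] * n
    f = quad_value(Q, c, x)
    history = [f]
    for _ in range(iters):
        block = rng.sample(range(n), block_size)
        grad = {j: sum(Q[j][k] * x[k] for k in range(n)) - c[j] for j in block}
        # Inexact model minimizer: only the diagonal curvature is used.
        direction = {j: -grad[j] / Q[j][j] for j in block}
        slope = sum(grad[j] * direction[j] for j in block)  # always <= 0
        t = 1.0
        for _attempt in range(30):
            trial = list(x)
            for j in block:
                trial[j] += t * direction[j]
            f_trial = quad_value(Q, c, trial)
            # Accept only on sufficient decrease, so f never increases.
            if f_trial <= f + 1e-4 * t * slope:
                x, f = trial, f_trial
                break
            t *= 0.5
        history.append(f)
    return x, history
```

Because a step is accepted only when the sufficient-decrease test passes, the recorded objective values in `history` are nonincreasing, mirroring the monotonic decrease mentioned in the abstract.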