Smooth minimization of nonsmooth functions with parallel coordinate descent methods
We study the performance of a family of randomized parallel coordinate
descent methods for minimizing the sum of a nonsmooth convex function and a
separable convex function. The problem class includes as a special case
L1-regularized L1 regression and the minimization of the exponential loss
("AdaBoost problem"). We assume the input data defining the loss function is
contained in a sparse $m \times n$ matrix $A$ with at most $\omega$ nonzeros
in each row. Our methods need $O(n\beta/\tau)$ iterations to find an
approximate solution with high probability, where $\tau$ is the number of
processors and $\beta = 1 + (\omega-1)(\tau-1)/(n-1)$ for the fastest
variant. The $O$ notation hides dependence on quantities such as the required
accuracy and confidence levels and the distance of the starting iterate from
an optimal point. Since $\beta/\tau$ is a decreasing function of $\tau$, the
method needs fewer iterations when more processors are used. Certain variants
of our algorithms perform on average only $O(\mathrm{nnz}(A)/n)$ arithmetic
operations during a single iteration per processor and, because $\beta$
decreases when $\omega$ does, fewer iterations are needed for sparser
problems.
Comment: 39 pages, 1 algorithm, 3 figures, 2 tables
Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
Consider the problem of minimizing the sum of a smooth (possibly non-convex)
and a convex (possibly nonsmooth) function involving a large number of
variables. A popular approach to solve this problem is the block coordinate
descent (BCD) method whereby at each iteration only one variable block is
updated while the remaining variables are held fixed. With recent advances in
multi-core parallel processing technology, it is desirable to parallelize the
BCD method by allowing multiple blocks to be updated simultaneously at each
iteration of the algorithm. In this work, we
propose an inexact parallel BCD approach where at each iteration, a subset of
the variables is updated in parallel by minimizing convex approximations of the
original objective function. We investigate the convergence of this parallel
BCD method for both randomized and cyclic variable selection rules. We analyze
the asymptotic and non-asymptotic convergence behavior of the algorithm for
both convex and non-convex objective functions. The numerical experiments
suggest that for the special case of the Lasso problem, the cyclic block
selection rule can outperform the randomized rule.
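For concreteness, below is a minimal sketch of the prox-linear block update
and the two selection rules the abstract compares, specialized to Lasso. The
names bcd_lasso and prox_linear_update, the contiguous block partitioning,
and all parameters are illustrative assumptions; the paper's truly parallel
subset update is only mimicked here by a sequential loop over blocks.

    import numpy as np

    def prox_linear_update(A, x, r, S, lam, L):
        # Minimize a convex (proximal-linear) model of the Lasso objective
        # over the block S, keeping all other coordinates fixed.
        g = A[:, S].T @ r                    # block partial gradient
        z = x[S] - g / L[S]
        return np.sign(z) * np.maximum(np.abs(z) - lam / L[S], 0.0)

    def bcd_lasso(A, b, lam, block=5, sweeps=200, rule="cyclic", seed=0):
        # Inexact BCD on 0.5*||Ax - b||^2 + lam*||x||_1, contrasting the
        # cyclic and randomized block selection rules.
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        L = (A ** 2).sum(axis=0)             # coordinatewise Lipschitz bounds
        blocks = [np.arange(i, min(i + block, n)) for i in range(0, n, block)]
        x = np.zeros(n)
        r = A @ x - b                        # residual, maintained incrementally
        for _ in range(sweeps):
            if rule == "cyclic":
                schedule = blocks            # fixed order, every block once
            else:                            # same work, blocks drawn at random
                schedule = [blocks[i] for i in
                            rng.integers(len(blocks), size=len(blocks))]
            for S in schedule:
                x_new = prox_linear_update(A, x, r, S, lam, L)
                r += A[:, S] @ (x_new - x[S])
                x[S] = x_new
        return x

    # compare the two rules on one random instance
    A = np.random.default_rng(3).standard_normal((80, 40))
    b = A @ np.r_[np.ones(4), np.zeros(36)]
    x_cyc = bcd_lasso(A, b, lam=0.2, rule="cyclic")
    x_rnd = bcd_lasso(A, b, lam=0.2, rule="randomized")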
Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization
We propose a new randomized coordinate descent method for a convex
optimization template with broad applications. Our analysis relies on a novel
combination of four ideas applied to the primal-dual gap function: smoothing,
acceleration, homotopy, and coordinate descent with non-uniform sampling. As a
result, our method features the first convergence rate guarantees among
coordinate descent methods that are the best known under a variety of common
structure assumptions on the template. We provide numerical evidence to
support the theoretical results with a comparison to state-of-the-art
algorithms.
Comment: NIPS 2017
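A small sketch of just the non-uniform sampling ingredient named in this
abstract, applied to a plain least-squares objective: coordinates are drawn
with probability proportional to their coordinatewise Lipschitz constants.
The smoothing, acceleration, and homotopy components of the paper's method
are deliberately omitted, and nonuniform_cd is an illustrative name, not the
authors' algorithm.

    import numpy as np

    def nonuniform_cd(A, b, iters=5000, seed=0):
        # Coordinate descent on 0.5*||Ax - b||^2 where coordinate i is
        # drawn with probability proportional to L_i = ||A_i||^2, a
        # standard importance-sampling choice for non-uniform sampling.
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        L = (A ** 2).sum(axis=0)
        p = L / L.sum()              # non-uniform sampling distribution
        x = np.zeros(n)
        r = A @ x - b                # residual, maintained incrementally
        for _ in range(iters):
            i = rng.choice(n, p=p)
            g = A[:, i] @ r          # i-th partial gradient
            x[i] -= g / L[i]
            r -= A[:, i] * (g / L[i])
        return x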