An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization
We consider the problem of minimizing the sum of two convex functions: one is
smooth and given by a gradient oracle, and the other is separable over blocks
of coordinates and has a simple known structure over each block. We develop an
accelerated randomized proximal coordinate gradient (APCG) method for
minimizing such convex composite functions. For strongly convex functions, our
method achieves faster linear convergence rates than existing randomized
proximal coordinate gradient methods. Without strong convexity, our method
enjoys accelerated sublinear convergence rates. We show how to apply the APCG
method to solve the regularized empirical risk minimization (ERM) problem, and
devise efficient implementations that avoid full-dimensional vector operations.
For ill-conditioned ERM problems, our method obtains better convergence rates
than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.
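APCG accelerates the randomized proximal coordinate gradient method. The sketch below shows only the plain (non-accelerated) building block, not APCG itself: the extrapolation sequences that give APCG its faster rates are omitted. It applies randomized proximal coordinate gradient steps to the Lasso, one instance of regularized ERM. Each step keeps the residual `A x - b` up to date instead of recomputing `A x`, so it touches only one column of `A`. This is one simple way to avoid full-dimensional vector operations. The function names and the choice of the Lasso are our own assumptions.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def rpcg_lasso(A, b, lam, iters=2000, seed=0):
    """Randomized proximal coordinate gradient (non-accelerated) for
        minimize 0.5 * ||A x - b||^2 + lam * ||x||_1,
    with A given as a list of m rows of length n.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    # Coordinate-wise Lipschitz constants L_j = ||A[:, j]||^2.
    L = [sum(a * a for a in col) for col in cols]
    x = [0.0] * n
    # Residual r = A x - b, kept up to date so each step costs O(m)
    # (one column of A) instead of a full matrix-vector product.
    r = [-bi for bi in b]
    for _ in range(iters):
        j = rng.randrange(n)  # coordinate chosen uniformly at random
        if L[j] == 0.0:
            continue  # all-zero column: x_j = 0 is already optimal
        g = sum(cols[j][i] * r[i] for i in range(m))  # partial gradient A_j^T r
        new = soft_threshold(x[j] - g / L[j], lam / L[j])
        delta = new - x[j]
        if delta != 0.0:
            for i in range(m):
                r[i] += delta * cols[j][i]
            x[j] = new
    return x
```

For example, with `A = [[2, 0], [1, 1], [0, 3]]`, `b = [1, 2, 3]` and `lam = 0.5`, the iterates approach `[0.5, 1.0]`. At that point the Lasso optimality conditions hold.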
Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a nonsmooth (possibly
nonseparable) convex one. The latter term is usually employed to enforce
structure in the solution, typically sparsity. The main contribution of this
work is a novel parallel, hybrid random/deterministic decomposition
scheme wherein, at each iteration, a subset of (block) variables is updated
simultaneously by minimizing local convex approximations of the original
nonconvex function. To tackle huge-scale problems, the (block) variables
to be updated are chosen according to a mixed random and deterministic
procedure, which captures the advantages of both purely deterministic and
purely random update-based schemes. Almost sure convergence of the proposed
scheme is established. Numerical results show that on huge-scale problems the
proposed hybrid random/deterministic algorithm outperforms both random and
deterministic schemes.
Comment: The order of the authors is alphabetical.
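The mixed random/deterministic selection step can be illustrated with a small sketch. It rests on stated assumptions. `hybrid_select` is a hypothetical helper. The per-block `scores` stand in for whatever deterministic merit measure a concrete algorithm would use, for example an estimate of each block's distance from optimality.

```python
import random


def hybrid_select(scores, sample_size, keep, rng):
    """Choose the blocks to update in one iteration (hypothetical helper).

    Random stage: draw `sample_size` distinct block indices uniformly.
    Deterministic stage: among those, keep the `keep` blocks whose
    score (an assumed per-block merit measure) is largest.
    """
    sampled = rng.sample(range(len(scores)), sample_size)
    sampled.sort(key=lambda i: scores[i], reverse=True)
    return sorted(sampled[:keep])
```

Both pure schemes are special cases of this rule. Setting `sample_size` to the number of blocks makes it purely deterministic (greedy). Setting `keep == sample_size` makes it purely random.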
A generic coordinate descent solver for nonsmooth convex optimization
We present a generic coordinate descent solver for the minimization of a nonsmooth convex objective with structure. In particular, the method can handle problems with linear constraints. The implementation makes use of efficient residual updates and automatically determines which dual variables should be duplicated. A list of basic functional atoms is pre-compiled for efficiency, and a modelling language in Python allows the user to combine them at run time. As a result, the solver can be used on a large variety of problems, including Lasso, sparse multinomial logistic regression, and linear and quadratic programs.
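The idea of combining pre-built atoms at run time can be sketched in a toy form. This is not the solver's actual API or modelling language: the registry `ATOMS`, the atom names and `cd_solve` are hypothetical. The sketch handles only a least-squares loss plus one separable atom per coordinate. It also uses the residual-update trick mentioned above.

```python
def prox_l1(v, t):
    """prox of t*|x|: soft-thresholding."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def prox_nonneg(v, t):
    """prox of the indicator of x >= 0: a projection, independent of t."""
    return max(v, 0.0)


def prox_free(v, t):
    """prox of the zero function: no regularization on this coordinate."""
    return v


# Hypothetical atom registry; a user "combines" atoms by naming one per coordinate.
ATOMS = {"l1": prox_l1, "nonneg": prox_nonneg, "free": prox_free}


def cd_solve(A, b, atoms, sweeps=200):
    """Cyclic coordinate descent for
        minimize 0.5 * ||A x - b||^2 + sum_j h_j(x_j),
    where atoms[j] = (name, weight) selects h_j from ATOMS at run time
    (the weight scales the l1 atom; the other atoms ignore it).
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    L = [sum(a * a for a in col) for col in cols]  # curvature along each coordinate
    x = [0.0] * n
    r = [-bi for bi in b]  # residual A x - b, updated in place
    for _ in range(sweeps):
        for j in range(n):
            if L[j] == 0.0:
                continue
            name, weight = atoms[j]
            g = sum(cols[j][i] * r[i] for i in range(m))
            # Exact minimization over x_j: prox step of length 1 / L_j.
            new = ATOMS[name](x[j] - g / L[j], weight / L[j])
            delta = new - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] += delta * cols[j][i]
                x[j] = new
    return x
```

For example, `atoms = [("l1", 1.0), ("nonneg", 0.0), ("free", 0.0)]` puts an l1 penalty on the first coordinate. It constrains the second to be nonnegative and leaves the third unregularized.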
Block stochastic gradient iteration for convex and nonconvex optimization
The stochastic gradient (SG) method can minimize an objective function
composed of a large number of differentiable functions, or solve a stochastic
optimization problem, to a moderate accuracy. The block coordinate
descent/update (BCD) method, on the other hand, handles problems with multiple
blocks of variables by updating them one at a time; when the blocks of
variables are easier to update individually than together, BCD has a lower
per-iteration cost. This paper introduces a method that combines the features
of SG and BCD for problems with many components in the objective and with
multiple (blocks of) variables.
Specifically, a block stochastic gradient (BSG) method is proposed for
solving both convex and nonconvex programs. At each iteration, BSG approximates
the gradient of the differentiable part of the objective by randomly sampling a
small set of data or sampling a few functions from the sum term in the
objective, and then, using those samples, it updates all the blocks of
variables in either a deterministic or a randomly shuffled order. Its
convergence is established, in different senses, for both the convex and
nonconvex cases. In the convex case, the proposed method has the same order of
convergence rate as the SG method. In the nonconvex case, its convergence is
established in terms of the expected violation of a first-order optimality
condition. The proposed method was numerically tested on problems including
stochastic least squares and logistic regression, which are convex, as well as
low-rank tensor recovery and bilinear logistic regression, which are nonconvex.
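The iteration described above can be sketched on stochastic least squares, one of the convex test problems listed. The minibatch size, the step size `step / sqrt(k + 1)` and the helper name are our own illustrative assumptions, not values from the paper. Each iteration draws one minibatch and then sweeps all blocks with it.

```python
import random


def bsg_least_squares(data, blocks, step=0.5, iters=10000, batch=4,
                      shuffle=True, seed=0):
    """Block stochastic gradient sketch for minimize E[0.5 * (a^T x - y)^2].

    data   : list of (a, y) samples, a being a list of floats
    blocks : list of index lists that partition the coordinates of x
    Each iteration draws ONE minibatch and reuses it for every block;
    blocks are visited Gauss-Seidel style, so later blocks see the
    updates already made to earlier blocks in the same iteration.
    """
    rng = random.Random(seed)
    x = [0.0] * len(data[0][0])
    order = list(range(len(blocks)))
    for k in range(iters):
        sample = [data[rng.randrange(len(data))] for _ in range(batch)]
        if shuffle:
            rng.shuffle(order)  # randomly shuffled block order
        alpha = step / (k + 1) ** 0.5  # assumed diminishing step size
        for bi in order:
            idx = blocks[bi]
            grad = [0.0] * len(idx)
            for a, y in sample:
                res = sum(aj * xj for aj, xj in zip(a, x)) - y
                for t, j in enumerate(idx):
                    grad[t] += res * a[j] / batch
            for t, j in enumerate(idx):
                x[j] -= alpha * grad[t]
    return x
```

Passing `shuffle=False` visits the blocks in a fixed cyclic (deterministic) order instead.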
Parallel Selective Algorithms for Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a (block) separable
nonsmooth, convex one. The latter term is usually employed to enforce structure
in the solution, typically sparsity. Our framework is very flexible and
includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e.,
sequential) ones, as well as virtually all possibilities "in between" with only
a subset of variables updated at each iteration. Our theoretical convergence
results improve on existing ones, and numerical results on LASSO, logistic
regression, and some nonconvex quadratic problems show that the new method
consistently outperforms existing algorithms.
Comment: This work is an extended version of the conference paper that was
presented at IEEE ICASSP'14. The first and the second author contributed
equally to the paper. This revised version contains new numerical results on
nonconvex quadratic problems.
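The selective Jacobi idea can be sketched on a small LASSO-type problem, `0.5 * x^T Q x - c^T x + lam * ||x||_1` with `Q` positive definite; this test problem is our own choice. Every coordinate's best response is computed from the same iterate, so these computations are independent and could run in parallel. Only coordinates whose best response would move them by at least `rho` times the largest move are updated, each with a damped step `gamma`. The threshold rule and the step-size recursion `gamma *= 1 - theta * gamma` are assumptions in the spirit of the framework, not a transcription of the paper's algorithm.

```python
def soft_threshold(v, t):
    """prox of t*|.|."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def jacobi_selective(Q, c, lam, iters=500, rho=0.5, gamma=0.9, theta=1e-3):
    """Selective Jacobi-style sketch for
        minimize 0.5 * x^T Q x - c^T x + lam * ||x||_1,  Q positive definite.
    """
    n = len(c)
    x = [0.0] * n
    for _ in range(iters):
        grad = [sum(Q[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        # Exact per-coordinate minimizers ("best responses"), all computed
        # from the same x, so these n computations are independent.
        xhat = [soft_threshold(x[i] - grad[i] / Q[i][i], lam / Q[i][i])
                for i in range(n)]
        move = [abs(xhat[i] - x[i]) for i in range(n)]
        cutoff = rho * max(move)
        for i in range(n):
            if move[i] >= cutoff:  # update only coordinates that would move a lot
                x[i] += gamma * (xhat[i] - x[i])
        gamma *= 1.0 - theta * gamma  # assumed diminishing step-size rule
    return x
```

Setting `rho = 0` gives a fully parallel Jacobi update of every coordinate. Setting `rho = 1` updates only the coordinates with the largest change. Gauss-Seidel (sequential) sweeps are not shown.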
A Unified Algorithmic Framework for Block-Structured Optimization Involving Big Data: With applications in machine learning and signal processing
This article presents an algorithmic framework for big data optimization, called block successive upper-bound minimization (BSUM). BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the block coordinate descent (BCD) method, the convex-concave procedure (CCCP), the block coordinate proximal gradient (BCPG) method, the nonnegative matrix factorization (NMF) method, and the expectation maximization (EM) method. In this article, various features and properties of BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation, and the required communication overhead. Illustrative examples from networking, signal processing, and machine learning are presented to demonstrate the practical performance of the BSUM framework.
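In BSUM, each block update minimizes an upper bound of the objective that is tight at the current point, with the other blocks held fixed. When the bound is the objective itself, the update is the exact block minimization of BCD. The sketch below applies that BCD special case to rank-one approximation; this example is our own, not one from the article. It minimizes `||M - u v^T||_F^2` alternately over `u` and `v`. The recorded objective values should never increase, which is the monotonicity that the upper-bound construction guarantees.

```python
def rank1_bcd(M, iters=50):
    """Alternating exact minimization of ||M - u v^T||_F^2 over u and v.

    BCD is the special case of BSUM in which each block's upper bound
    is the objective itself. Returns u, v and the objective history.
    """
    m, n = len(M), len(M[0])
    u, v = [1.0] * m, [1.0] * n
    history = []
    for _ in range(iters):
        vv = sum(vj * vj for vj in v)
        if vv == 0.0:
            raise ValueError("v vanished; choose a different starting point")
        # Block u: closed-form least-squares minimizer with v fixed.
        u = [sum(M[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        uu = sum(ui * ui for ui in u)
        if uu == 0.0:
            raise ValueError("u vanished; choose a different starting point")
        # Block v: closed-form least-squares minimizer with u fixed.
        v = [sum(M[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
        history.append(sum((M[i][j] - u[i] * v[j]) ** 2
                           for i in range(m) for j in range(n)))
    return u, v, history
```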