An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization
We consider the problem of minimizing the sum of two convex functions: one is
smooth and given by a gradient oracle, and the other is separable over blocks
of coordinates and has a simple known structure over each block. We develop an
accelerated randomized proximal coordinate gradient (APCG) method for
minimizing such convex composite functions. For strongly convex functions, our
method achieves faster linear convergence rates than existing randomized
proximal coordinate gradient methods. Without strong convexity, our method
enjoys accelerated sublinear convergence rates. We show how to apply the APCG
method to solve the regularized empirical risk minimization (ERM) problem, and
devise efficient implementations that avoid full-dimensional vector operations.
For ill-conditioned ERM problems, our method obtains better convergence rates
than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.
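To make the setting of this abstract concrete, the following is a minimal sketch of the basic (non-accelerated) randomized proximal coordinate gradient step on a lasso problem, min 0.5*||Ax - b||^2 + lam*||x||_1. It is not the APCG method itself, and the function names, the lasso test problem, and the parameters are assumptions chosen for illustration. The running residual `r = A x - b` is updated in place, which loosely mirrors the idea of avoiding full-dimensional vector operations.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * |.| applied to a scalar v.
    return np.sign(v) * max(abs(v) - t, 0.0)

def prox_coordinate_descent(A, b, lam, n_iters=5000, seed=0):
    # Hypothetical helper: plain randomized proximal coordinate gradient,
    # NOT the accelerated APCG scheme described in the abstract.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)        # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    r = -np.asarray(b, dtype=float)  # residual A x - b at x = 0
    for _ in range(n_iters):
        i = rng.integers(n)          # sample one coordinate uniformly
        if L[i] == 0.0:
            continue                 # an all-zero column carries no information
        g = A[:, i] @ r              # partial derivative of the smooth part
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        r += (x_new - x[i]) * A[:, i]  # keep the residual consistent with x
        x[i] = x_new
    return x

def objective(A, b, lam, x):
    # Composite objective: smooth least-squares term plus separable l1 term.
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()
```

Each iteration touches a single column of `A`, so its cost is proportional to the number of rows rather than to the full problem size.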
Distributed Partitioned Big-Data Optimization via Asynchronous Dual Decomposition
In this paper we consider a novel partitioned framework for distributed
optimization in peer-to-peer networks. In several important applications the
agents of a network have to solve an optimization problem with two key
features: (i) the dimension of the decision variable depends on the network
size, and (ii) cost function and constraints have a sparsity structure related
to the communication graph. For this class of problems a straightforward
application of existing consensus methods would show two inefficiencies: poor
scalability and redundancy of shared information. We propose an asynchronous
distributed algorithm, based on dual decomposition and coordinate methods, to
solve partitioned optimization problems. We show that, by exploiting the
problem structure, the solution can be partitioned among the nodes, so that
each node just stores a local copy of a portion of the decision variable
(rather than a copy of the entire decision vector) and solves a small-scale
local problem.
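As a rough illustration of combining dual decomposition with coordinate updates, the sketch below solves a toy partitioned problem on a path graph: minimize sum_i 0.5*(x_i - c_i)^2 subject to x_i - x_{i+1} = d_i for each edge. This is not the algorithm of the paper; the toy problem, the function name, and the use of random edge activations as a stand-in for asynchronous wake-ups are all assumptions. Each node keeps only its own scalar x_i, and each edge keeps one multiplier.

```python
import numpy as np

def dual_coordinate_ascent_path(c, d, n_iters=2000, seed=0):
    # Hypothetical toy example: node i holds x[i], edge e = (e, e+1) holds lam[e].
    rng = np.random.default_rng(seed)
    c = np.asarray(c, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(c)
    lam = np.zeros(n - 1)
    x = c.copy()                    # local minimizers when all multipliers are zero
    for _ in range(n_iters):
        e = rng.integers(n - 1)     # one edge "wakes up" at random
        # Exact maximization of the dual over lam[e]: the dual gradient is the
        # constraint violation, and the dual curvature along lam[e] is 2.
        delta = (x[e] - x[e + 1] - d[e]) / 2.0
        lam[e] += delta
        # The two endpoint nodes re-solve their small local problems,
        # x_i = c_i - (lam_i - lam_{i-1}), which here is a closed-form shift.
        x[e] -= delta
        x[e + 1] += delta
    return x, lam
```

After enough activations every edge constraint holds, and the sum of the x_i stays equal to the sum of the c_i, which characterizes the optimum of this particular toy problem.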