Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems
We consider a generic convex-concave saddle point problem with separable
structure, a form that covers a wide range of machine learning applications.
Under this problem structure, we follow the framework of primal-dual updates
for saddle point problems, and incorporate stochastic block coordinate descent
with adaptive stepsize into this framework. We show theoretically that our
adaptive stepsize can achieve a sharper linear convergence rate than
existing methods. Additionally, since our method can update a "mini-batch"
of block coordinates at a time, it is also amenable to
parallel processing for large-scale data. We apply the proposed method to
regularized empirical risk minimization and show that it performs comparably
or, more often, better than state-of-the-art methods on both synthetic and
real-world data sets.
Comment: Accepted by ECML/PKDD201
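A minimal sketch of the kind of update the abstract describes, on a toy
strongly-convex-strongly-concave saddle point problem. The problem, step
sizes, and mini-batch size below are illustrative assumptions; in
particular, the fixed step sizes do not reproduce the paper's adaptive
stepsize rule.

```python
import numpy as np

# Toy separable saddle point:  min_x max_y  x^T A y - 0.5||y||^2 + 0.5||x||^2,
# whose unique saddle point is (0, 0). Each iteration updates a random
# "mini-batch" of dual (block) coordinates, then takes a full primal step.
rng = np.random.default_rng(0)
n, d = 20, 10
A = rng.standard_normal((d, n)) / np.sqrt(n)
x = rng.standard_normal(d)
y = rng.standard_normal(n)
sigma = tau = 0.1          # assumed fixed primal/dual step sizes (not adaptive)
batch = 5                  # size of the coordinate mini-batch

for _ in range(2000):
    blk = rng.choice(n, size=batch, replace=False)
    y[blk] += sigma * (A.T @ x - y)[blk]   # dual ascent on the sampled block only
    x -= tau * (A @ y + x)                 # full primal descent step
```

Because the mini-batch updates touch disjoint coordinates, the dual steps
for different blocks could be computed in parallel, which is the property
the abstract highlights for large-scale data.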
Distributed optimization with arbitrary local solvers
With the growth of data and necessity for distributed optimization methods,
solvers that work well on a single machine must be re-designed to leverage
distributed computation. Recent work in this area has been limited by focusing
heavily on developing highly specific methods for the distributed environment.
These special-purpose methods are often unable to fully leverage the
competitive performance of their well-tuned and customized single machine
counterparts. Further, they are unable to easily integrate improvements that
continue to be made to single machine methods. To this end, we present a
framework for distributed optimization that both allows the flexibility of
arbitrary solvers to be used on each (single) machine locally, and yet
maintains competitive performance against other state-of-the-art
special-purpose distributed methods. We give strong primal-dual convergence
rate guarantees for our framework that hold for arbitrary local solvers. We
demonstrate the impact of local solver selection both theoretically and in an
extensive experimental comparison. Finally, we provide thorough implementation
details for our framework, highlighting areas for practical performance gains.
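A hedged sketch of the idea on a simulated cluster. The ridge-regression
objective, the coordinate partition, and the gradient-step local solver
below are assumptions chosen for illustration, not the paper's actual
framework or interface; the point is only that the local solver is an
interchangeable black box.

```python
import numpy as np

# Simulated setup: K "machines", each owning one disjoint block of coordinates
# of a ridge-regression objective 0.5||Aw - b||^2 + 0.5*lam*||w||^2.
rng = np.random.default_rng(1)
n, d, K, lam = 200, 40, 4, 0.1
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
w = np.zeros(d)
blocks = np.array_split(np.arange(d), K)   # one coordinate block per machine

def local_solver(w, blk, steps=10, lr=1e-3):
    """Stand-in for an *arbitrary* local solver: here, a few gradient
    steps restricted to this machine's block of coordinates."""
    delta = np.zeros_like(w)
    for _ in range(steps):
        g = A.T @ (A @ (w + delta) - b) + lam * (w + delta)
        delta[blk] -= lr * g[blk]
    return delta

for _ in range(100):
    deltas = [local_solver(w, blk) for blk in blocks]  # parallel in principle
    w = w + sum(deltas) / K    # conservative averaging of the local updates

obj = 0.5 * np.linalg.norm(A @ w - b) ** 2 + 0.5 * lam * w @ w
```

Averaging the local updates is a deliberately conservative aggregation: by
convexity, the averaged point cannot increase the objective as long as each
local solver decreases it, regardless of which solver each machine runs.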
Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent
We propose a new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent and renders it impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve on the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, linear and semidefinite programming.
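The abstract notes that APPROX recovers accelerated proximal gradient
descent as a special case. The sketch below shows that single-block special
case on a lasso problem (the data and parameters are illustrative
assumptions, and this is not the general parallel block-coordinate
algorithm): a gradient sequence y, a "momentum" sequence z updated by a
prox step, and a decreasing parameter theta coupling them.

```python
import numpy as np

# Special case of the APPROX family with one block: accelerated proximal
# gradient descent (FISTA-style) on  0.5||Ax - b||^2 + lam*||x||_1.
rng = np.random.default_rng(2)
n, d, lam = 100, 50, 0.1
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part

def prox_l1(v, t):
    """Proximal operator of t * lam * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

x = np.zeros(d)
z = np.zeros(d)
theta = 1.0
for _ in range(300):
    y = (1 - theta) * x + theta * z               # interpolation point
    grad = A.T @ (A @ y - b)
    z = prox_l1(z - grad / (theta * L), 1.0 / (theta * L))
    x = (1 - theta) * x + theta * z               # equals y + theta*(z_new - z_old)
    theta = (np.sqrt(theta ** 4 + 4 * theta ** 2) - theta ** 2) / 2

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
```

In the full method the prox step touches only a sampled block of
coordinates of z, which is what makes the iteration cheap and avoids the
full-dimensional vector operations the abstract mentions.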
Distributed Block Coordinate Descent for Minimizing Partially Separable Functions
In this work we propose a distributed randomized block coordinate descent
method for minimizing a convex function with a huge number of
variables/coordinates. We analyze its complexity under the assumption that the
smooth part of the objective function is partially block separable, and show
that the degree of separability directly influences the complexity. This
extends the results in [Richtarik, Takac: Parallel coordinate descent methods
for big data optimization] to a distributed environment. We first show that
partially block separable functions admit an expected separable
overapproximation (ESO) with respect to a distributed sampling, compute the ESO
parameters, and then specialize complexity results from recent literature that
hold under the generic ESO assumption. We describe several approaches to
distribution and synchronization of the computation across a cluster of
multi-core computers and provide promising computational results.
Comment: in Recent Developments in Numerical Analysis and Optimization, 201
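A hedged sketch of the central quantity at play: for a "tau-nice" sampling
(tau coordinates chosen uniformly at random), an ESO yields the damping
parameter beta = 1 + (omega - 1)(tau - 1)/(n - 1), where omega is the degree
of partial separability. The toy parallel coordinate descent below is an
illustration under assumed synthetic data, not the paper's distributed
implementation.

```python
import numpy as np

# Partially separable quadratic: f(x) = 0.5||Ax - b||^2 where each row of A
# depends on at most omega of the n coordinates.
rng = np.random.default_rng(3)
n, m, omega, tau = 50, 200, 5, 10
A = np.zeros((m, n))
for i in range(m):
    cols = rng.choice(n, size=omega, replace=False)
    A[i, cols] = rng.standard_normal(omega)
b = rng.standard_normal(m)
Li = (A ** 2).sum(axis=0) + 1e-12        # coordinate-wise Lipschitz constants
beta = 1 + (omega - 1) * (tau - 1) / (n - 1)   # ESO parameter, tau-nice sampling

x = np.zeros(n)
for _ in range(3000):
    S = rng.choice(n, size=tau, replace=False)   # tau coords updated in parallel
    g = A.T @ (A @ x - b)
    x[S] -= g[S] / (beta * Li[S])        # steps damped by the ESO parameter

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2
```

The damping beta grows with both the separability degree omega and the
number of parallel updates tau, which is how the degree of separability
enters the complexity, as the abstract describes.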