41 research outputs found

On the Complexity of Parallel Coordinate Descent

Author: Martin Takáč
Peter Richtárik
Rachael Tappenden
Richtárik P.
Shalev-Shwartz S.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

Crossref

Edinburgh Research Explorer

Coordinate Descent with Arbitrary Sampling II: Expected Separable Overapproximation

Author: Horn R.A.
Liu J.
Lu Z.
Luo Z.Q.
Peter Richtárik
Richtárik P.
Shalev-Shwartz S.
Tappenden R.
Zheng Qu
Publication venue: 'Informa UK Limited'
Publication date: 28/05/2015

postprin

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

HKU Scholars Hub

Coordinate Descent with Arbitrary Sampling I: Algorithms and Complexity

Author: Liu J.
Lu Z.
Nesterov Y.
Peter Richtárik
Richtárik P.
Shalev-Shwartz S.
Tappenden R.
Tseng P.
Zheng Qu
Publication venue: 'Informa UK Limited'
Publication date: 15/06/2015

postprin

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

HKU Scholars Hub

Parallel Coordinate Descent Methods for Big Data Optimization

Author: A Ruszczynski
A Saha
D Leventhal
I Dhillon
I Necoara
Ion Necoara
Martin Takáč
P Richtárik
Peter Richtárik
S Shalev-Shwartz
T Strohmer
X Jinchao
Y Li
Y Nesterov
Yurii Nesterov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

CiteSeerX

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer

Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems

Author: A Chambolle
B He
E Esser
P Richtárik
S Shalev-Shwartz
Y Nesterov
Y Ouyang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

We consider a generic convex-concave saddle point problem with separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems, and incorporate stochastic block coordinate descent with adaptive stepsize into this framework. We theoretically show that our proposal of adaptive stepsize potentially achieves a sharper linear convergence rate compared with the existing methods. Additionally, since we can select a "mini-batch" of block coordinates to update, our method is also amenable to parallel processing for large-scale data. We apply the proposed method to regularized empirical risk minimization and show that it performs comparably or, more often, better than state-of-the-art methods on both synthetic and real-world data sets. Comment: Accepted by ECML/PKDD201

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Distributed optimization with arbitrary local solvers

Author: Bordes A.
Chenxin Ma
Forero P.A.
Jakub Konečný
Martin Jaggi
Martin Takáč
Michael I. Jordan
Nocedal J.
Peter Richtárik
Richtárik P.
Rockafellar R.T.
Shalev-Shwartz S.
Virginia Smith
Zhang Y.
Publication venue: 'Informa UK Limited'
Publication date: 03/08/2016

With the growth of data and the need for distributed optimization methods, solvers that work well on a single machine must be re-designed to leverage distributed computation. Recent work in this area has been limited by focusing heavily on developing highly specific methods for the distributed environment. These special-purpose methods are often unable to fully leverage the competitive performance of their well-tuned and customized single machine counterparts. Further, they are unable to easily integrate improvements that continue to be made to single machine methods. To this end, we present a framework for distributed optimization that both allows the flexibility of arbitrary solvers to be used on each (single) machine locally, and yet maintains competitive performance against other state-of-the-art special-purpose distributed methods. We give strong primal-dual convergence rate guarantees for our framework that hold for arbitrary local solvers. We demonstrate the impact of local solver selection both theoretically and in an extensive experimental comparison. Finally, we provide thorough implementation details for our framework, highlighting areas for practical performance gains.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Edinburgh Research Explorer

Inexact Coordinate Descent: Complexity and Preconditioning

Author: A Cassioli
A Saha
B Recht
D Donoho
D Leventhal
D Needell
EJ Candès
GC Bento
GH Golub
GL Schultz
I Necoara
J Castro
J Gondzio
Jacek Gondzio
M Benzi
M Schmidt
M Takáč
MAT Figueiredo
MR Hestenes
N Simon
O Devolder
P Richtárik
P Tseng
Peter Richtárik
R Broughton
Rachael Tappenden
S Bonettini
S Gratton
S Shalev-Schwartz
SJ Wright
T Strohmer
TA Davis
Y Nesterov
Z Qin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer

Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent

Author: Ghaoui L. El
Liu J.
Mareček J.
Olivier Fercoq
Peter Richtárik
Platt J. C.
Richtárik P.
Shalev-Shwartz S.
Yu
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2016

We propose a new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L} R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent, rendering it impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve on the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, linear and semidefinite programming.
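As a worked reading of the rate quoted in this abstract, here is a short sketch of what it implies for iteration counts. It assumes the rate bounds the expected objective gap; the abstract itself only says "converges at the rate", and the symbols $F$, $F^*$, $x_k$ and $\varepsilon$ are introduced here for illustration and do not appear in the listing.

```latex
% Assumption: the quoted rate bounds the expected gap to the optimal value F^*.
\mathbb{E}\,[F(x_k)] - F^* \;\le\; \frac{2\bar{\omega}\bar{L}R^2}{(k+1)^2}

% Requiring the right-hand side to be at most \varepsilon and solving for k:
\frac{2\bar{\omega}\bar{L}R^2}{(k+1)^2} \le \varepsilon
\quad\Longleftrightarrow\quad
k \;\ge\; \sqrt{\frac{2\bar{\omega}\bar{L}R^2}{\varepsilon}} - 1

% Illustrative numbers (hypothetical): \bar{\omega}\bar{L}R^2 = 50, \varepsilon = 10^{-4}
% give \sqrt{100 / 10^{-4}} - 1 = 1000 - 1, i.e. k \ge 999.
```

Under that assumption, the iteration count grows like $1/\sqrt{\varepsilon}$, which is the usual signature of an accelerated method.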

Crossref

Edinburgh Research Explorer

Distributed Block Coordinate Descent for Minimizing Partially Separable Functions

Author: A. Saha
B.K. Natarajan
C. Scherrer
D. Ge
D.D. Lewis
D.P. Bertsekas
E.Y. Chang
F. Niu
N.K. Alham
O. Fercoq
OpenMP Architecture Review Board
P. Richtárik
P. Tseng
S. Shalev-Shwartz
Y. Nesterov
Publication venue
Publication date: 02/06/2014

In this work we propose a distributed randomized block coordinate descent method for minimizing a convex function with a huge number of variables/coordinates. We analyze its complexity under the assumption that the smooth part of the objective function is partially block separable, and show that the degree of separability directly influences the complexity. This extends the results in [Richtarik, Takac: Parallel coordinate descent methods for big data optimization] to a distributed environment. We first show that partially block separable functions admit an expected separable overapproximation (ESO) with respect to a distributed sampling, compute the ESO parameters, and then specialize complexity results from recent literature that hold under the generic ESO assumption. We describe several approaches to distribution and synchronization of the computation across a cluster of multi-core computers and provide promising computational results. Comment: in Recent Developments in Numerical Analysis and Optimization, 201
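For readers unfamiliar with the basic building block behind this abstract, the following is a minimal sketch of serial randomized coordinate descent on a least-squares objective. It is not the distributed block method described above: it updates one coordinate at a time on one machine, has no ESO-based step sizes, and the function name `randomized_cd`, the objective, and all parameters are assumptions chosen for illustration.

```python
import numpy as np


def randomized_cd(A, b, n_iters=2000, seed=0):
    """Serial randomized coordinate descent for min_x 0.5 * ||A x - b||^2.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    residual = A @ x - b              # r = A x - b, kept up to date incrementally
    lipschitz = (A ** 2).sum(axis=0)  # coordinate-wise constants ||A[:, i]||^2
    for _ in range(n_iters):
        i = rng.integers(n)           # sample one coordinate uniformly at random
        if lipschitz[i] == 0.0:
            continue                  # zero column: the objective ignores x[i]
        grad_i = A[:, i] @ residual   # partial derivative with respect to x[i]
        step = -grad_i / lipschitz[i] # exact minimization along coordinate i
        x[i] += step
        residual += step * A[:, i]    # O(m) update instead of recomputing A x - b
    return x
```

Each iteration touches a single column of `A`, which is why methods of this family scale to problems with very many coordinates; the parallel and distributed variants in the listed papers update several coordinates or blocks per iteration.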

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

A block coordinate variable metric forward–backward algorithm

Author: A Auslender
Audrey Repetti
BS Mordukhovich
DG Luenberger
DP Bertsekas
E Candès
Emilie Chouzenoux
GH Golub
H Attouch
HH Bauschke
I Waldspurger
J Bolte
JA Fessler
JB Hiriart-Urruty
JC Dainty
Jean-Christophe Pesquet
JJ Moreau
JM Ortega
JR Fienup
K Kurdyka
LM Brègman
M Razaviyayn
MJD Powell
MW Jacobson
P Frankel
P Ochs
P Richtárik
P Tseng
PL Combettes
RT Rockafellar
RW Gerchberg
S Mallat
S Sotthivirat
WI Zangwill
WO Saxton
Y Censor
Y Shechtman
Y Xu
ZQ Luo
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref