Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems
We consider convex-concave saddle point problems with a separable structure
and non-strongly convex functions. We propose an efficient stochastic block
coordinate descent method using adaptive primal-dual updates, which enables
flexible parallel optimization for large-scale problems. Our method combines the
efficiency and flexibility of block coordinate descent methods with the
simplicity of primal-dual methods, while exploiting the structure of the
separable convex-concave saddle point problem. It is capable of solving a wide
range of machine learning applications, including robust principal component
analysis, Lasso, and feature selection by group Lasso. Theoretically and
empirically, we demonstrate significantly better performance than
state-of-the-art methods in all these applications.
Comment: Accepted by AAAI 201
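To ground the setup, here is a minimal NumPy sketch of a stochastic block coordinate primal-dual iteration on one separable instance the abstract mentions, the Lasso saddle point min_x max_y <Ax - b, y> - ||y||^2/2 + lam*||x||_1. The block partition, step sizes, and uniform sampling below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

# Lasso in saddle-point form: min_x max_y <Ax - b, y> - ||y||^2/2 + lam*||x||_1.
# Each iteration: full dual ascent step, then a prox-gradient step on one
# randomly chosen primal block, with extrapolation of the updated block.
rng = np.random.default_rng(0)
n, d, lam = 200, 50, 0.1
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

blocks = np.array_split(np.arange(d), 10)   # partition of primal coordinates
tau = 0.5 / np.linalg.norm(A, 2) ** 2       # conservative primal step
sigma = 0.5                                 # dual step

x = np.zeros(d); x_bar = x.copy(); y = np.zeros(n)
for k in range(3000):
    # dual prox step: closed form for the conjugate of ||.||^2/2 shifted by b
    y = (y + sigma * (A @ x_bar - b)) / (1.0 + sigma)
    # primal step on one random block: gradient + soft-thresholding (prox of l1)
    J = blocks[rng.integers(len(blocks))]
    x_old = x[J].copy()
    z = x[J] - tau * (A[:, J].T @ y)
    x[J] = np.sign(z) * np.maximum(np.abs(z) - tau * lam, 0.0)
    x_bar = x.copy(); x_bar[J] = 2.0 * x[J] - x_old  # extrapolate updated block

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
print(f"Lasso objective after 3000 iterations: {obj:.4f}")
```

A fixed point of these updates satisfies y = Ax - b together with the blockwise Lasso optimality conditions, so the iteration targets the right solution; parallelism comes from updating several blocks at once.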
Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems
We consider a generic convex-concave saddle point problem with separable
structure, a form that covers a wide range of machine learning applications.
Under this problem structure, we follow the framework of primal-dual updates
for saddle point problems and incorporate stochastic block coordinate descent
with an adaptive stepsize into this framework. We show theoretically that the
proposed adaptive stepsize can achieve a sharper linear convergence rate than
existing methods. Additionally, since we can select a "mini-batch" of block
coordinates to update, our method is also amenable to parallel processing for
large-scale data. We apply the proposed method to regularized empirical risk
minimization and show that it performs comparably or, more often, better than
state-of-the-art methods on both synthetic and real-world data sets.
Comment: Accepted by ECML/PKDD201
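The two ingredients the abstract emphasizes, an adaptive stepsize and mini-batches of coordinates, can be sketched on ridge-regularized least squares written as a saddle point. The row-norm-based stepsize rule below is a hypothetical stand-in for the paper's adaptive choice, and all constants are illustrative.

```python
import numpy as np

# Regularized ERM in saddle form:
#   min_x max_y <Ax, y> - sum_i (y_i^2/2 + b_i*y_i) + (lam/2)*||x||^2,
# whose primal is min_x 0.5*||Ax - b||^2 + (lam/2)*||x||^2.
rng = np.random.default_rng(1)
n, d, lam, batch = 400, 30, 0.1, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.05 * rng.standard_normal(n)

row = np.linalg.norm(A, axis=1)
sigma = 0.5 / row              # per-coordinate dual steps: cheap rows move more
tau = 0.5 / row.max()          # shared primal step (conservative)

x = np.zeros(d); x_bar = x.copy(); y = np.zeros(n)
u = A.T @ y                    # maintain u = A^T y incrementally
for k in range(4000):
    S = rng.choice(n, size=batch, replace=False)  # mini-batch of dual coords
    y_new = (y[S] + sigma[S] * (A[S] @ x_bar - b[S])) / (1.0 + sigma[S])
    u += A[S].T @ (y_new - y[S])
    y[S] = y_new
    x_old = x
    x = (x - tau * u) / (1.0 + tau * lam)         # prox of (lam/2)*||x||^2
    x_bar = 2.0 * x - x_old                       # extrapolation

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.5 * lam * x @ x
print(f"ridge objective after 4000 iterations: {obj:.4f}")
```

Because the mini-batch coordinates touch disjoint rows of A, the dual updates within a batch are independent, which is what makes the method amenable to parallel processing.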
Block-proximal methods with spatially adapted acceleration
We study and develop (stochastic) primal-dual block-coordinate descent
methods for convex problems based on the method due to Chambolle and Pock. Our
methods have known convergence rates for the iterates and the ergodic gap:
O(1/N^2) if each block is strongly convex, O(1/N) if no strong convexity is
present, and more generally a mixed rate O(1/N^2) + O(1/N) for strongly convex
blocks, if only some blocks are strongly convex. Additional novelties of our
methods include blockwise-adapted step lengths and acceleration, as well as the
ability to update both the primal and dual variables randomly in blocks under a
very light compatibility condition. In other words, these variants of our
methods are doubly-stochastic. We test the proposed methods on various image
processing problems, where we employ pixelwise-adapted acceleration.
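For reference, here is the non-adapted baseline the paper builds on: accelerated Chambolle-Pock for TV denoising, with one global acceleration parameter driven by the strong convexity of the data term. The paper's contribution, blockwise/pixelwise-adapted step lengths and acceleration, is not reproduced in this sketch; the test problem and constants are illustrative.

```python
import numpy as np

# TV denoising:  min_u (lam/2)*||u - f||^2 + ||D u||_{2,1},
# solved with accelerated Chambolle-Pock (their Algorithm 2).
def grad(u):                     # forward differences, Neumann boundary
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):                 # -adjoint of grad (px[:,-1] = py[-1,:] = 0 here)
    d = np.zeros_like(px)
    d[:, 0] += px[:, 0]; d[:, 1:] += px[:, 1:] - px[:, :-1]
    d[0, :] += py[0, :]; d[1:, :] += py[1:, :] - py[:-1, :]
    return d

rng = np.random.default_rng(2)
f = np.zeros((64, 64)); f[16:48, 16:48] = 1.0
f += 0.1 * rng.standard_normal(f.shape)      # noisy piecewise-constant image
lam = 8.0

tau = sigma = 1.0 / np.sqrt(8.0)             # ||D||^2 <= 8 for these stencils
u = f.copy(); u_bar = u.copy()
px = np.zeros_like(f); py = np.zeros_like(f)
for k in range(200):
    gx, gy = grad(u_bar)                     # dual ascent + projection
    px, py = px + sigma * gx, py + sigma * gy
    nrm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))
    px, py = px / nrm, py / nrm
    u_old = u                                # primal prox step
    u = (u + tau * div(px, py) + tau * lam * f) / (1.0 + tau * lam)
    theta = 1.0 / np.sqrt(1.0 + 2.0 * lam * tau)  # global acceleration
    tau, sigma = tau * theta, sigma / theta
    u_bar = u + theta * (u - u_old)

gx, gy = grad(u)
energy = 0.5 * lam * np.sum((u - f) ** 2) + np.sum(np.sqrt(gx ** 2 + gy ** 2))
print(f"TV energy after 200 iterations: {energy:.2f}")
```

Spatially adapted variants replace the scalars tau, sigma, and theta with per-pixel quantities, so flat regions can take much larger steps than edges.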
Stochastic Variance Reduction Methods for Saddle-Point Problems
We consider convex-concave saddle-point problems where the objective
functions may be split in many components, and extend recent stochastic
variance reduction methods (such as SVRG or SAGA) to provide the first
large-scale linearly convergent algorithms for this class of problems, which is
common in machine learning. While the algorithmic extension is straightforward,
it comes with challenges and opportunities: (a) the convex minimization
analysis does not apply and we use the notion of monotone operators to prove
convergence, showing in particular that the same algorithm applies to a larger
class of problems, such as variational inequalities, (b) there are two notions
of splits, in terms of functions, or in terms of partial derivatives, (c) the
split does need to be done with convex-concave terms, (d) non-uniform sampling
is key to an efficient algorithm, both in theory and practice, and (e) these
incremental algorithms can be easily accelerated using a simple extension of
the "catalyst" framework, leading to an algorithm which is always superior to
accelerated batch algorithms.
Comment: Neural Information Processing Systems (NIPS), 2016, Barcelona, Spain
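A minimal sketch of the recipe, assuming a strongly convex-strongly concave model problem and uniform sampling (the abstract stresses that non-uniform sampling is key, which this sketch omits): an SVRG-style estimate of the monotone operator of the bilinear coupling, combined with a resolvent (backward) step on the strongly monotone part.

```python
import numpy as np

# Saddle point:  min_x max_y (mu/2)||x||^2 + y^T (Ax - b) - (mu/2)||y||^2,
# with the coupling split over the n rows of A.
rng = np.random.default_rng(3)
n, d, mu = 100, 20, 1.0
A = rng.standard_normal((n, d)) / np.sqrt(n)
b = rng.standard_normal(n)

# closed-form saddle point, used only to report progress
x_star = np.linalg.solve(mu ** 2 * np.eye(d) + A.T @ A, A.T @ b)
y_star = (A @ x_star - b) / mu

eta = 0.5 / (n * (np.linalg.norm(A, axis=1) ** 2).max())  # illustrative step
x, y = np.zeros(d), np.zeros(n)
for epoch in range(30):
    x0, y0 = x.copy(), y.copy()
    Gx0, Gy0 = A.T @ y0, -(A @ x0 - b)   # full operator at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # variance-reduced estimate of the coupling operator (A^T y, -(Ax - b))
        vx = Gx0 + n * A[i] * (y[i] - y0[i])
        vy = Gy0.copy()
        vy[i] -= n * (A[i] @ (x - x0))
        # forward step on the coupling, backward (resolvent) step on mu-terms
        x = (x - eta * vx) / (1.0 + eta * mu)
        y = (y - eta * vy) / (1.0 + eta * mu)

err = np.linalg.norm(x - x_star) + np.linalg.norm(y - y_star)
print(f"distance to the saddle point after 30 epochs: {err:.2e}")
```

Viewed through monotone operators, this illustrates the abstract's point (a): nothing here uses a primal objective value, only the operator (A^T y, -(Ax - b)) and the resolvent of the strongly monotone mu-part.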
Iteration Complexity of Randomized Primal-Dual Methods for Convex-Concave Saddle Point Problems
In this paper we propose a class of randomized primal-dual methods to contend
with large-scale saddle point problems defined by a convex-concave function
L(x, y) = f(x) + Phi(x, y) - h(y). We analyze the convergence rate of the
proposed method under the settings of mere convexity and strong convexity in
the x-variable. In particular, assuming grad_y Phi(., .) is Lipschitz and
grad_x Phi(., y) is coordinate-wise Lipschitz for any fixed y, the ergodic
sequence generated by the algorithm achieves the convergence rate of O(m/k)
in a suitable error metric, where m denotes the number of coordinates for the
primal variable. Furthermore, assuming that L(., y) is uniformly strongly
convex for any y, and that Phi(., y) is linear in x, the scheme displays a
convergence rate of O(m/k^2). We implemented the proposed algorithmic
framework to solve a kernel matrix learning problem and tested it against
other state-of-the-art solvers.
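To illustrate the strongly convex regime the abstract describes (L(., y) uniformly strongly convex, coupling linear in x), here is a simplified randomized-block sketch on a ridge-type saddle point. The step sizes and the m-fold block extrapolation are heuristic choices, and the sketch does not implement the paper's accelerated O(m/k^2) scheme.

```python
import numpy as np

# Saddle point:  min_x max_y (mu/2)||x||^2 + y^T (Ax - b) - ||y||^2/2,
# strongly convex in x with coupling linear in x; one of m primal blocks
# is updated per iteration.
rng = np.random.default_rng(4)
n, d, mu, m = 150, 30, 0.5, 6
A = rng.standard_normal((n, d)) / np.sqrt(n)
b = rng.standard_normal(n)
blocks = np.array_split(np.arange(d), m)

# reference solution of (mu*I + A^T A) x = A^T b, for error reporting only
x_star = np.linalg.solve(mu * np.eye(d) + A.T @ A, A.T @ b)

tau = 1.0 / np.linalg.norm(A, 2)          # primal step (illustrative)
sigma = 1.0 / (m * np.linalg.norm(A, 2))  # dual step, shrunk by the block count
x = np.zeros(d); y = np.zeros(n); Ax = A @ x
for k in range(5000):
    J = blocks[rng.integers(m)]
    # prox-gradient step on one block of the mu-strongly convex part
    xJ_new = (x[J] - tau * (A[:, J].T @ y)) / (1.0 + tau * mu)
    # dual step with the updated block extrapolated m-fold (sampling compensation)
    Ax_new = Ax + A[:, J] @ (xJ_new - x[J])
    y = (y + sigma * (Ax_new + m * (Ax_new - Ax) - b)) / (1.0 + sigma)
    x[J], Ax = xJ_new, Ax_new

print(f"||x - x_star|| after 5000 iterations: {np.linalg.norm(x - x_star):.2e}")
```

Here m enters the dual step and the extrapolation, mirroring how the abstract's O(m/k) and O(m/k^2) rates degrade with the number of primal blocks.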