Accelerated coordinate descent is widely used in optimization due to its
cheap per-iteration cost and scalability to large-scale problems. Up to a
primal-dual transformation, it is also the same as accelerated stochastic
gradient descent, which is one of the central methods used in machine learning.
  In this paper, we improve the best known running time of accelerated
coordinate descent by a factor of up to $\sqrt{n}$. Our improvement is based on a
clean, novel non-uniform sampling that selects each coordinate with a
probability proportional to the square root of its smoothness parameter. Our
proof technique also deviates from the classical estimation sequence technique
used in prior work. Our speed-up applies to important problems such as
empirical risk minimization and solving linear systems, both in theory and in
practice.
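
The sampling rule described in the abstract can be illustrated with a minimal sketch. It only computes the coordinate-selection probabilities, proportional to the square root of each coordinate's smoothness parameter, and draws coordinates from them; it is not the accelerated method itself. The function names, the `seed` parameter, and the example smoothness values are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def sqrt_smoothness_probabilities(smoothness):
    """Return p_i = sqrt(L_i) / sum_j sqrt(L_j) for smoothness parameters L_i.

    Hypothetical helper: the name and input checks are illustrative only.
    """
    if len(smoothness) == 0:
        raise ValueError("need at least one coordinate")
    if any(L < 0 for L in smoothness):
        raise ValueError("smoothness parameters must be non-negative")
    roots = [math.sqrt(L) for L in smoothness]
    total = sum(roots)
    if total == 0:
        raise ValueError("at least one smoothness parameter must be positive")
    return [r / total for r in roots]


def sample_coordinates(smoothness, num_samples, seed=0):
    """Draw coordinate indices i with probability proportional to sqrt(L_i)."""
    probs = sqrt_smoothness_probabilities(smoothness)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.choices(range(len(probs)), weights=probs, k=num_samples)


# Example with assumed smoothness parameters L = (1, 4, 9):
# the square roots are (1, 2, 3), so the probabilities are (1/6, 2/6, 3/6).
probs = sqrt_smoothness_probabilities([1.0, 4.0, 9.0])
indices = sample_coordinates([1.0, 4.0, 9.0], num_samples=5)
```

Compared with uniform sampling, this distribution picks coordinates with larger smoothness parameters more often, but less aggressively than sampling proportional to the parameters themselves would.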

Allen-Zhu, Zeyuan

Qu, Zheng

Richtárik, Peter

Yuan, Yang

Journal of Machine Learning Research

arXiv

The full version of this paper can be found at http://arxiv.org/abs/1512.09103. This journal volume is entitled: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016.

HKU Scholars Hub

Even faster accelerated coordinate descent using non-uniform sampling

Edinburgh Research Explorer

https://www.pure.ed.ac.uk/ws/files/59043957/1512.09103v3.pdf
