1,676 research outputs found
Asynchronous Distributed Semi-Stochastic Gradient Optimization
With the recent proliferation of large-scale learning problems,there have
been a lot of interest on distributed machine learning algorithms, particularly
those that are based on stochastic gradient descent (SGD) and its variants.
However, existing algorithms either suffer from slow convergence due to the
inherent variance of stochastic gradients, or have a fast linear convergence
rate but at the expense of poorer solution quality. In this paper, we combine
their merits by proposing a fast distributed asynchronous SGD-based algorithm
with variance reduction. A constant learning rate can be used, and it is also
guaranteed to converge linearly to the optimal solution. Experiments on the
Google Cloud Computing Platform demonstrate that the proposed algorithm
outperforms state-of-the-art distributed asynchronous algorithms in terms of
both wall clock time and solution quality
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous
variants of stochastic gradient descent have become popular methods to solve a
wide range of large-scale optimization problems on multi-core architectures.
Yet, despite their practical success, support for nonsmooth objectives is still
lacking, making them unsuitable for many problems of interest in machine
learning, such as the Lasso, group Lasso or empirical risk minimization with
convex constraints.
In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse
method inspired by SAGA, a variance reduced incremental gradient algorithm. The
proposed method is easy to implement and significantly outperforms the state of
the art on several nonsmooth, large-scale problems. We prove that our method
achieves a theoretical linear speedup with respect to the sequential version
under assumptions on the sparsity of gradients and block-separability of the
proximal term. Empirical benchmarks on a multi-core architecture illustrate
practical speedups of up to 12x on a 20-core machine.Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS
2017), 28 page
- …