Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous
variants of stochastic gradient descent have become popular methods to solve a
wide range of large-scale optimization problems on multi-core architectures.
Yet, despite their practical success, support for nonsmooth objectives is still
lacking, making them unsuitable for many problems of interest in machine
learning, such as the Lasso, group Lasso or empirical risk minimization with
convex constraints.
In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse
method inspired by SAGA, a variance reduced incremental gradient algorithm. The
proposed method is easy to implement and significantly outperforms the state of
the art on several nonsmooth, large-scale problems. We prove that our method
achieves a theoretical linear speedup with respect to the sequential version
under assumptions on the sparsity of gradients and block-separability of the
proximal term. Empirical benchmarks on a multi-core architecture illustrate
practical speedups of up to 12x on a 20-core machine.
Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages
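For intuition, here is a minimal, sequential sketch of the variance-reduced proximal building block that a SAGA-style method such as ProxASAGA parallelizes, specialized to the Lasso. It omits the sparse, asynchronous machinery the paper actually analyzes; the function names and the least-squares loss are illustrative assumptions, not the authors' code.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_saga_lasso(A, b, lam, step, n_epochs=50, seed=0):
    """Sequential proximal SAGA for min_x (1/n) sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    grads = np.zeros((n, d))     # stored gradient for each example
    grad_avg = np.zeros(d)       # running average of the stored gradients
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        g_new = (A[i] @ x - b[i]) * A[i]             # fresh gradient of f_i at x
        v = g_new - grads[i] + grad_avg              # variance-reduced direction
        x = soft_threshold(x - step * v, step * lam) # proximal (l1) step
        grad_avg += (g_new - grads[i]) / n           # keep the average consistent
        grads[i] = g_new
    return x
```

Following the SAGA analysis, a step size on the order of 1/(3L), with L the largest per-example smoothness constant (here max_i ||a_i||^2), is a reasonable default for this sketch.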
Fast Nonsmooth Regularized Risk Minimization with Continuation
In regularized risk minimization, the associated optimization problem becomes
particularly difficult when both the loss and regularizer are nonsmooth.
Existing approaches either have slow or unclear convergence properties, are
restricted to limited problem subclasses, or require careful setting of a
smoothing parameter. In this paper, we propose a continuation algorithm that is
applicable to a large class of nonsmooth regularized risk minimization
problems, can be flexibly used with a number of existing solvers for the
underlying smoothed subproblem, and comes with convergence results for the whole
algorithm rather than for just one of its subproblems. In particular, when
accelerated solvers are used, the proposed algorithm achieves the fastest known
rates of O(1/T^2) on strongly convex problems, and O(1/T) on general convex
problems. Experiments on nonsmooth classification and regression tasks
demonstrate that the proposed algorithm outperforms the state-of-the-art.
Comment: AAAI-201
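The continuation idea can be illustrated with a short sketch: smooth the nonsmooth term, approximately solve the smoothed problem, then shrink the smoothing parameter and repeat from the previous solution. The sketch below is only an assumption-laden illustration, not the paper's algorithm: it smooths an l1 regularizer through its Moreau (Huber) envelope and uses plain gradient descent as a stand-in for the accelerated solvers the paper relies on; mu0, rho, and the solver interface are hypothetical choices.

```python
import numpy as np

def huber_grad(x, mu):
    """Gradient of the Moreau-envelope (Huber) smoothing of ||.||_1."""
    return np.clip(x / mu, -1.0, 1.0)

def continuation_solve(grad_loss, x0, lam, loss_lipschitz,
                       mu0=1.0, rho=0.5, n_stages=10, inner_iters=200):
    """Solve a sequence of smoothed problems with geometrically decreasing mu."""
    x = np.array(x0, dtype=float)
    mu = mu0
    for _ in range(n_stages):
        step = 1.0 / (loss_lipschitz + lam / mu)   # smoothed objective is smooth
        for _ in range(inner_iters):               # plain GD as a stand-in solver
            x = x - step * (grad_loss(x) + lam * huber_grad(x, mu))
        mu *= rho                                  # tighten the smoothing
    return x
```

For a least-squares loss, grad_loss could be lambda w: A.T @ (A @ w - b) / len(b) with loss_lipschitz = np.linalg.norm(A, 2)**2 / len(b); warm-starting each stage from the previous solution is what makes the continuation scheme cheap.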
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Stochastic optimization algorithms with variance reduction have proven
successful for minimizing large finite sums of functions. Unfortunately, these
techniques are unable to deal with stochastic perturbations of input data,
induced for example by data augmentation. In such cases, the objective is no
longer a finite sum, and the main candidate for optimization is the stochastic
gradient descent method (SGD). In this paper, we introduce a variance reduction
approach for these settings when the objective is composite and strongly
convex. The convergence rate outperforms that of SGD, with a typically much
smaller constant factor that depends only on the variance of gradient estimates
induced by perturbations of a single example.
Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States
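To make the setting concrete, the sketch below shows a SAGA-style variance-reduced iteration in which every sampled example is observed through a fresh random perturbation (e.g. a data-augmentation transform). It is meant only to illustrate why variance across examples can be reduced while the noise from perturbing a single example remains; it is not the authors' algorithm, and perturb, grad_f, and prox_reg are hypothetical user-supplied callables.

```python
import numpy as np

def vr_sgd_perturbed(X, y, grad_f, prox_reg, perturb, step, n_iters=10000, seed=0):
    """Variance-reduced stochastic iteration with randomly perturbed examples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    table = np.zeros((n, d))        # last stored gradient for each example
    table_avg = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        x_aug = perturb(X[i], rng)         # random perturbation of example i
        g = grad_f(w, x_aug, y[i])         # gradient on the perturbed example
        v = g - table[i] + table_avg       # reduces variance across examples;
                                           # perturbation noise is still present
        w = prox_reg(w - step * v, step)   # proximal step on the regularizer
        table_avg += (g - table[i]) / n
        table[i] = g
    return w
```

In this sketch the residual noise in the direction v comes only from the perturbation of the single sampled example, which is the constant factor the abstract refers to.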