3,636 research outputs found
Speculative Approximations for Terascale Analytics
Model calibration is a major challenge faced by the plethora of statistical
analytics packages that are increasingly used in Big Data applications.
Identifying the optimal model parameters is a time-consuming process that has
to be executed from scratch for every dataset/model combination even by
experienced data scientists. We argue that the incapacity to evaluate multiple
parameter configurations simultaneously and the lack of support to quickly
identify sub-optimal configurations are the principal causes. In this paper, we
develop two database-inspired techniques for efficient model calibration.
Speculative parameter testing applies advanced parallel multi-query processing
methods to evaluate several configurations concurrently. The number of
configurations is determined adaptively at runtime, while the configurations
themselves are extracted from a distribution that is continuously learned
following a Bayesian process. Online aggregation is applied to identify
sub-optimal configurations early in the processing by incrementally sampling
the training dataset and estimating the objective function corresponding to
each configuration. We design concurrent online aggregation estimators and
define halting conditions to accurately and timely stop the execution. We apply
the proposed techniques to distributed gradient descent optimization -- batch
and incremental -- for support vector machines and logistic regression models.
We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big
Data analytics system -- and evaluate their performance over terascale-size
synthetic and real datasets. The results confirm that as many as 32
configurations can be evaluated concurrently almost as fast as one, while
sub-optimal configurations are detected accurately in as little as a
fraction of the time
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous
variants of stochastic gradient descent have become popular methods to solve a
wide range of large-scale optimization problems on multi-core architectures.
Yet, despite their practical success, support for nonsmooth objectives is still
lacking, making them unsuitable for many problems of interest in machine
learning, such as the Lasso, group Lasso or empirical risk minimization with
convex constraints.
In this work, we propose and analyze ProxASAGA, a fully asynchronous sparse
method inspired by SAGA, a variance reduced incremental gradient algorithm. The
proposed method is easy to implement and significantly outperforms the state of
the art on several nonsmooth, large-scale problems. We prove that our method
achieves a theoretical linear speedup with respect to the sequential version
under assumptions on the sparsity of gradients and block-separability of the
proximal term. Empirical benchmarks on a multi-core architecture illustrate
practical speedups of up to 12x on a 20-core machine.Comment: Appears in Advances in Neural Information Processing Systems 30 (NIPS
2017), 28 page
Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
Consider the problem of minimizing the sum of a smooth (possibly non-convex)
and a convex (possibly nonsmooth) function involving a large number of
variables. A popular approach to solve this problem is the block coordinate
descent (BCD) method whereby at each iteration only one variable block is
updated while the remaining variables are held fixed. With the recent advances
in the developments of the multi-core parallel processing technology, it is
desirable to parallelize the BCD method by allowing multiple blocks to be
updated simultaneously at each iteration of the algorithm. In this work, we
propose an inexact parallel BCD approach where at each iteration, a subset of
the variables is updated in parallel by minimizing convex approximations of the
original objective function. We investigate the convergence of this parallel
BCD method for both randomized and cyclic variable selection rules. We analyze
the asymptotic and non-asymptotic convergence behavior of the algorithm for
both convex and non-convex objective functions. The numerical experiments
suggest that for a special case of Lasso minimization problem, the cyclic block
selection rule can outperform the randomized rule
- …