Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization
We introduce a proximal version of the stochastic dual coordinate ascent
method and show how to accelerate the method using an inner-outer iteration
procedure. We analyze the runtime of the framework and obtain rates that
improve state-of-the-art results for various key machine learning optimization
problems including SVM, logistic regression, ridge regression, Lasso, and
multiclass SVM. Experiments validate our theoretical findings.
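To make the dual coordinate ascent idea concrete, here is a minimal sketch of plain (non-proximal, non-accelerated) SDCA for ridge regression, one of the problems the abstract lists. It is not the paper's accelerated method. The function name `sdca_ridge`, the loss scaling (1/2)(x_i·w - y_i)^2 with regularizer (lam/2)||w||^2, and the toy data are assumptions made for illustration.

```python
import random

def sdca_ridge(X, y, lam, epochs=200, seed=0):
    """Plain SDCA sketch for min_w (1/n) sum 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2.

    Hypothetical helper: maintains dual variables alpha and the primal
    iterate w = (1 / (lam * n)) * sum_i alpha_i * x_i.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    for _ in range(epochs * n):
        i = rng.randrange(n)  # pick one dual coordinate uniformly at random
        xi = X[i]
        pred = sum(wj * xj for wj, xj in zip(w, xi))
        sq_norm = sum(xj * xj for xj in xi)
        # Closed-form maximizer of the dual objective along coordinate i
        # (valid for the squared loss with the scaling assumed above).
        delta = (y[i] - pred - alpha[i]) / (1.0 + sq_norm / (lam * n))
        alpha[i] += delta
        # Keep w consistent with the updated dual variables.
        scale = delta / (lam * n)
        w = [wj + scale * xj for wj, xj in zip(w, xi)]
    return w, alpha

# Toy one-dimensional data (made up for this sketch).
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]
w, alpha = sdca_ridge(X, y, lam=1.0)
```

For this one-dimensional problem the ridge solution has the closed form sum(x*y) / (sum(x*x) + lam*n), which gives a simple check on the returned `w`.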
Stochastic Optimization with Importance Sampling
Uniform sampling of training data has been commonly used in traditional
stochastic optimization algorithms such as Proximal Stochastic Gradient Descent
(prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although
uniform sampling can guarantee that the sampled stochastic quantity is an
unbiased estimate of the corresponding true quantity, the resulting estimator
may have a rather high variance, which negatively affects the convergence of
the underlying optimization procedure. In this paper we study stochastic
optimization with importance sampling, which improves the convergence rate by
reducing the stochastic variance. Specifically, we study prox-SGD (actually,
stochastic mirror descent) with importance sampling and prox-SDCA with
importance sampling. For prox-SGD, instead of adopting uniform sampling
throughout the training process, the proposed algorithm employs importance
sampling to minimize the variance of the stochastic gradient. For prox-SDCA,
the proposed importance sampling scheme aims to achieve higher expected dual
value at each dual coordinate ascent step. We provide extensive theoretical
analysis to show that the convergence rates with the proposed importance
sampling methods can be significantly improved under suitable conditions both
for prox-SGD and for prox-SDCA. Experiments are provided to verify the
theoretical analysis. Comment: 29 pages
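The variance argument in this abstract can be illustrated with a small exact calculation. The sketch below assumes scalar per-example gradients and the textbook choice of sampling probabilities proportional to gradient magnitude; the sampling scheme actually used in the paper may differ. The helper `estimator_stats` and the gradient values are hypothetical.

```python
def estimator_stats(grads, probs):
    """Exact mean and variance of the reweighted one-sample gradient estimator.

    Example i is drawn with probability probs[i] and the estimator returns
    grads[i] / (n * probs[i]); the 1 / (n * p_i) weight keeps it unbiased.
    Assumes every probability is strictly positive.
    """
    n = len(grads)
    mean = sum(p * (g / (n * p)) for g, p in zip(grads, probs))
    second_moment = sum(p * (g / (n * p)) ** 2 for g, p in zip(grads, probs))
    return mean, second_moment - mean ** 2

# Made-up per-example gradients with one dominant example.
grads = [0.1, -0.2, 0.3, 4.0]
n = len(grads)

uniform = [1.0 / n] * n
total = sum(abs(g) for g in grads)
importance = [abs(g) / total for g in grads]  # p_i proportional to |g_i|

mean_u, var_u = estimator_stats(grads, uniform)
mean_i, var_i = estimator_stats(grads, importance)
```

Both estimators have the same mean (the full gradient), while the importance-sampled one has the smaller variance on this data, which is the effect the abstract describes.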
Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization
We consider a generic convex optimization problem associated with regularized
empirical risk minimization of linear predictors. The problem structure allows
us to reformulate it as a convex-concave saddle point problem. We propose a
stochastic primal-dual coordinate (SPDC) method, which alternates between
maximizing over a randomly chosen dual variable and minimizing over the primal
variable. An extrapolation step on the primal variable is performed to obtain
an accelerated convergence rate. We also develop a mini-batch version of the SPDC
method which facilitates parallel computing, and an extension with weighted
sampling probabilities on the dual variables, which has a better complexity
than uniform sampling on unnormalized data. Both theoretically and empirically,
we show that the SPDC method has comparable or better performance than several
state-of-the-art optimization methods.
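The saddle-point reformulation mentioned in this abstract can be written out as follows. The notation is an assumption, since the abstract defines no symbols: a_i are the feature vectors, phi_i the convex losses, phi_i^* their convex conjugates, and g the regularizer. The identity relies on phi_i(z) = max_y (y z - phi_i^*(y)) for closed convex phi_i.

```latex
\min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \phi_i(a_i^\top x) + g(x)
\;=\;
\min_{x \in \mathbb{R}^d} \, \max_{y \in \mathbb{R}^n} \;
\frac{1}{n} \sum_{i=1}^{n} \bigl( y_i \, a_i^\top x - \phi_i^*(y_i) \bigr) + g(x)
```

In this form, updating a single randomly chosen y_i corresponds to the dual maximization step, and the minimization over x corresponds to the primal step described in the abstract.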
SCOPE: Scalable Composite Optimization for Learning on Spark
Many machine learning models, such as logistic regression (LR) and support
vector machine (SVM), can be formulated as composite optimization problems.
Recently, many distributed stochastic optimization (DSO) methods have been
proposed to solve the large-scale composite optimization problems, which have
shown better performance than traditional batch methods. However, most of these
DSO methods are not scalable enough. In this paper, we propose a novel DSO
method, called scalable composite
optimization for learning (SCOPE), and implement it
on the fault-tolerant distributed platform Spark. SCOPE is both
computation-efficient and communication-efficient. Theoretical analysis shows
that SCOPE converges at a linear rate when the objective
function is convex. Furthermore, empirical results on real datasets show that
SCOPE can outperform other state-of-the-art distributed learning methods on
Spark, including both batch learning methods and DSO methods.