Uniform sampling of training data has been commonly used in traditional
stochastic optimization algorithms such as Proximal Stochastic Gradient Descent
(prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although
uniform sampling can guarantee that the sampled stochastic quantity is an
unbiased estimate of the corresponding true quantity, the resulting estimator
may have a rather high variance, which negatively affects the convergence of
the underlying optimization procedure. In this paper we study stochastic
optimization with importance sampling, which improves the convergence rate by
reducing the stochastic variance. Specifically, we study prox-SGD (more precisely,
stochastic mirror descent) with importance sampling and prox-SDCA with
importance sampling. For prox-SGD, instead of adopting uniform sampling
throughout the training process, the proposed algorithm employs importance
sampling to minimize the variance of the stochastic gradient. For prox-SDCA,
the proposed importance sampling scheme aims to achieve a higher expected dual
value at each dual coordinate ascent step. We provide extensive theoretical
analysis to show that the convergence rates with the proposed importance
sampling methods can be significantly improved under suitable conditions both
for prox-SGD and for prox-SDCA. Experiments are provided to verify the
theoretical analysis.

Comment: 29 pages
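
As a rough illustration of why non-uniform sampling can reduce variance, the following sketch compares a uniform and an importance-sampled estimator of an average gradient. It is a toy, not the algorithm analyzed in the paper: the per-example gradients are scalars, the sampling probabilities are taken proportional to gradient magnitude (one standard variance-reducing choice, assumed here for illustration), and all function names are hypothetical. Each sampled gradient is reweighted by 1 / (n * p_i) so that the estimator stays unbiased, and every probability is assumed to be strictly positive.

```python
import random

def proportional_probs(grads):
    """Sampling probabilities proportional to |g_i| (illustrative assumption)."""
    total = sum(abs(g) for g in grads)
    return [abs(g) / total for g in grads]

def sample_gradient(grads, probs, rng):
    """Draw index i with probability probs[i]; return the reweighted gradient.

    Dividing by n * probs[i] keeps the estimate unbiased for the mean gradient.
    Assumes every entry of probs is strictly positive.
    """
    n = len(grads)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    return grads[i] / (n * probs[i])

def estimator_variance(grads, probs):
    """Exact variance of the reweighted one-sample estimator."""
    n = len(grads)
    mean = sum(grads) / n
    second_moment = sum(p * (g / (n * p)) ** 2 for g, p in zip(grads, probs))
    return second_moment - mean ** 2

# Toy data: one example dominates the gradient magnitude.
grads = [0.1, 0.2, 0.1, 4.0]
uniform = [1.0 / len(grads)] * len(grads)
weighted = proportional_probs(grads)

print("uniform variance:   ", estimator_variance(grads, uniform))
print("importance variance:", estimator_variance(grads, weighted))
```

In this toy case all gradients share the same sign, so the reweighted values coincide and the importance-sampled estimator has essentially zero variance; with real vector-valued gradients the reduction is generally partial, and exact gradient norms are usually too costly to compute at every step, so cheaper upper bounds would be used instead.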