Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Stochastic optimization algorithms with variance reduction have proven
successful for minimizing large finite sums of functions. Unfortunately, these
techniques are unable to deal with stochastic perturbations of input data,
induced for example by data augmentation. In such cases, the objective is no
longer a finite sum, and the main candidate for optimization is the stochastic
gradient descent method (SGD). In this paper, we introduce a variance reduction
approach for these settings when the objective is composite and strongly
convex. The resulting convergence rate outperforms that of SGD, with a
typically much smaller constant factor that depends only on the variance of
gradient estimates induced by perturbations of a single example.
Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017,
Long Beach, CA, United States
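As a rough illustration of the kind of update this abstract describes, the
sketch below applies a SAGA-style variance-reduced step to a toy strongly
convex problem in which each sampled example is randomly perturbed (mimicking
data augmentation) before its gradient is computed. The function names, the
ridge-regression objective, and all hyperparameters are illustrative
assumptions, not the paper's actual algorithm.

```python
# Hedged sketch, not the paper's algorithm: a SAGA-style variance-reduced
# update on a toy l2-regularized least-squares problem where each sampled
# example is randomly perturbed (mimicking data augmentation) before its
# gradient is computed. All names and hyperparameters are illustrative.
import numpy as np

def perturbed_grad(w, x, y, lam, noise, rng):
    """Gradient of 0.5*(w.x - y)^2 + 0.5*lam*||w||^2 on a perturbed copy of x."""
    x_aug = x + noise * rng.standard_normal(x.shape)  # stochastic perturbation
    return (w @ x_aug - y) * x_aug + lam * w

def variance_reduced_sgd(X, y, lam=1e-2, lr=0.02, epochs=20, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    anchors = np.zeros((n, d))         # stored per-example gradient anchors
    anchor_mean = anchors.mean(axis=0)
    for _ in range(epochs * n):
        i = rng.integers(n)
        g = perturbed_grad(w, X[i], y[i], lam, noise, rng)
        # SAGA-style correction: the anchors cancel across-example variance,
        # so the remaining noise stems only from the perturbation of example i.
        w -= lr * (g - anchors[i] + anchor_mean)
        anchor_mean += (g - anchors[i]) / n  # keep the running mean consistent
        anchors[i] = g
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 10))
    w_true = rng.standard_normal(10)
    y = X @ w_true
    print("parameter error:", np.linalg.norm(variance_reduced_sgd(X, y) - w_true))
```

In this toy setup the stored anchors remove the variance coming from which
example is sampled, so the residual noise in the update direction is due only
to the perturbation of the single sampled example, which is the regime the
abstract targets.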
Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
Nonconvex optimization is central in solving many machine learning problems,
in which block-wise structure is commonly encountered. In this work, we propose
cyclic block coordinate methods for nonconvex optimization problems with
non-asymptotic gradient norm guarantees. Our convergence analysis is based on a
gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by
recent progress on cyclic block coordinate methods. In deterministic settings,
our convergence guarantee matches the guarantee of (full-gradient) gradient
descent, but with the gradient Lipschitz constant defined w.r.t. the
Mahalanobis norm. In stochastic settings, we use recursive variance reduction
to decrease the per-iteration cost and match the arithmetic operation
complexity of current optimal stochastic full-gradient methods, with a unified
analysis for both finite-sum and infinite-sum cases. We further prove faster,
linear convergence of our methods when a Polyak-Łojasiewicz (PŁ)
condition holds for the objective function. To the best of our knowledge, our
work is the first to provide variance-reduced convergence guarantees for a
cyclic block coordinate method. Our experimental results demonstrate the
efficacy of the proposed variance-reduced cyclic scheme in training deep neural
nets.
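To make the setting concrete, here is a minimal sketch of a cyclic block
proximal-gradient pass on a toy composite problem (a smooth nonconvex term
plus an l1 term handled by its prox). The block partition, step size, quartic
objective, and helper names are assumptions for illustration; the paper's
stochastic variant would additionally replace the block gradient with a
recursively variance-reduced estimate.

```python
# Hedged sketch of a cyclic block proximal-gradient method on a toy composite
# problem f(x) + lambda*||x||_1, with f smooth but nonconvex. Everything here
# (objective, block sizes, step size) is an illustrative assumption.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cyclic_block_prox_gd(grad_f, x0, blocks, step, l1=0.0, iters=100):
    """Cycle through coordinate blocks; each block takes a prox-gradient step."""
    x = x0.copy()
    for _ in range(iters):
        for idx in blocks:
            g = grad_f(x)[idx]  # partial gradient for the current block
            x[idx] = soft_threshold(x[idx] - step * g, step * l1)
    return x

if __name__ == "__main__":
    # Nonconvex smooth part: sum_i (x_i^2 - 1)^2 / 4, with gradient x*(x^2 - 1).
    grad_f = lambda x: x * (x ** 2 - 1.0)
    d, block_size = 12, 4
    blocks = [np.arange(s, s + block_size) for s in range(0, d, block_size)]
    x0 = np.full(d, 0.5)
    x_star = cyclic_block_prox_gd(grad_f, x0, blocks, step=0.2, l1=0.01, iters=200)
    print(x_star)  # coordinates settle near stationary points of the composite objective
```

The cyclic order means every block is refreshed once per outer pass using the
most recent values of the other blocks, which is the structure the abstract's
Mahalanobis-norm analysis is built around.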