Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization
This paper considers a class of constrained stochastic composite optimization
problems whose objective function is given by the summation of a differentiable
(possibly nonconvex) component, together with a certain non-differentiable (but
convex) component. In order to solve these problems, we propose a randomized
stochastic projected gradient (RSPG) algorithm, in which a proper mini-batch of
samples is taken at each iteration, depending on the total budget of stochastic
samples allowed. The RSPG algorithm also employs a general distance function to
allow taking advantage of the geometry of the feasible region. The complexity of
this algorithm is established in a unified setting, which shows that it is nearly
optimal for convex stochastic programming. A
post-optimization phase is also proposed to significantly reduce the variance
of the solutions returned by the algorithm. In addition, based on the RSPG
algorithm, a stochastic gradient free algorithm, which only uses the stochastic
zeroth-order information, is also discussed. Some preliminary numerical
results are also provided.
Comment: 32 pages
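As a rough illustration of the RSPG idea, here is a minimal mini-batch projected-gradient sketch. The toy objective, batch size, step size, and the uniform random choice of output iterate are illustrative assumptions, not the paper's exact scheme or constants.

```python
import numpy as np

def rspg(grad_batch, project, x0, steps=200, batch=32, lr=0.05, seed=0):
    """Mini-batch stochastic projected gradient loop with a randomized output.

    grad_batch(x, m, rng) -> average of m stochastic gradients at x;
    project(x) -> projection of x onto the feasible region.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    iterates = []
    for _ in range(steps):
        g = grad_batch(x, batch, rng)      # mini-batch gradient estimate
        x = project(x - lr * g)            # projected gradient step
        iterates.append(x.copy())
    # return a uniformly random iterate, mirroring the randomized output rule
    return iterates[rng.integers(len(iterates))]

# Toy problem: minimize E[(x - z)^2 / 2], z ~ N(1, 1), over the box [-2, 2].
grad_batch = lambda x, m, rng: x - rng.normal(1.0, 1.0, size=m).mean()
project = lambda v: np.clip(v, -2.0, 2.0)
x_hat = rspg(grad_batch, project, x0=-2.0)
```

The general distance function of the paper is replaced here by a plain Euclidean projection for simplicity.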
Stochastic Nonconvex Optimization with Large Minibatches
We study stochastic optimization of nonconvex loss functions, which are
typical objectives for training neural networks. We propose stochastic
approximation algorithms which optimize a series of regularized, nonlinearized
losses on large minibatches of samples, using only first-order gradient
information. Our algorithms provably converge to an approximate critical point
of the expected objective with faster rates than minibatch stochastic gradient
descent, and facilitate better parallelization by allowing larger minibatches.
Comment: Accepted by the ALT 201
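The scheme described above, optimizing a series of regularized losses on large minibatches using only first-order steps, can be sketched roughly as follows; the quadratic regularizer, inner solver, and all constants are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def large_minibatch_method(grad_sample, x0, rounds=20, batch=1024,
                           reg=1.0, inner_steps=50, inner_lr=0.1, seed=0):
    """Each round draws one large minibatch B and approximately minimizes a
    regularized loss F_B(x) + (reg/2)*||x - center||^2 with gradient steps,
    using only first-order information."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        data = rng.normal(1.0, 1.0, size=batch)   # fresh large minibatch
        center = x.copy()
        for _ in range(inner_steps):              # inner first-order solver
            g = grad_sample(x, data) + reg * (x - center)
            x = x - inner_lr * g
    return x

# Toy loss: f(x; z) = (x - z)^2 / 2, so the population minimizer is E[z] = 1.
grad_sample = lambda x, data: x - data.mean()
x_out = large_minibatch_method(grad_sample, x0=0.0)
```

The large batch makes each regularized subproblem nearly deterministic, which is what permits the inner solver to run many steps in parallel.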
Stochastic Variance-Reduced ADMM
The alternating direction method of multipliers (ADMM) is a powerful
optimization solver in machine learning. Recently, stochastic ADMM has been
integrated with variance reduction methods for stochastic gradient, leading to
SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration
complexities. However, their space requirements can still be high. In this
paper, we propose an integration of ADMM with the method of stochastic variance
reduced gradient (SVRG). Unlike another recent integration attempt called
SCAS-ADMM, the proposed algorithm retains the fast convergence benefits of
SAG-ADMM and SDCA-ADMM, but is more advantageous in that its storage
requirement is very low and even independent of the sample size $n$. We also
extend the proposed method to nonconvex problems, and obtain a convergence
rate of $O(1/T)$. Experimental results demonstrate that it is as fast as
SAG-ADMM and SDCA-ADMM, much faster than SCAS-ADMM, and can be used on much
bigger data sets.
Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization
In this paper we study stochastic quasi-Newton methods for nonconvex
stochastic optimization, where we assume that noisy information about the
gradients of the objective function is available via a stochastic first-order
oracle (SFO). We propose a general framework for such methods, for which we
prove almost sure convergence to stationary points and analyze its worst-case
iteration complexity. When a randomly chosen iterate is returned as the output
of such an algorithm, we prove that, in the worst case, the SFO-calls complexity
is $O(\epsilon^{-2})$ to ensure that the expectation of the squared norm of the
gradient is smaller than the given accuracy tolerance $\epsilon$. We also
propose a specific algorithm, namely a stochastic damped L-BFGS (SdLBFGS)
method that falls under the proposed framework. Moreover, we incorporate the
SVRG variance reduction technique into the proposed SdLBFGS method, and analyze
its SFO-calls complexity. Numerical results on a nonconvex binary
classification problem using SVM, and a multiclass classification problem using
neural networks are reported.
Comment: published in SIAM Journal on Optimization
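A minimal sketch of a stochastic damped L-BFGS loop in the spirit of the abstract: the two-loop recursion is standard, but the damping rule here (pulling the curvature pair toward $s$, an identity-model simplification), the threshold 0.2, the memory size, and the toy problem are all assumptions, not the paper's SdLBFGS method.

```python
import numpy as np
from collections import deque

def two_loop(g, s_list, y_list):
    """Apply the L-BFGS inverse-Hessian approximation to the vector g."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / float(y @ s)
        a = rho * float(s @ q)
        alphas.append(a)
        q -= a * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= float(s @ y) / float(y @ y)       # initial Hessian scaling
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / float(y @ s)
        b = rho * float(y @ q)
        q += (a - b) * s
    return q

def sdlbfgs_like(stoch_grad, x0, steps=100, lr=0.1, mem=5, seed=0):
    """Stochastic L-BFGS loop with a simplified damping of the curvature
    pair (s, y), keeping s.y positive despite gradient noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    s_hist, y_hist = deque(maxlen=mem), deque(maxlen=mem)
    g = stoch_grad(x, rng)
    for _ in range(steps):
        d = two_loop(g, list(s_hist), list(y_hist))
        x_new = x - lr * d
        g_new = stoch_grad(x_new, rng)
        s, y = x_new - x, g_new - g
        sy, ss = float(s @ y), float(s @ s)
        if ss > 1e-10:
            if sy < 0.2 * ss:                  # damp toward s (identity model)
                theta = 0.8 * ss / (ss - sy)
                y = theta * y + (1.0 - theta) * s
            s_hist.append(s)
            y_hist.append(y)
        x, g = x_new, g_new
    return x

# Toy noisy quadratic: f(x) = ||x - 1||^2 / 2 with small gradient noise.
stoch_grad = lambda x, r: (x - 1.0) + 0.01 * r.normal(size=x.shape)
x_out = sdlbfgs_like(stoch_grad, x0=np.array([3.0, -2.0]))
```

The damping keeps the inverse-Hessian approximation positive definite, which is the key difficulty quasi-Newton methods face with noisy gradients.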
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
With the rapid growth of complex data, nonconvex models such as nonconvex
loss functions and nonconvex regularizers are widely used in machine learning and
pattern recognition. In this paper, we propose a class of mini-batch stochastic
ADMMs (alternating direction method of multipliers) for solving large-scale
nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch
size, the mini-batch stochastic ADMM without variance reduction (VR) technique
is convergent and reaches a convergence rate of $O(1/T)$ for obtaining a stationary
point of the nonconvex optimization, where $T$ denotes the number of
iterations. Moreover, we extend the mini-batch stochastic gradient method to
both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript
\cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also
reach the convergence rate of $O(1/T)$ without any condition on the mini-batch
size. In particular, we provide a specific parameter selection for the step size
$\eta$ of the stochastic gradients and the penalty parameter $\rho$ of the augmented
Lagrangian function. Finally, extensive experimental results on both simulated
and real-world data demonstrate the effectiveness of the proposed algorithms.
Comment: We have fixed some errors in the proofs. arXiv admin note: text
overlap with arXiv:1610.0275
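A rough sketch of a linearized mini-batch stochastic ADMM step for a composite problem with an $\ell_1$ regularizer; the splitting, the proximal term, and all step sizes below are illustrative assumptions, not the paper's exact updates or parameter choices.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def minibatch_stoch_admm(grad_batch, x0, lam=0.1, rho=1.0, eta=0.1,
                         steps=300, batch=64, seed=0):
    """Linearized mini-batch stochastic ADMM for
    min_x E[f(x; xi)] + lam * ||x||_1, split as f(x) + g(z) with x = z."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    z = x.copy()
    u = np.zeros_like(x)                      # scaled dual variable
    for _ in range(steps):
        g = grad_batch(x, batch, rng)         # mini-batch gradient of f at x
        # x-update: linearized f plus proximal term (1/(2*eta))*||x - x_t||^2
        x = (x / eta + rho * (z - u) - g) / (1.0 / eta + rho)
        z = soft_threshold(x + u, lam / rho)  # z-update: prox of lam*||.||_1
        u = u + x - z                         # dual update
    return z

# Toy: f(x) = E[||x - a||^2 / 2], a ~ N([1, 0], I); solution approx [0.9, 0].
grad_batch = lambda x, m, rng: x - rng.normal([1.0, 0.0], 1.0,
                                              size=(m, 2)).mean(axis=0)
z_out = minibatch_stoch_admm(grad_batch, x0=np.zeros(2))
```

At a fixed point these updates satisfy the usual stationarity condition $\nabla f(x) + \rho u = 0$ with $\rho u$ a subgradient of $\lambda\|\cdot\|_1$, so the sketch targets the same stationary points as the exact ADMM.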
Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization
We introduce a hybrid stochastic estimator to design stochastic gradient
algorithms for solving stochastic optimization problems. Such a hybrid
estimator is a convex combination of an existing biased estimator and an unbiased
estimator, and it leads to a useful property on its variance. We limit our
consideration to a hybrid SARAH-SGD for nonconvex expectation problems.
However, our idea can be extended to handle a broader class of estimators in
both convex and nonconvex settings. We propose a new single-loop stochastic
gradient descent algorithm that can achieve an
$O(\sigma^3\epsilon^{-1} + \sigma\epsilon^{-3})$-complexity bound
to obtain an $\epsilon$-stationary point under smoothness and
$\sigma^2$-bounded variance assumptions. This complexity is better than the
$O(\sigma^2\epsilon^{-4})$ bound often obtained in state-of-the-art SGDs when
$\sigma$ is large. We also consider different extensions of our
method, including constant and adaptive step-size with single-loop,
double-loop, and mini-batch variants. We compare our algorithms with existing
methods on several datasets using two nonconvex models.
Comment: 41 pages and 18 figures
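The hybrid estimator described above can be sketched as a convex combination of a SARAH-style recursive (biased) term and a fresh unbiased SGD term; the weight, step size, and toy objective below are assumptions for illustration, not the paper's tuned constants.

```python
import numpy as np

def hybrid_sgd(stoch_grad, x0, beta=0.9, lr=0.05, steps=400, seed=0):
    """Single-loop hybrid SARAH-SGD sketch:
    v_t = beta * (v_{t-1} + g(x_t) - g(x_{t-1}))   # SARAH part, same sample
        + (1 - beta) * g'(x_t)                     # fresh unbiased sample
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    v = stoch_grad(x_prev, rng)
    x = x_prev - lr * v
    for _ in range(steps):
        s = int(rng.integers(1 << 30))
        g_new = stoch_grad(x, np.random.default_rng(s))       # same sample at x_t
        g_old = stoch_grad(x_prev, np.random.default_rng(s))  # ... and at x_{t-1}
        g_fresh = stoch_grad(x, rng)                # independent unbiased term
        v = beta * (v + g_new - g_old) + (1.0 - beta) * g_fresh
        x_prev, x = x, x - lr * v
    return x

# Toy: gradient of E[(x - 1)^2 / 2] with additive Gaussian noise.
stoch_grad = lambda x, r: (x - 1.0) + 0.5 * r.normal()
x_out = hybrid_sgd(stoch_grad, x0=0.0)
```

Re-seeding the generator is a simple way to evaluate both the new and old iterates on the same sample, as the SARAH difference term requires; the fresh term keeps the estimator anchored to an unbiased direction.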
Penalty Methods with Stochastic Approximation for Stochastic Nonlinear Programming
In this paper, we propose a class of penalty methods with stochastic
approximation for solving stochastic nonlinear programming problems. We assume
that only noisy gradients or function values of the objective function are
available via calls to a stochastic first-order or zeroth-order oracle. In each
iteration of the proposed methods, we minimize an exact penalty function which
is nonsmooth and nonconvex with only stochastic first-order or zeroth-order
information available. Stochastic approximation algorithms are presented for
solving this particular subproblem. The worst-case complexity of calls to the
stochastic first-order (or zeroth-order) oracle for the proposed penalty
methods for obtaining an $\epsilon$-stochastic critical point is analyzed.
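A minimal sketch of the penalty idea: each outer stage minimizes an exact (nonsmooth) penalty function with a stochastic subgradient loop, then increases the penalty parameter. The schedule, constants, and toy constrained problem are assumptions, not the paper's method.

```python
import numpy as np

def penalty_stoch_approx(grad_f, c, grad_c, x0, rho0=1.0, rho_mult=2.0,
                         outer=6, inner=300, lr=0.02, seed=0):
    """Each stage minimizes the exact penalty f(x) + rho * max(0, c(x))
    using noisy first-order information, then increases rho."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    rho = rho0
    for _ in range(outer):
        step = lr / rho           # shrink the step as rho grows (assumed schedule)
        for _ in range(inner):
            g = grad_f(x, rng)    # noisy gradient of the objective
            if c(x) > 0:          # subgradient of the penalty term
                g += rho * grad_c(x)
            x -= step * g
        rho *= rho_mult
    return x

# Toy: minimize E[(x - 2)^2 / 2] subject to x <= 1; the solution is x = 1.
grad_f = lambda x, r: (x - 2.0) + 0.3 * r.normal()
x_out = penalty_stoch_approx(grad_f, c=lambda x: x - 1.0,
                             grad_c=lambda x: 1.0, x0=0.0)
```

Because the penalty is exact, a finite value of rho already makes the constrained minimizer a minimizer of the penalized subproblem; growing rho just tightens feasibility against the gradient noise.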
Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming
In this paper, we generalize the well-known Nesterov's accelerated gradient
(AG) method, originally designed for convex smooth optimization, to solve
nonconvex and possibly stochastic optimization problems. We demonstrate that by
properly specifying the stepsize policy, the AG method exhibits the best known
rate of convergence for solving general nonconvex smooth optimization problems
by using first-order information, similarly to the gradient descent method. We
then consider an important class of composite optimization problems and show
that the AG method can solve them uniformly, i.e., by using the same aggressive
stepsize policy as in the convex case, even if the problem turns out to be
nonconvex. We demonstrate that the AG method exhibits an optimal rate of
convergence if the composite problem is convex, and improves the best known
rate of convergence if the problem is nonconvex. Based on the AG method, we
also present new nonconvex stochastic approximation methods and show that they
can improve a few existing rates of convergence for nonconvex stochastic
optimization. To the best of our knowledge, this is the first time that the
convergence of the AG method has been established for solving nonconvex
nonlinear programming in the literature.
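For reference, the classical convex-form Nesterov AG iteration that the abstract generalizes looks as follows; the paper's contribution is a stepsize policy that makes this template safe on nonconvex and stochastic problems, which this sketch does not reproduce.

```python
import numpy as np

def accelerated_gradient(grad, x0, lr, steps=200):
    """Nesterov-style accelerated gradient loop (convex smooth form)."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(steps):
        x_new = y - lr * grad(y)            # gradient step at extrapolated point
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Toy smooth problem: f(x) = (x - 3)^2 / 2, so L = 1 and lr = 0.5 <= 1/L.
x_out = accelerated_gradient(lambda v: v - 3.0, x0=0.0, lr=0.5)
```

The momentum weight $(t-1)/t_{\text{new}}$ is the standard FISTA-style sequence; the "aggressive stepsize policy" of the abstract refers to keeping this same extrapolation even when convexity is not known in advance.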
A Unified Framework for Stochastic Matrix Factorization via Variance Reduction
We propose a unified framework to speed up the existing stochastic matrix
factorization (SMF) algorithms via variance reduction. Our framework is general
and it subsumes several well-known SMF formulations in the literature. We
perform a non-asymptotic convergence analysis of our framework and derive
computational and sample complexities for our algorithm to converge to an
$\epsilon$-stationary point in expectation. In addition, extensive experiments
for a wide class of SMF formulations demonstrate that our framework
consistently yields faster convergence and a more accurate output dictionary
vis-à-vis state-of-the-art frameworks.
Stochastic Variance Reduction for Nonconvex Optimization
We study nonconvex finite-sum problems and analyze stochastic variance
reduced gradient (SVRG) methods for them. SVRG and related methods have
recently surged into prominence for convex optimization given their edge over
stochastic gradient descent (SGD); but their theoretical analysis almost
exclusively assumes convexity. In contrast, we prove non-asymptotic rates of
convergence (to stationary points) of SVRG for nonconvex optimization, and show
that it is provably faster than SGD and gradient descent. We also analyze a
subclass of nonconvex problems on which SVRG attains linear convergence to the
global optimum. We extend our analysis to mini-batch variants of SVRG, showing
(theoretical) linear speedup due to mini-batching in parallel settings.
Comment: Minor feedback change
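The SVRG template analyzed above is simple to state: periodically compute a full-gradient snapshot, then take cheap inner steps whose stochastic gradients are corrected by the snapshot. The toy finite sum and all constants below are illustrative.

```python
import numpy as np

def svrg(grad_i, n, x0, epochs=20, inner=100, lr=0.05, seed=0):
    """SVRG for a finite sum f = (1/n) * sum_i f_i using component
    gradients grad_i(x, i)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        snap = x.copy()
        full = np.mean([grad_i(snap, i) for i in range(n)],
                       axis=0)                      # full snapshot gradient
        for _ in range(inner):
            i = int(rng.integers(n))
            v = grad_i(x, i) - grad_i(snap, i) + full   # variance-reduced estimate
            x = x - lr * v
    return x

# Toy finite sum: f_i(x) = (x - a_i)^2 / 2, minimizer mean(a) = 1.
a = np.linspace(0.0, 2.0, 50)
x_out = svrg(lambda x, i: x - a[i], n=50, x0=5.0)
```

The estimate v is unbiased for the full gradient, and its variance shrinks as the iterate approaches the snapshot, which is what yields the faster nonconvex rates the abstract proves.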