A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth
finite-sum problems. In particular, the objective function is given by the
summation of a differentiable (possibly nonconvex) component, together with a
possibly non-differentiable but convex component. We propose a proximal
stochastic gradient algorithm based on variance reduction, called ProxSVRG+.
Our main contribution lies in the analysis of ProxSVRG+. It recovers several
existing convergence results and improves/generalizes them (in terms of the
number of stochastic gradient oracle calls and proximal oracle calls). In
particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm,
recently proposed by [Lei et al., 2017] for the smooth nonconvex case.
ProxSVRG+ is also more straightforward than SCSG and admits a simpler analysis.
Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent
(ProxGD) for a wide range of minibatch sizes, which partially answers an open
problem posed in [Reddi et al., 2016b]. ProxSVRG+ also uses far fewer proximal
oracle calls than ProxSVRG [Reddi et al., 2016b]. Furthermore, for nonconvex
functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that
ProxSVRG+ achieves a global linear convergence rate without restarts, unlike
ProxSVRG. It can therefore automatically switch to the faster linear
convergence in regions where the objective function satisfies the PL condition
locally. In this case, ProxSVRG+ also improves on ProxGD and ProxSVRG/SAGA,
and generalizes the results of SCSG. Finally, we
conduct several experiments, and the experimental results are consistent with
the theory.
Comment: 32nd Conference on Neural Information Processing Systems (NeurIPS
2018)
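As a rough illustration of the composite setting above, the following sketch runs a proximal SVRG-style loop on a lasso-type problem: a smooth finite-sum loss plus a nonsmooth convex L1 term handled by a proximal (soft-thresholding) step. The problem instance, step size, batch size, and epoch length are illustrative assumptions; this follows the generic proximal-SVRG template and does not reproduce the exact minibatch schedule of ProxSVRG+.

```python
import numpy as np

# Sketch of a proximal SVRG-style method for min_x f(x) + h(x),
# where f(x) = (1/n) sum_i f_i(x) is smooth and h(x) = lam * ||x||_1
# is convex but nonsmooth. All parameter choices below are illustrative.

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:3] = [1.0, -2.0, 1.5]              # sparse ground truth (assumed)
b = A @ x_true + 0.1 * rng.standard_normal(n)
lam = 0.1                                   # L1 weight (assumed)

def grad_i(x, i):
    # gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    # gradient of f(x) = (1/n) * sum_i f_i(x)
    return A.T @ (A @ x - b) / n

def prox_l1(x, t):
    # proximal operator of t * lam * ||.||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

def prox_svrg(x0, eta=0.05, epochs=20, m=100, batch=10):
    x = x0.copy()
    for _ in range(epochs):
        x_tilde = x.copy()
        g_tilde = full_grad(x_tilde)        # snapshot full gradient
        for _ in range(m):
            idx = rng.choice(n, size=batch, replace=False)
            # variance-reduced stochastic gradient estimate
            v = (sum(grad_i(x, i) - grad_i(x_tilde, i) for i in idx)
                 / batch + g_tilde)
            x = prox_l1(x - eta * v, eta)   # proximal gradient step
    return x

x = prox_svrg(np.zeros(d))
obj = 0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(x).sum()
```

The snapshot gradient `g_tilde` anchors each minibatch estimate, so the variance of `v` shrinks as the iterates approach the snapshot point; the prox step is the only place the nonsmooth term enters.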
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
Although the optimization objectives for learning neural networks are highly
non-convex, gradient-based methods have been wildly successful at learning
neural networks in practice. This juxtaposition has led to a number of recent
studies on provable guarantees for neural networks trained by gradient descent.
Unfortunately, the techniques in these works are often highly specific to the
problem studied in each setting, relying on different assumptions on the
distribution, optimization parameters, and network architectures, making it
difficult to generalize across different settings. In this work, we propose a
unified non-convex optimization framework for the analysis of neural network
training. We introduce the notions of proxy convexity and proxy
Polyak-Łojasiewicz (PL) inequalities, which are satisfied if the original
objective function induces a proxy objective function that is implicitly
minimized when using gradient methods. We show that stochastic gradient descent
(SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads
to efficient guarantees for proxy objective functions. We further show that
many existing guarantees for neural networks trained by gradient descent can be
unified through proxy convexity and proxy PL inequalities.
Comment: 15 pages
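The (ordinary, non-proxy) PL inequality that both abstracts build on can be checked numerically. A textbook example, not taken from either paper, is f(x) = x^2 + 3 sin^2(x): it is nonconvex (f''(x) = 2 + 6 cos(2x) changes sign) yet satisfies a PL inequality, so plain gradient descent with a step size below 1/L still converges to the global minimum f* = 0 at x = 0.

```python
import math

# Illustrative (textbook) example of a nonconvex function satisfying a
# PL inequality: f(x) = x^2 + 3*sin(x)^2, with global minimum f* = 0.
# Step size and starting point are illustrative choices.

def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def df(x):
    # f'(x) = 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

x = 2.0
eta = 0.1          # below 1/L for L = sup|f''| = 8
vals = [f(x)]
for _ in range(100):
    x -= eta * df(x)
    vals.append(f(x))
```

Despite the nonconvexity, the objective values decrease monotonically and reach the global minimum, which is the behavior the PL (and, by extension, proxy PL) analyses exploit to obtain linear convergence rates.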