Convergence of the Forward-Backward Algorithm: Beyond the Worst Case with the Help of Geometry
We provide a comprehensive study of the convergence of the forward-backward
algorithm under suitable geometric conditions that lead to fast rates. We present
several new results and collect, in a unified view, a variety of results
scattered in the literature, often with simplified proofs. Novel
contributions include the analysis of infinite-dimensional convex minimization
problems, allowing for the case where minimizers might not exist. Further, we
analyze the relation between different geometric conditions and discuss novel
connections with a priori conditions in linear inverse problems, including
source conditions, restricted isometry properties, and partial smoothness.
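The forward-backward algorithm alternates an explicit gradient (forward) step on the smooth term with a proximal (backward) step on the nonsmooth term, $x_{k+1} = \mathrm{prox}_{\gamma g}(x_k - \gamma \nabla f(x_k))$. The minimal Python sketch below assumes, purely for illustration, a least-squares smooth term $f(x) = \tfrac12\|Ax - b\|^2$ and an $\ell_1$ penalty $g = \lambda\|\cdot\|_1$, whose prox is componentwise soft-thresholding. The function names are hypothetical, and the paper's general infinite-dimensional setting is not modeled.

```python
import math


def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1, applied componentwise.
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def forward_backward(A, b, lam, step, iters=500):
    """Hypothetical sketch: minimize 0.5*||Ax - b||^2 + lam*||x||_1.

    `step` should satisfy step <= 1/L, where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Forward step: gradient of f(x) = 0.5*||Ax - b||^2 is A^T (Ax - b).
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Backward step: prox of step * lam * ||.||_1 (soft-thresholding).
        x = soft_threshold([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x
```

With $A = \mathrm{diag}(2, 1)$, $b = (4, 0.5)$, $\lambda = 1$ and step $1/L = 0.25$ (here $L = 4$), the iterates settle at the lasso minimizer $(1.75, 0)$; the second coordinate is driven exactly to zero by the prox, which is the typical sparsity effect of the $\ell_1$ term.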
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth
finite-sum problems. In particular, the objective function is the sum of a
differentiable (possibly nonconvex) component and a possibly non-differentiable
but convex component. We propose a proximal
stochastic gradient algorithm based on variance reduction, called ProxSVRG+.
Our main contribution lies in the analysis of ProxSVRG+. It recovers several
existing convergence results and improves/generalizes them (in terms of the
number of stochastic gradient oracle calls and proximal oracle calls). In
particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm,
recently proposed by [Lei et al., 2017] for the smooth nonconvex case.
ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis.
Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent
(ProxGD) for a wide range of minibatch sizes, which partially solves an open
problem posed in [Reddi et al., 2016b]. ProxSVRG+ also uses far fewer
proximal oracle calls than ProxSVRG [Reddi et al., 2016b]. In addition, for
nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove
that ProxSVRG+ achieves a global linear convergence rate without restart, unlike
ProxSVRG. Thus, it can automatically switch to faster linear convergence in
any region where the objective function satisfies the PL condition locally. In
this setting, ProxSVRG+ improves on ProxGD and ProxSVRG/SAGA as well, and
generalizes the results of SCSG. Finally, we conduct several experiments whose
results are consistent with the theory.
Comment: 32nd Conference on Neural Information Processing Systems (NeurIPS
2018)
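A rough Python sketch of an SVRG-type proximal loop in the spirit of ProxSVRG+ follows. The abstract only says the method is a proximal stochastic gradient algorithm based on variance reduction, so the details here are assumptions taken from the generic SVRG template: each epoch computes a snapshot gradient from a batch of size `B` (a full gradient when `B = n`), and each inner step uses a minibatch gradient difference plus the snapshot gradient, followed by a proximal step. The 1-D least-squares-plus-$\ell_1$ objective, the parameter names (`eta`, `m`, `B`, `mb`), and returning the last iterate are all hypothetical choices for illustration.

```python
import math
import random


def prox_l1(v, tau):
    # Proximal operator of tau * |x| for a scalar (soft-thresholding).
    return math.copysign(max(abs(v) - tau, 0.0), v)


def prox_svrg_plus(a, b, lam, eta, epochs=60, m=10, B=None, mb=1, seed=0):
    """Hypothetical ProxSVRG+-style sketch.

    Minimizes (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 + lam * |x| over scalar x.
    B: snapshot batch size (B = n gives a full-gradient snapshot).
    mb: inner minibatch size (indices drawn with replacement).
    """
    n = len(a)
    B = n if B is None else B
    rng = random.Random(seed)

    def grad_i(i, x):
        # Gradient of the i-th smooth component 0.5 * (a_i x - b_i)^2.
        return a[i] * (a[i] * x - b[i])

    x = 0.0
    for _ in range(epochs):
        snap = x  # snapshot point for this epoch
        batch = rng.sample(range(n), B)
        g_snap = sum(grad_i(i, snap) for i in batch) / B
        for _ in range(m):
            idx = [rng.randrange(n) for _ in range(mb)]
            # Variance-reduced estimate: minibatch gradient difference plus snapshot gradient.
            v = sum(grad_i(i, x) - grad_i(i, snap) for i in idx) / mb + g_snap
            # Proximal step on the nonsmooth term lam * |x|.
            x = prox_l1(x - eta * v, eta * lam)
    return x  # last iterate (an assumption; analyses often output a random iterate)
```

For this scalar objective the minimizer has the closed form $x^* = \mathrm{soft}(S_{ab}/S_{aa},\, \lambda n / S_{aa})$ with $S_{aa} = \sum_i a_i^2$ and $S_{ab} = \sum_i a_i b_i$, which makes it easy to check the sketch. Setting `B < n` mimics the non-full-batch snapshots that distinguish SCSG-like methods from plain SVRG.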
Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions
We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD)
algorithms for Łojasiewicz functions. The SZGD algorithm iterates as
\begin{align*}
\mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f(\mathbf{x}_t),
\qquad t = 0, 1, 2, 3, \ldots,
\end{align*}
where $f$ is the objective function that satisfies the Łojasiewicz inequality
with a Łojasiewicz exponent, $\eta_t$ is the step size (learning rate), and
$\widehat{\nabla} f(\mathbf{x}_t)$ is the approximate gradient estimated using
zeroth-order information only.
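Our results show that $\{f(\mathbf{x}_t) - f(\mathbf{x}_\infty)\}_{t \in \mathbb{N}}$
can converge faster than $\{\|\mathbf{x}_t - \mathbf{x}_\infty\|\}_{t \in \mathbb{N}}$,
regardless of whether the objective $f$ is smooth or nonsmooth.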
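The SZGD iteration can be sketched in Python as follows. The abstract does not say how the zeroth-order gradient estimate is formed; this sketch assumes one common choice, a two-point central difference along a random unit direction $u$, namely $\widehat{\nabla} f(x) = \frac{d}{2\delta}\,(f(x+\delta u) - f(x-\delta u))\,u$. The helper names, the default `delta`, and the test objective $\|x\|^2$ are illustrative assumptions; that objective satisfies a Łojasiewicz inequality with exponent $1/2$ under the convention $|f(x) - f^*|^{\theta} \le C\|\nabla f(x)\|$.

```python
import math
import random


def zo_gradient(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate along a random unit direction.

    Uses only function values: (d / (2 * delta)) * (f(x + delta*u) - f(x - delta*u)) * u.
    """
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]  # normalized Gaussian = uniform direction on the sphere
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def szgd(f, x0, eta, steps=200, delta=1e-4, seed=0):
    """Iterate x_{t+1} = x_t - eta(t) * g_t, where g_t is the zeroth-order estimate.

    `eta` is a callable mapping the iteration index t to the step size eta_t.
    """
    rng = random.Random(seed)
    x = list(x0)
    for t in range(steps):
        g = zo_gradient(f, x, delta, rng)
        x = [xi - eta(t) * gi for xi, gi in zip(x, g)]
    return x


f = lambda x: sum(xi * xi for xi in x)
x_final = szgd(f, [1.0, -2.0, 0.5], eta=lambda t: 0.1)
```

For a quadratic the central difference is exact, and since $\mathbb{E}[u u^\top] = I/d$ for a uniform unit direction, the estimator is an unbiased estimate of $\nabla f$ in this case; for general $f$ it is only an approximation whose bias shrinks with `delta`.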
Variance reduction techniques for stochastic proximal point algorithms
In the context of finite-sum minimization, variance reduction techniques are
widely used to improve the performance of state-of-the-art stochastic gradient
methods. Their practical impact is clear, as are their theoretical
properties. Stochastic proximal point algorithms have been studied as an
alternative to stochastic gradient algorithms since they are more stable with
respect to the choice of the step size, but a proper variance-reduced version
has been missing. In this work, we propose the first study of variance reduction
techniques for stochastic proximal point algorithms. We introduce stochastic
proximal versions of SVRG, SAGA, and some of their variants for smooth and
convex functions. We provide several convergence results for the iterates and
the objective function values. In addition, under the Polyak-Łojasiewicz
(PL) condition, we obtain linear convergence rates for the iterates and the
function values. Our numerical experiments demonstrate the advantages of the
proximal variance reduction methods over their gradient counterparts,
especially in terms of stability with respect to the choice of the step size.
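One plausible way to combine a stochastic proximal point step with an SVRG-style correction is to shift the argument of the prox of a single sampled component: $x_{k+1} = \mathrm{prox}_{\alpha f_{i_k}}\big(x_k + \alpha(\nabla f_{i_k}(\tilde{x}) - \nabla F(\tilde{x}))\big)$, where $\tilde{x}$ is the epoch snapshot and $F$ is the average of the $f_i$. The Python sketch below assumes this update form, scalar least-squares components (so the prox has a closed form), and hypothetical names and defaults; the paper's exact SVRG and SAGA variants may differ.

```python
import random


def prox_sq(z, a_i, b_i, alpha):
    # Closed-form prox of alpha * 0.5 * (a_i * y - b_i)**2 at z (scalar case):
    # solves a_i * (a_i * y - b_i) + (y - z) / alpha = 0 for y.
    return (z + alpha * a_i * b_i) / (1.0 + alpha * a_i * a_i)


def sppa_svrg(a, b, alpha, epochs=100, m=10, seed=0):
    """Hypothetical SVRG-style stochastic proximal point sketch.

    Minimizes F(x) = (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 over scalar x.
    """
    n = len(a)
    rng = random.Random(seed)

    def grad_i(i, x):
        # Gradient of the i-th component 0.5 * (a_i x - b_i)^2.
        return a[i] * (a[i] * x - b[i])

    x = 0.0
    for _ in range(epochs):
        snap = x
        full_grad = sum(grad_i(i, snap) for i in range(n)) / n
        for _ in range(m):
            i = rng.randrange(n)
            # Shift the prox argument by the SVRG correction at the snapshot.
            z = x + alpha * (grad_i(i, snap) - full_grad)
            # Implicit (proximal) step on the single sampled component f_i.
            x = prox_sq(z, a[i], b[i], alpha)
    return x
```

A useful sanity check on this form: the minimizer $x^*$ of $F$ is a fixed point for every index $i$, because $\nabla F(x^*) = 0$ turns the prox argument into $x^* + \alpha \nabla f_i(x^*)$, and the prox of $\alpha f_i$ maps that point back to $x^*$. Plain stochastic proximal point lacks this property, which is why it needs decreasing step sizes to converge exactly.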
- …