Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
The widely used stochastic gradient methods for minimizing nonconvex
composite objective functions require the Lipschitz smoothness of the
differentiable part. However, this requirement does not hold for problem classes such as quadratic inverse problems and the training of neural networks. To address
this issue, we investigate a family of stochastic Bregman proximal gradient
(SBPG) methods, which only require smooth adaptivity of the differentiable
part. SBPG replaces the upper quadratic approximation used in SGD with the
Bregman proximity measure, resulting in a better approximation model that
captures the non-Lipschitz gradients of the nonconvex objective. We formulate
the vanilla SBPG and establish its convergence properties in the nonconvex setting without assuming a finite-sum structure. Experimental results on quadratic inverse problems confirm the robustness of SBPG. Moreover, we propose a momentum-based
version of SBPG (MSBPG) and prove that it has improved convergence properties. We
apply MSBPG to the training of deep neural networks with a polynomial kernel
function, which ensures the smooth adaptivity of the loss function.
Experimental results on representative benchmarks demonstrate the effectiveness
and robustness of MSBPG in training neural networks. Since the additional computational cost of MSBPG relative to SGD is negligible in large-scale optimization, MSBPG can potentially be employed as a universal open-source optimizer in the future.
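As a rough illustration of the update the abstract describes, the following minimal Python sketch implements one vanilla SBPG step with the quartic kernel h(x) = 0.5‖x‖² + (α/4)‖x‖⁴, a standard polynomial-kernel choice for quadratic inverse problems; the specific kernel, the omission of a nonsmooth regularizer, and all names are assumptions here, not the paper's exact formulation.

```python
import numpy as np

def grad_h(x, alpha):
    # Gradient of the quartic kernel h(x) = 0.5*||x||^2 + (alpha/4)*||x||^4.
    return (1.0 + alpha * (x @ x)) * x

def inv_grad_h(v, alpha):
    # Invert grad_h: the solution is a rescaling of v whose norm t solves
    # alpha*t^3 + t = ||v||, a cubic with a unique real root when alpha > 0.
    r = np.linalg.norm(v)
    if r == 0.0:
        return np.zeros_like(v)
    roots = np.roots([alpha, 0.0, 1.0, -r])
    t = roots[np.isreal(roots)].real.max()
    return (t / r) * v

def sbpg_step(x, stoch_grad, eta, alpha):
    # One vanilla SBPG step without a nonsmooth term:
    # x_{k+1} = (grad h)^{-1}( grad h(x_k) - eta * g_k ).
    return inv_grad_h(grad_h(x, alpha) - eta * stoch_grad, alpha)
```

The key point the sketch makes concrete is that, unlike an SGD step, the Bregman update requires inverting the kernel gradient, which for this quartic kernel reduces to finding the unique positive root of a scalar cubic.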
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
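To make the generic scheme concrete, here is a minimal Python sketch of a Catalyst-style outer loop under simplifying assumptions: the inner solver is abstracted as a black box that approximately minimizes the quadratically regularized surrogate from a warm start, and the extrapolation schedule shown is the standard convex-case one; the interface and parameter choices are illustrative, not the paper's.

```python
import numpy as np

def catalyst(f_grad, inner_solver, x0, kappa, n_outer, inner_iters):
    # Outer loop: repeatedly minimize the regularized surrogate
    #   F_k(z) = f(z) + (kappa/2) * ||z - y_k||^2
    # with a user-supplied solver, then apply Nesterov-style extrapolation.
    x_prev = x = np.asarray(x0, dtype=float)
    y = x.copy()
    alpha = 1.0
    for _ in range(n_outer):
        # The inner solver only ever sees gradients of the surrogate.
        surrogate_grad = lambda z, y=y: f_grad(z) + kappa * (z - y)
        x_prev, x = x, inner_solver(surrogate_grad, y, inner_iters)
        # alpha_{k+1} solves alpha_{k+1}^2 = (1 - alpha_{k+1}) * alpha_k^2.
        alpha_next = 0.5 * (np.sqrt(alpha**4 + 4 * alpha**2) - alpha**2)
        beta = alpha * (1.0 - alpha) / (alpha**2 + alpha_next)
        y = x + beta * (x - x_prev)
        alpha = alpha_next
    return x

def gd_inner_solver(grad, z0, n_iters, step=0.1):
    # Placeholder inner solver: plain gradient descent from the warm start.
    z = z0.copy()
    for _ in range(n_iters):
        z = z - step * grad(z)
    return z
```

The design point is that acceleration lives entirely in the outer loop; any convergent method for the strongly convex surrogate (here a toy gradient-descent stand-in, in the paper SVRG or SAGA) can be plugged in unchanged.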
Differentially Private Accelerated Optimization Algorithms
We present two classes of differentially private optimization algorithms
derived from the well-known accelerated first-order methods. The first
algorithm is inspired by Polyak's heavy ball method and employs a smoothing
approach to decrease the accumulated noise on the gradient steps required for
differential privacy. The second class of algorithms is based on Nesterov's accelerated gradient method and its recent multi-stage variant. We propose a
noise dividing mechanism for the iterations of Nesterov's method in order to
improve the error behavior of the algorithm. Convergence rate analyses are provided for both the heavy ball method and Nesterov's accelerated gradient method with the help of dynamical systems analysis techniques. Finally, we conclude
with numerical experiments showing that the presented algorithms have advantages over well-known differentially private algorithms.
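As a simplified illustration of the Nesterov-based variant, the sketch below runs accelerated gradient descent on noisy gradient evaluations, dividing a fixed total noise variance budget uniformly across iterations; the paper's actual noise-dividing schedule, the sensitivity calibration, and the (epsilon, delta) accounting required for formal differential privacy are omitted, and all parameter names are assumptions.

```python
import numpy as np

def dp_nesterov(grad, x0, L, n_iters, total_sigma2, rng=None):
    # Nesterov's accelerated gradient with Gaussian-noised gradient steps.
    # The total noise variance budget is split uniformly over iterations
    # (a simplification; a noise-dividing schedule could weight them unevenly).
    rng = rng if rng is not None else np.random.default_rng(0)
    sigma = np.sqrt(total_sigma2 / n_iters)
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(n_iters):
        noisy_grad = grad(y) + sigma * rng.standard_normal(y.shape)
        x_next = y - noisy_grad / L            # step size 1/L for an L-smooth loss
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x
```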