In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum,
to solve a large class of general non-convex stochastic optimization problems,
based on the AEGD method introduced in [AEGD: Adaptive Gradient Descent with
Energy, arXiv:2010.05109]. SGEM incorporates energy and momentum simultaneously
so as to inherit their respective advantages. We show that
SGEM features an unconditional energy stability property, and derive
energy-dependent convergence rates in the general non-convex stochastic setting,
as well as a regret bound in the online convex setting. A lower threshold for
the energy variable is also provided. Our experimental results show that SGEM
converges faster than AEGD and generalizes better than, or at least as well as,
SGDM in training some deep neural networks.