On the Stability and Convergence of Stochastic Gradient Descent with Momentum
While momentum-based methods, in conjunction with stochastic gradient
descent, are widely used when training machine learning models, there is little
theoretical understanding of the generalization error of such methods. In
practice, the momentum parameter is often chosen in a heuristic fashion with
little theoretical guidance. In the first part of this paper, for the case of
general loss functions, we analyze a modified momentum-based update rule, i.e.,
the method of early momentum, and develop an upper bound on the generalization
error using the framework of algorithmic stability. Our results show that
machine learning models can be trained for multiple epochs of this method while
their generalization errors are bounded. We also study the convergence of the
method of early momentum by establishing an upper bound on the expected norm of
the gradient. In the second part of the paper, we focus on the case of strongly
convex loss functions and the classical heavy-ball momentum update rule. We use
the framework of algorithmic stability to provide an upper bound on the
generalization error of the stochastic gradient method with momentum. We also
develop an upper bound on the expected true risk, in terms of the number of
training steps, the size of the training set, and the momentum parameter.
Experimental evaluations verify the consistency between the numerical results
and our theoretical bounds and the effectiveness of the method of early
momentum for the case of non-convex loss functions.
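
The heavy-ball update mentioned in this abstract can be illustrated with a minimal pure-Python sketch. The function name `sgd_heavy_ball`, the `momentum_steps` cutoff, the toy quadratic loss, and every hyperparameter value are illustrative assumptions, not taken from the paper. In particular, treating "early momentum" as momentum applied only during the first `momentum_steps` iterations is an assumed reading of the abstract.

```python
import random


def sgd_heavy_ball(grad, w0, lr, mu, steps, momentum_steps=None):
    """Heavy-ball SGD: w_{t+1} = w_t - lr * g_t + mu * (w_t - w_{t-1}).

    grad: callable returning a (possibly stochastic) gradient at w.
    momentum_steps: hypothetical cutoff; if given, the momentum term is
    used only for the first momentum_steps iterations and plain SGD is
    run afterwards (an assumed reading of "early momentum").
    """
    w_prev = list(w0)
    w = list(w0)
    for t in range(steps):
        g = grad(w)
        use_momentum = momentum_steps is None or t < momentum_steps
        m = mu if use_momentum else 0.0
        w_next = [wi - lr * gi + m * (wi - wpi)
                  for wi, gi, wpi in zip(w, g, w_prev)]
        w_prev, w = w, w_next
    return w


# Toy strongly convex problem (assumption): minimize the average of
# 0.5 * (w - a_i)^2 over the samples a_i; the minimizer is their mean.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(0)


def stochastic_grad(w):
    # Gradient of the loss on one uniformly sampled data point.
    a_i = rng.choice(samples)
    return [w[0] - a_i]


# Momentum on every step versus momentum on the first 100 steps only.
w_hb = sgd_heavy_ball(stochastic_grad, [0.0], lr=0.05, mu=0.5, steps=500)
w_em = sgd_heavy_ball(stochastic_grad, [0.0], lr=0.05, mu=0.5, steps=500,
                      momentum_steps=100)
```

With a constant step size, both runs fluctuate around the sample mean rather than settling on it exactly. The sketch shows only the update rule and makes no claim about the stability or risk bounds derived in the paper.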
Differentially Private Accelerated Optimization Algorithms
We present two classes of differentially private optimization algorithms
derived from well-known accelerated first-order methods. The first
algorithm is inspired by Polyak's heavy ball method and employs a smoothing
approach to decrease the accumulated noise on the gradient steps required for
differential privacy. The second class of algorithms is based on Nesterov's
accelerated gradient method and its recent multi-stage variant. We propose a
noise dividing mechanism for the iterations of Nesterov's method in order to
improve the error behavior of the algorithm. Convergence rate analyses are
provided for both the heavy ball method and Nesterov's accelerated gradient
method using dynamical system analysis techniques. Finally, we conclude
with numerical experiments showing that the presented algorithms have
advantages over well-known differentially private algorithms.
Comment: 28 pages, 4 figures
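
For orientation, the sketch below shows a generic baseline in pure Python: a heavy-ball step driven by a clipped gradient with added Gaussian noise. It does not implement the smoothing approach or the noise dividing mechanism proposed in the paper, whose details the abstract does not give. The names `clip` and `noisy_heavy_ball`, the parameters `max_norm` and `sigma`, and the toy objective are all assumptions, and no specific privacy guarantee is claimed for the chosen values.

```python
import math
import random


def clip(g, max_norm):
    """Rescale g so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def noisy_heavy_ball(grad, w0, lr, mu, steps, max_norm, sigma, rng):
    """Heavy-ball iteration with a clipped, Gaussian-perturbed gradient.

    Each step clips the gradient to norm max_norm, adds independent
    N(0, (sigma * max_norm)^2) noise to every coordinate, then applies
    w_{t+1} = w_t - lr * g_t + mu * (w_t - w_{t-1}).
    """
    w_prev, w = list(w0), list(w0)
    for _ in range(steps):
        g = clip(grad(w), max_norm)
        g = [gi + rng.gauss(0.0, sigma * max_norm) for gi in g]
        w_next = [wi - lr * gi + mu * (wi - wpi)
                  for wi, gi, wpi in zip(w, g, w_prev)]
        w_prev, w = w, w_next
    return w


# Toy objective (assumption): f(w) = 0.5 * (w - 3)^2, minimized at w = 3.
def quad_grad(w):
    return [w[0] - 3.0]


w_noisy = noisy_heavy_ball(quad_grad, [0.0], lr=0.1, mu=0.5, steps=300,
                           max_norm=1.0, sigma=0.5, rng=random.Random(0))
```

Because the momentum term reuses earlier iterates, noise injected at one step carries over into later steps. That accumulation is the effect the abstract says the smoothing approach is meant to reduce.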