Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm
This paper provides a framework for analyzing stochastic gradient algorithms in
a mean squared error (MSE) sense using the asymptotic normality of the
stochastic gradient descent (SGD) iterates, applied to the finite-iteration
case. Specifically, we consider problems where the gradient estimators are
biased but have reduced variance, and we compare the iterates they generate
with those generated by the SGD algorithm. We use the work of Fabian to
characterize the mean and the variance of the distribution of the iterates in
terms of the bias and the covariance matrix of the gradient estimators. We
introduce the sliding window SGD (SW-SGD) algorithm, together with a proof of
convergence, and show that it incurs a lower MSE than the SGD algorithm on
quadratic and convex problems. Lastly, we present numerical results
demonstrating the effectiveness of this framework and the advantage of the
SW-SGD algorithm over the SGD algorithm.
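
The abstract does not spell out the update rule, but a minimal sketch of one natural reading of SW-SGD is given below: the descent direction is the average of the last few stochastic gradients, which lowers variance at the cost of bias from stale gradients. The function names, hyperparameters, and the quadratic test problem are illustrative assumptions, not the paper's actual algorithm or code.

```python
import numpy as np
from collections import deque

def sw_sgd(grad_fn, x0, step=0.01, window=10, n_iters=1000, rng=None):
    """Sliding-window SGD sketch: step along the average of the last
    `window` stochastic gradients (reduced variance, some bias)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    recent = deque(maxlen=window)          # sliding window of gradients
    for _ in range(n_iters):
        recent.append(grad_fn(x, rng))     # newest stochastic gradient
        g = np.mean(recent, axis=0)        # windowed (biased) estimator
        x = x - step * g
    return x

# Illustrative quadratic problem f(x) = 0.5 * ||x||^2 with noisy gradients.
def noisy_grad(x, rng, noise=1.0):
    return x + noise * rng.standard_normal(x.shape)

x_out = sw_sgd(noisy_grad, x0=np.ones(5))
print(np.linalg.norm(x_out))  # iterates settle near the optimum at 0
```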
Differentially Private Accelerated Optimization Algorithms
We present two classes of differentially private optimization algorithms
derived from well-known accelerated first-order methods. The first algorithm
is inspired by Polyak's heavy ball method and employs a smoothing approach to
decrease the accumulated noise on the gradient steps required for differential
privacy. The second class of algorithms is based on Nesterov's accelerated
gradient method and its recent multi-stage variant; for these we propose a
noise dividing mechanism for the iterations of Nesterov's method that improves
the error behavior of the algorithm. Convergence rate analyses are provided
for both the heavy ball and Nesterov's accelerated gradient methods with the
help of dynamical systems analysis techniques. Finally, we conclude with
numerical experiments showing that the presented algorithms have advantages
over well-known differentially private algorithms.