Novel gradient-based methods for data distribution and privacy in data science
With the increasing need to store data at different locations, designing algorithms that can analyze distributed data is becoming more important. In this thesis, we present several gradient-based algorithms that are customized for data distribution and privacy. First, we propose a provably convergent, second-order incremental, and inherently parallel algorithm that works with distributed data. By using a local quadratic approximation, we speed up convergence with the help of curvature information. We also illustrate that the parallel implementation of our algorithm outperforms a parallel stochastic gradient descent method on a large-scale data science problem. This first algorithm addresses the problem of using data that resides at different locations; however, this setting is not necessarily enough for data privacy. To guarantee the privacy of the data, we propose differentially private optimization algorithms in the second part of the thesis. The first of these employs a smoothing approach based on weighted averages of the history of gradients. This approach decreases the variance of the noise, which matters for iterative optimization algorithms, since increasing the amount of noise in the algorithm can harm performance. We also present differentially private versions of a recent multistage accelerated algorithm. These extensions use noise-related parameter selection, and the proposed step sizes are proportional to the variance of the noisy gradient. Numerical experiments show that our algorithms outperform some well-known differentially private algorithms.
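The smoothing idea described above can be sketched in a few lines: averaging a history of independently perturbed gradients shrinks the effective noise variance. This is an illustrative sketch only; the function names, the uniform weighting, and the omission of gradient clipping are assumptions, not details from the thesis.

```python
import numpy as np

def noisy_gradient(grad, sigma, rng):
    # Gaussian perturbation of a gradient, as used for differential privacy
    # (clipping of the gradient norm is omitted in this sketch).
    return grad + rng.normal(0.0, sigma, size=grad.shape)

def smoothed_gradient(history, weights):
    # Weighted average of the gradient history. Averaging k independent
    # N(0, sigma^2) noise terms with uniform weights reduces the noise
    # variance to sigma^2 / k.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * g for w, g in zip(weights, history))

rng = np.random.default_rng(0)
true_grad = np.zeros(5)
history = [noisy_gradient(true_grad, sigma=1.0, rng=rng) for _ in range(10)]
avg = smoothed_gradient(history, np.ones(10))
```

The trade-off hinted at in the abstract is visible here: averaging reduces noise but mixes in stale gradients, so the weights must balance variance reduction against bias.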
Differentially Private Accelerated Optimization Algorithms
We present two classes of differentially private optimization algorithms
derived from well-known accelerated first-order methods. The first
algorithm is inspired by Polyak's heavy ball method and employs a smoothing
approach to decrease the accumulated noise on the gradient steps required for
differential privacy. The second class of algorithms is based on Nesterov's
accelerated gradient method and its recent multi-stage variant. We propose a
noise-dividing mechanism for the iterations of Nesterov's method in order to
improve the error behavior of the algorithm. Convergence rate analyses are
provided for both the heavy-ball method and Nesterov's accelerated gradient
method with the help of dynamical systems analysis techniques. Finally, we
conclude with numerical experiments showing that the presented algorithms have
advantages over well-known differentially private algorithms.
Comment: 28 pages, 4 figures
A Second Look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance
Stochastic Gradient Descent (SGD) is a popular tool in training large-scale
machine learning models. Its performance, however, is highly variable,
depending crucially on the choice of the step sizes. Accordingly, a variety of
strategies for tuning the step sizes have been proposed, ranging from
coordinate-wise approaches (a.k.a. "adaptive" step sizes) to sophisticated
heuristics to change the step size in each iteration. In this paper, we study
two step size schedules whose power has been repeatedly confirmed in practice:
the exponential and the cosine step sizes. For the first time, we provide
theoretical support for them by proving convergence rates for smooth non-convex
functions, with and without the Polyak-Łojasiewicz (PL) condition. Moreover,
we show the surprising property that these two strategies are adaptive
to the noise level in the stochastic gradients of PL functions. That is,
in contrast to polynomial step sizes, they achieve almost optimal performance
without needing to know the noise level or to tune their hyperparameters based
on it. Finally, we conduct a fair and comprehensive empirical evaluation on
real-world datasets with deep learning architectures. Results show that, while
requiring at most two hyperparameters to tune, these two strategies match or
outperform various finely-tuned state-of-the-art strategies.
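The two schedules studied in this abstract have simple closed forms, which is part of their appeal: each needs at most two hyperparameters. The sketch below shows their generic shapes; the exact decay constant the paper ties to the horizon T in its theory is not reproduced here, so treat these as illustrative forms.

```python
import math

def exponential_step(eta0, alpha, t):
    # Exponential schedule: eta_t = eta0 * alpha**t, with decay factor
    # alpha in (0, 1); two hyperparameters, eta0 and alpha.
    return eta0 * alpha ** t

def cosine_step(eta0, t, T):
    # Cosine schedule: eta_t = (eta0 / 2) * (1 + cos(pi * t / T)),
    # decaying smoothly from eta0 at t = 0 to 0 at t = T; one
    # hyperparameter, eta0, once the horizon T is fixed.
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / T))
```

Both are monotone decreasing in t, but the cosine schedule stays near eta0 for early iterations before dropping, while the exponential schedule decays from the very first step.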