Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Recent works have shown that line search methods can speed up Stochastic
Gradient Descent (SGD) and Adam in modern over-parameterized settings. However,
existing line searches may take steps that are smaller than necessary since
they require a monotone decrease of the (mini-)batch objective function. We
explore nonmonotone line search methods to relax this condition and possibly
accept larger step sizes. Despite the lack of a monotonic decrease, we prove
the same fast rates of convergence as in the monotone case. Our experiments
show that nonmonotone methods improve the speed of convergence and
generalization properties of SGD/Adam even beyond the previous monotone line
searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained
by combining a nonmonotone line search with a Polyak initial step size.
Furthermore, we develop a new resetting technique that in the majority of the
iterations reduces the number of backtracks to zero while still maintaining a
large initial step size. To the best of our knowledge, a first runtime
comparison shows that the epoch-wise advantage of line-search-based methods
gets reflected in the overall computational time.
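The combination described in this abstract (a nonmonotone acceptance test started from a Polyak-type step size) can be illustrated with a minimal sketch. This is not the authors' PoNoS implementation. It assumes the reference value is the maximum of the last few objective values (a classical nonmonotone rule), that the optimal value is 0, and that the constants `c`, `c_p`, `beta`, and `eta_max` are merely illustrative. It also runs deterministically on a full-batch quadratic, whereas in the stochastic setting `f` and `g` would be evaluated on the current mini-batch. All function names are hypothetical.

```python
from collections import deque

import numpy as np


def polyak_initial_step(fx, g, f_star=0.0, c_p=0.5, eta_max=10.0):
    """Polyak-type initial step (f(x) - f*) / (c_p * ||g||^2), capped at eta_max."""
    g_sq = float(g @ g)
    if g_sq == 0.0:
        return 0.0
    return min((fx - f_star) / (c_p * g_sq), eta_max)


def nonmonotone_line_search(f, x, g, reference, eta0, c=0.1, beta=0.5,
                            max_backtracks=50):
    """Shrink eta until f(x - eta*g) <= reference - c*eta*||g||^2.

    `reference` may exceed f(x), which is what makes the test nonmonotone.
    """
    g_sq = float(g @ g)
    eta = eta0
    for _ in range(max_backtracks):
        if f(x - eta * g) <= reference - c * eta * g_sq:
            break
        eta *= beta
    return eta


def minimize(f, grad, x0, steps=40, window=5):
    """Gradient descent with a Polyak initial step and a nonmonotone test."""
    x = np.asarray(x0, dtype=float)
    history = deque([f(x)], maxlen=window)  # recent objective values
    values = [history[-1]]
    for _ in range(steps):
        g = grad(x)
        eta0 = polyak_initial_step(history[-1], g)
        eta = nonmonotone_line_search(f, x, g, max(history), eta0)
        x = x - eta * g
        history.append(f(x))
        values.append(history[-1])
    return x, values


# Illustrative problem (an assumption): an ill-conditioned quadratic.
A = np.diag([1.0, 10.0])
x_final, values = minimize(lambda x: 0.5 * float(x @ A @ x),
                           lambda x: A @ x, [1.0, 1.0])
```

Because the acceptance test compares against the largest recent value rather than the current one, a large Polyak step can be accepted even when it temporarily increases the objective.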
Regularized Newton Method with Global Convergence
We present a Newton-type method that converges fast from any initialization
and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by
merging the ideas of cubic regularization with a certain adaptive
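Levenberg--Marquardt penalty. In particular, we show that the iterates given by
$x^{k+1} = x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|}\,\mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge
globally with a $\mathcal{O}(1/k^2)$ rate. Our method is the first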
variant of Newton's method that has both cheap iterations and provably fast
global convergence. Moreover, we prove that locally our method converges
superlinearly when the objective is strongly convex. To boost the method's
performance, we present a line search procedure that does not need
hyperparameters and is provably efficient.
Comment: 21 pages, 2 figures
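A minimal sketch of a Newton step with an adaptive Levenberg--Marquardt penalty may help make the abstract concrete. It assumes the update takes the form x_next = x - (hess(x) + sqrt(H * ||grad(x)||) * I)^(-1) grad(x) with a fixed constant H > 0. The function name, the test objective, and the value of H are illustrative assumptions, and the hyperparameter-free line search mentioned in the abstract is not reproduced here.

```python
import numpy as np


def regularized_newton(grad, hess, x0, H, steps=20):
    """Newton iterations with penalty lam = sqrt(H * ||grad(x)||) (assumed form)."""
    x = np.asarray(x0, dtype=float)
    identity = np.eye(x.size)
    for _ in range(steps):
        g = grad(x)
        lam = np.sqrt(H * np.linalg.norm(g))
        if lam == 0.0:  # zero gradient: already stationary
            break
        # Solve the regularized Newton system instead of forming an inverse.
        x = x - np.linalg.solve(hess(x) + lam * identity, g)
    return x


# Illustrative convex objective (an assumption):
# f(x) = sum(log(1 + exp(x_i)) + 0.5 * x_i^2), which has a Lipschitz Hessian.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def grad_f(x):
    return sigmoid(x) + x


def hess_f(x):
    s = sigmoid(x)
    return np.diag(s * (1.0 - s) + 1.0)


x_star = regularized_newton(grad_f, hess_f, [3.0, -2.0], H=0.1)
```

Each iteration costs one linear solve, like a plain Newton step, and the penalty shrinks with the gradient norm, so the step approaches a pure Newton step near the solution.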
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains, in particular, the scientific program both in survey style and in full detail, as well as information on the social program, the venue, special meetings, and more.
Novel gradient-based methods for data distribution and privacy in data science
As the need to store data at different locations grows, designing algorithms that can analyze distributed data is becoming more important. In this thesis, we present several gradient-based algorithms that are customized for data distribution and privacy. First, we propose a provably convergent, second-order, incremental, and inherently parallel algorithm that works with distributed data. By using a local quadratic approximation, it exploits curvature information to speed up convergence. We also show that the parallel implementation of our algorithm outperforms a parallel stochastic gradient descent method on a large-scale data science problem. This first algorithm solves the problem of using data that resides at different locations. However, this setting is not necessarily sufficient for data privacy. To guarantee the privacy of the data, we propose differentially private optimization algorithms in the second part of the thesis. The first of these employs a smoothing approach based on weighted averages of the history of gradients. This approach helps to decrease the variance of the noise, which matters for iterative optimization algorithms because increasing the amount of noise can harm performance. We also present differentially private versions of a recent multistage accelerated algorithm. These extensions use noise-related parameter selection, and the proposed step sizes are proportional to the variance of the noisy gradient. The numerical experiments show that our algorithms perform better than some well-known differentially private algorithms.
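The smoothing idea in the second part of this abstract can be sketched as follows. This is not the thesis's algorithm. It assumes gradients are clipped and perturbed with Gaussian noise, and that the "weighted average of the history of gradients" is an exponential moving average. The names `clip`, `smooth`, and `smoothed_noisy_gd` and all parameter values are hypothetical, and no privacy accounting is performed.

```python
import numpy as np


def clip(g, max_norm):
    """Rescale g so that its Euclidean norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)


def smooth(noisy_grads, weight):
    """Exponentially weighted running averages of a sequence of gradients."""
    avg = np.zeros_like(noisy_grads[0], dtype=float)
    out = []
    for g in noisy_grads:
        avg = (1.0 - weight) * avg + weight * g
        out.append(avg)
    return np.array(out)


def smoothed_noisy_gd(grad, x0, steps, lr, max_norm, sigma, weight, rng):
    """Gradient descent driven by a running average of clipped, noised gradients."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(steps):
        noise = rng.normal(0.0, sigma * max_norm, size=x.shape)
        avg = (1.0 - weight) * avg + weight * (clip(grad(x), max_norm) + noise)
        x = x - lr * avg
    return x


# Averaging pure Gaussian noise shows the variance reduction in isolation.
rng = np.random.default_rng(0)
raw_noise = rng.normal(0.0, 1.0, size=(20000, 1))
smoothed_noise = smooth(raw_noise, weight=0.1)
```

For independent noise with variance s^2, an exponential average with weight a has stationary variance s^2 * a / (2 - a), so smaller weights suppress more noise at the cost of reacting more slowly to changes in the true gradient.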