Stability and Generalization of the Decentralized Stochastic Gradient Descent
The stability and generalization of stochastic gradient-based methods provide valuable insights into the algorithmic performance of machine learning models. As the main workhorse of deep learning, stochastic gradient descent (SGD) has been studied extensively. Nevertheless, the community has paid little attention to its decentralized variants. In this paper, we provide a novel formulation of decentralized stochastic gradient descent. Leveraging this formulation together with (non)convex optimization theory, we establish the first stability and generalization guarantees for decentralized stochastic gradient descent. Our theoretical results are built on a few common and mild assumptions and reveal, for the first time, that decentralization deteriorates the stability of SGD. We verify our theoretical findings using a variety of decentralized settings and benchmark machine learning models.
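The abstract does not spell out its novel formulation. For orientation only, the sketch below shows one commonly used form of decentralized SGD, which is an assumption here and not necessarily the paper's formulation: each node i keeps its own copy x_i, averages its neighbours' copies with a doubly stochastic mixing matrix W, and takes a local stochastic gradient step, x_i <- sum_j W_ij x_j - lr * grad f(x_i; xi_i). The ring topology, the squared loss on a linear model, and the helper names `ring_mixing_matrix` and `dsgd` are hypothetical choices for illustration.

```python
import random


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic W for a ring (assumes n >= 3):
    each node gives weight 1/3 to itself and to each of its two neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def dsgd(local_data, W, dim, lr=0.1, steps=2000, seed=0):
    """Synchronous decentralized SGD on the squared loss 0.5 * (w.x - y)^2.

    local_data[i] is node i's list of (x, y) pairs; returns one parameter
    vector per node. Hypothetical helper, not the paper's algorithm."""
    rng = random.Random(seed)
    n = len(local_data)
    params = [[0.0] * dim for _ in range(n)]
    for _ in range(steps):
        # 1) Each node samples one local example and computes its gradient
        #    at the node's own current iterate.
        grads = []
        for i in range(n):
            x, y = rng.choice(local_data[i])
            err = sum(p * xk for p, xk in zip(params[i], x)) - y
            grads.append([err * xk for xk in x])
        # 2) Gossip step plus local update; the comprehension reads the old
        #    `params` for every node, so all nodes update simultaneously.
        params = [
            [sum(W[i][j] * params[j][k] for j in range(n)) - lr * grads[i][k]
             for k in range(dim)]
            for i in range(n)
        ]
    return params
```

Mixing before stepping pulls the nodes' copies toward consensus while each still follows its own data; replacing W with the identity removes all communication and leaves n independent SGD runs.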
Algorithmic stability and hypothesis complexity
© 2017 by the author(s). We introduce a notion of algorithmic stability for learning algorithms, termed argument stability, which captures the stability of the hypothesis output by the learning algorithm in the normed space of functions from which hypotheses are selected. The main result of the paper bounds the generalization error of any learning algorithm in terms of its argument stability. The bounds are based on martingale inequalities in the Banach space to which the hypotheses belong. We apply these general results to bound the performance of some learning algorithms based on empirical risk minimization and stochastic gradient descent.
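As a rough illustration of the quantity being bounded, assume argument stability is measured as the distance ||A(S) - A(S^(i))|| between the hypotheses an algorithm A returns on two samples that differ only in example i (the paper's formal definition may add expectations or use a different norm). For SGD on a linear model, the Euclidean norm on weight vectors can stand in for the norm on the function space, and the quantity can be estimated empirically. The helpers `sgd_linear` and `argument_stability_estimate` are hypothetical, and both runs share a seed so that they visit examples in the same order.

```python
import math
import random


def sgd_linear(data, dim, lr=0.05, epochs=5, seed=0):
    """Plain SGD on the squared loss 0.5 * (w.x - y)^2.
    `seed` fixes the shuffling, so equal-length datasets share a sample order."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = data[idx]
            err = sum(wk * xk for wk, xk in zip(w, x)) - y
            w = [wk - lr * err * xk for wk, xk in zip(w, x)]
    return w


def argument_stability_estimate(data, replacement, dim, **sgd_kwargs):
    """Max over i of ||A(S) - A(S^(i))||_2, where S^(i) replaces example i
    with `replacement`. Hypothetical empirical proxy, not the paper's bound."""
    w_s = sgd_linear(data, dim, **sgd_kwargs)
    worst = 0.0
    for i in range(len(data)):
        perturbed = list(data)
        perturbed[i] = replacement
        w_i = sgd_linear(perturbed, dim, **sgd_kwargs)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_s, w_i)))
        worst = max(worst, dist)
    return worst
```

Coupling the two runs through a shared seed isolates the effect of the single replaced example; with independent seeds, the measured distance would also include ordinary run-to-run randomness.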
Data-Dependent Stability of Stochastic Gradient Descent
We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD) and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD, which depend on worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has a crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of the stochastic gradient.
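The abstract does not detail the pre-screening procedure. The sketch below covers only the convex-case idea and rests on an assumption: screening means drawing several candidate starting points and keeping the one with the lowest empirical risk. The curvature-based criterion for the non-convex case is not shown. `empirical_risk`, `prescreen_initialization`, the Gaussian candidates, and the squared loss on a linear model are all illustrative assumptions.

```python
import random


def empirical_risk(w, data):
    """Average squared loss 0.5 * (w.x - y)^2 over the (x, y) pairs in `data`."""
    total = 0.0
    for x, y in data:
        pred = sum(wk * xk for wk, xk in zip(w, x))
        total += 0.5 * (pred - y) ** 2
    return total / len(data)


def prescreen_initialization(data, dim, num_candidates=10, scale=1.0, seed=0):
    """Draw Gaussian candidate initial points and return the one with the
    lowest empirical risk on `data`. Hypothetical helper."""
    rng = random.Random(seed)
    candidates = [[rng.gauss(0.0, scale) for _ in range(dim)]
                  for _ in range(num_candidates)]
    return min(candidates, key=lambda w: empirical_risk(w, data))
```

The returned vector would then serve as SGD's starting point. Passing a held-out split as `data` avoids screening on the same examples used for training.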