Convergence of Unregularized Online Learning Algorithms
In this paper we study the convergence of online gradient descent algorithms
in reproducing kernel Hilbert spaces (RKHSs) without regularization. We
establish a sufficient condition and a necessary condition for the convergence
of excess generalization errors in expectation. A sufficient condition for the
almost sure convergence is also given. With high probability, we provide
explicit convergence rates of the excess generalization errors for both
averaged iterates and the last iterate, which in turn also imply convergence
rates with probability one. To the best of our knowledge, this is the first
high-probability convergence rate for the last iterate of online gradient
descent algorithms without strong convexity. Without any boundedness
assumptions on iterates, our results are derived by a novel use of two measures
of the algorithm's one-step progress, respectively by generalization errors and
by distances in RKHSs, where the variances of the involved martingales are
cancelled out by the descent property of the algorithm.
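The update studied in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the least squares loss, a Gaussian kernel, and polynomially decaying step sizes eta_t = eta1 * t**(-theta), none of which are specified in the abstract. The names `kernel_ogd`, `predict`, and `make_stream` are hypothetical.

```python
import math
import random

def gaussian_kernel(x, z, width=0.5):
    """Gaussian kernel on real inputs; the width is an illustrative choice."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))

def predict(centers, coeffs, x):
    """Evaluate f(x) = sum_i coeffs[i] * K(centers[i], x)."""
    return sum(c * gaussian_kernel(z, x) for z, c in zip(centers, coeffs))

def kernel_ogd(stream, eta1=0.5, theta=0.5):
    """Online gradient descent in an RKHS with no regularization term.

    Assumed update (least squares loss):
        f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * K(x_t, .)
    with eta_t = eta1 * t**(-theta). Returns the last iterate as a kernel expansion.
    """
    centers, coeffs = [], []
    for t, (x, y) in enumerate(stream, start=1):
        eta_t = eta1 * t ** (-theta)
        residual = predict(centers, coeffs, x) - y
        centers.append(x)                 # each example adds one kernel section
        coeffs.append(-eta_t * residual)  # no shrinkage of earlier coefficients
    return centers, coeffs

def make_stream(n, seed=0):
    """Synthetic data: y = sin(pi * x) plus small Gaussian noise (assumed target)."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        stream.append((x, math.sin(math.pi * x) + rng.gauss(0.0, 0.1)))
    return stream

if __name__ == "__main__":
    centers, coeffs = kernel_ogd(make_stream(200))
    print(predict(centers, coeffs, 0.5))  # estimate of sin(pi / 2)
```

Because there is no regularization, earlier coefficients are never shrunk; each step only appends a new kernel section, which is why the iterates are not bounded a priori.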
Convergence of Online Mirror Descent
In this paper we consider online mirror descent (OMD) algorithms, a class of
scalable online learning algorithms exploiting data geometric structures
through mirror maps. Necessary and sufficient conditions are presented in terms
of the step size sequence $\{\eta_t\}_t$ for the convergence of an OMD
algorithm with respect to the expected Bregman distance induced by the mirror
map. The condition is $\lim_{t\to\infty}\eta_t=0$, $\sum_{t=1}^{\infty}\eta_t=\infty$
in the case of positive variances. It is
reduced to $\sum_{t=1}^{\infty}\eta_t=\infty$ in the case of zero variances, for
which the linear convergence may be achieved by taking a constant step size
sequence. A sufficient condition on the almost sure convergence is also given.
We establish tight error bounds under mild conditions on the mirror map, the
loss function, and the regularizer. Our results are achieved by some novel
analysis on the one-step progress of the OMD algorithm using smoothness and
strong convexity of the mirror map and the loss function.
Comment: Published in Applied and Computational Harmonic Analysis, 202
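As a rough illustration of an OMD iteration, the sketch below uses one concrete setting that is an assumption of this example rather than something stated in the abstract: the p-norm mirror map Psi_p(w) = 0.5 * ||w||_p**2 with 1 < p <= 2, the least squares loss on noiseless linear data, and step sizes eta_t = eta1 / sqrt(t), which decay to zero while their sum diverges. The function names are hypothetical.

```python
import math
import random

def grad_half_sq_norm(v, p):
    """Gradient of Psi_p(v) = 0.5 * ||v||_p**2, applied coordinate-wise."""
    norm = sum(abs(a) ** p for a in v) ** (1.0 / p)
    if norm == 0.0:
        return [0.0] * len(v)
    scale = norm ** (2.0 - p)
    return [math.copysign(abs(a) ** (p - 1.0), a) * scale for a in v]

def omd_step(w, grad, eta, p):
    """One OMD step: map to the dual space, take a gradient step, map back.

    The inverse of grad Psi_p is grad Psi_q with 1/p + 1/q = 1.
    """
    q = p / (p - 1.0)
    theta = grad_half_sq_norm(w, p)
    theta = [th - eta * g for th, g in zip(theta, grad)]
    return grad_half_sq_norm(theta, q)

def run_omd(w_star, steps=3000, eta1=0.1, p=1.5, seed=0):
    """OMD on noiseless linear least squares with eta_t = eta1 / sqrt(t)."""
    rng = random.Random(seed)
    w = [0.0] * len(w_star)
    for t in range(1, steps + 1):
        x = [rng.uniform(-1.0, 1.0) for _ in w_star]
        y = sum(a * b for a, b in zip(w_star, x))      # noiseless label
        residual = sum(a * b for a, b in zip(w, x)) - y
        grad = [residual * a for a in x]               # gradient of 0.5 * residual**2
        w = omd_step(w, grad, eta1 / math.sqrt(t), p)
    return w

if __name__ == "__main__":
    print(run_omd([1.0, -0.5, 0.25]))
```

With p = 2 the mirror map is half the squared Euclidean norm and `omd_step` reduces to an ordinary gradient step, which is a convenient sanity check.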
Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms
This paper studies the generalization performance of multi-class
classification algorithms, for which we obtain, for the first time, a
data-dependent generalization error bound with a logarithmic dependence on the
class size, substantially improving the state-of-the-art linear dependence in
the existing data-dependent generalization analysis. The theoretical analysis
motivates us to introduce a new multi-class classification machine based on
$\ell_p$-norm regularization, where the parameter $p$ controls the complexity
of the corresponding bounds. We derive an efficient optimization algorithm
based on Fenchel duality theory. Benchmarks on several real-world datasets show
that the proposed algorithm can achieve significant accuracy gains over the
state of the art.
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
Recently there has been a considerable amount of work devoted to the study of the
algorithmic stability and generalization for stochastic gradient descent (SGD).
However, the existing stability analysis requires imposing restrictive
assumptions on the boundedness of gradients, strong smoothness and convexity of
loss functions. In this paper, we provide a fine-grained analysis of stability
and generalization for SGD by substantially relaxing these assumptions.
Firstly, we establish stability and generalization for SGD by removing the
existing bounded gradient assumptions. The key idea is the introduction of a
new stability measure called on-average model stability, for which we develop
novel bounds controlled by the risks of SGD iterates. This yields
generalization bounds depending on the behavior of the best model, and leads to
the first-ever-known fast bounds in the low-noise setting using a stability
approach. Secondly, the smoothness assumption is relaxed by considering loss
functions with Hölder continuous (sub)gradients, for which we show that optimal
bounds are still achieved by balancing computation and stability. To the best of our
knowledge, this gives the first-ever-known stability and generalization bounds
for SGD with even non-differentiable loss functions. Finally, we study learning
problems with (strongly) convex objectives but non-convex loss functions.
Comment: to appear in ICML 202
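To make the notion of model stability more concrete, the sketch below measures how far the SGD output moves when a single training example is replaced, averaged over all positions. It is only in the spirit of the on-average model stability mentioned in the abstract; the precise definition is in the paper. The one-dimensional least squares problem, the fixed step size, and the function names are assumptions made for illustration.

```python
import random

def sgd(data, index_sequence, eta=0.05):
    """Plain SGD for one-dimensional least squares: w <- w - eta * (w*x - y) * x."""
    w = 0.0
    for i in index_sequence:
        x, y = data[i]
        w -= eta * (w * x - y) * x
    return w

def on_average_model_distance(data, replacements, index_sequence, eta=0.05):
    """Average |A(S) - A(S_i)| over i, where S_i swaps example i for replacements[i].

    The same index sequence is reused so that only the data perturbation matters.
    """
    w_full = sgd(data, index_sequence, eta)
    total = 0.0
    for i, new_example in enumerate(replacements):
        neighbor = list(data)
        neighbor[i] = new_example
        total += abs(w_full - sgd(neighbor, index_sequence, eta))
    return total / len(data)

def sample(n, rng):
    """Synthetic data: y = 2 * x plus small Gaussian noise (assumed model)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

if __name__ == "__main__":
    rng = random.Random(0)
    data, fresh = sample(20, rng), sample(20, rng)
    seq = [rng.randrange(20) for _ in range(100)]
    print(on_average_model_distance(data, fresh, seq))
```

Reusing one index sequence for both runs isolates the effect of the replaced example from the randomness of the sampling order.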