Generalization bounds for averaged classifiers
We study a simple learning algorithm for binary classification. Instead of
predicting with the best hypothesis in the hypothesis class, that is, the
hypothesis that minimizes the training error, our algorithm predicts with a
weighted average of all hypotheses, weighted exponentially with respect to
their training error. We show that the prediction of this algorithm is much
more stable than the prediction of an algorithm that predicts with the best
hypothesis. By allowing the algorithm to abstain from predicting on some
examples, we show that the predictions it makes when it does not abstain are
very reliable. Finally, we show that the probability that the algorithm
abstains is comparable to the generalization error of the best hypothesis in
the class.
Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Statistics
(http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005
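The averaging scheme described in this abstract can be sketched in a few lines. This is an illustrative simplification, not the paper's exact estimator: the inverse-temperature `beta` and the abstention `margin` are hypothetical parameters, and the weights here use raw training error rather than the paper's precise exponent.

```python
import numpy as np

def exp_weighted_predict(train_errors, hyp_preds, beta=2.0, margin=0.1):
    """Predict with an exponentially weighted average of hypotheses.

    train_errors: shape (m,), empirical error of each hypothesis
    hyp_preds:    shape (m,), each hypothesis's prediction in {-1, +1}
                  on a single test point
    beta:         weight of hypothesis i is proportional to
                  exp(-beta * train_errors[i])
    margin:       abstain when the weighted vote is closer than this to zero
    """
    w = np.exp(-beta * train_errors)
    w /= w.sum()                    # normalize to a distribution
    vote = np.dot(w, hyp_preds)     # weighted margin in [-1, 1]
    if abs(vote) < margin:
        return 0                    # abstain
    return 1 if vote > 0 else -1
```

With a large `beta`, the prediction concentrates on low-error hypotheses; with ties in training error, the vote cancels and the predictor abstains, matching the abstract's reliability-via-abstention idea.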
Generalization Bounds for Representative Domain Adaptation
In this paper, we propose a novel framework for analyzing the theoretical
properties of the learning process for a representative type of domain
adaptation, which combines data from multiple sources with one target domain
(briefly, representative domain adaptation). In particular, we use the
integral probability metric to measure the difference between the
distributions of two domains, and compare it with the H-divergence and the
discrepancy distance. We develop Hoeffding-type, Bennett-type and
McDiarmid-type deviation inequalities for multiple domains, and
then present the symmetrization inequality for representative domain
adaptation. Next, we use the derived inequalities to obtain the Hoeffding-type
and the Bennett-type generalization bounds respectively, both of which are
based on the uniform entropy number. Moreover, we present the generalization
bounds based on the Rademacher complexity. Finally, we analyze the asymptotic
convergence and the rate of convergence of the learning process for
representative domain adaptation. We discuss the factors that affect the
asymptotic behavior of the learning process, and numerical experiments support
our theoretical findings. We also compare our results with existing results on
domain adaptation and with classical results under the same-distribution
assumption.
Comment: arXiv admin note: substantial text overlap with arXiv:1304.157
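The integral probability metric mentioned above has a simple plug-in estimate. The sketch below is illustrative only: it uses a finite witness class in place of the paper's function class, and the function name and inputs are hypothetical.

```python
import numpy as np

def empirical_ipm(sample_p, sample_q, witnesses):
    """Empirical integral probability metric between two samples:
    sup over a witness class F of |E_P[f] - E_Q[f]|, with the two
    expectations replaced by sample means.
    """
    return max(
        abs(np.mean([f(x) for x in sample_p]) -
            np.mean([f(x) for x in sample_q]))
        for f in witnesses
    )
```

Different choices of witness class recover familiar distances (e.g. bounded functions give total variation, 1-Lipschitz functions give the Wasserstein-1 distance), which is why the IPM is a convenient umbrella for comparing domain distributions.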
Improved Generalization Bounds for Robust Learning
We consider a model of robust learning in an adversarial environment. The
learner gets uncorrupted training data, with access to the possible corruptions
that may be applied by the adversary at test time. The learner's goal is to build
a robust classifier that would be tested on future adversarial examples. We use
a zero-sum game between the learner and the adversary as our game theoretic
framework. The adversary is limited to a fixed set of possible corruptions for each input.
Our model is closely related to the adversarial examples model of Schmidt et
al. (2018); Madry et al. (2017).
Our main results consist of generalization bounds for binary and
multi-class classification, as well as the real-valued case (regression). For
the binary classification setting, we both tighten the generalization bound of
Feige, Mansour, and Schapire (2015) and are able to handle an infinite
hypothesis class. The sample complexity is improved from … to ….
Additionally, we extend the algorithm and generalization bound from the binary
to the multi-class and real-valued cases. Along the way, we obtain results on
the fat-shattering dimension and Rademacher complexity of …-fold maxima over
function classes; these may be of independent interest.
For binary classification, the algorithm of Feige et al. (2015) uses a regret
minimization algorithm and an ERM oracle as a black box; we adapt it for the
multi-class and regression settings. The algorithm provides us with
near-optimal policies for the players on a given training sample.
Comment: Appearing at the 30th International Conference on Algorithmic
Learning Theory (ALT 2019).
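The robust objective in this abstract, where each training point is scored by its worst-case loss over the adversary's allowed corruptions, can be sketched directly. This is a toy illustration with a finite hypothesis class and a finite corruption set per input, not the regret-minimization algorithm of Feige et al.; the function names and the threshold classifiers in the usage note are hypothetical.

```python
def robust_loss(h, x, y, corruptions):
    # Worst-case 0-1 loss of hypothesis h at (x, y) over the
    # adversary's allowed corruptions of x.
    return max(0 if h(z) == y else 1 for z in corruptions(x))

def robust_erm(hypotheses, data, corruptions):
    # Empirical risk minimization against the worst-case corruption
    # of each training point (the learner's side of the zero-sum game).
    return min(
        hypotheses,
        key=lambda h: sum(robust_loss(h, x, y, corruptions) for x, y in data),
    )
```

For example, with threshold classifiers on the integers and the corruption set `{x-1, x, x+1}`, `robust_erm` selects a threshold that classifies every point correctly even after the adversary shifts it by one.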
Data-Dependent Stability of Stochastic Gradient Descent
We establish a data-dependent notion of algorithmic stability for Stochastic
Gradient Descent (SGD), and employ it to develop novel generalization bounds.
This is in contrast to previous distribution-free algorithmic stability results
for SGD which depend on the worst-case constants. By virtue of the
data-dependent argument, our bounds provide new insights into learning with SGD
on convex and non-convex problems. In the convex case, we show that the bound
on the generalization error depends on the risk at the initialization point. In
the non-convex case, we prove that the expected curvature of the objective
function around the initialization point has crucial influence on the
generalization error. In both cases, our results suggest a simple data-driven
strategy to stabilize SGD by pre-screening its initialization. As a corollary,
our results allow us to show optimistic generalization bounds that exhibit fast
convergence rates for SGD subject to a vanishing empirical risk and low noise
of the stochastic gradients.
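The data-driven strategy suggested above, pre-screening candidate initializations before running SGD, can be sketched as follows. This is an illustrative reading of the abstract, not the paper's procedure: the candidate list, the squared-loss helper, and the function names are assumptions.

```python
import numpy as np

def prescreen_init(candidates, loss_fn, X, y):
    # Data-driven pre-screening: start SGD from the candidate
    # initialization with the lowest empirical risk on the sample.
    risks = [float(np.mean(loss_fn(w, X, y))) for w in candidates]
    return candidates[int(np.argmin(risks))]

def sq_loss(w, X, y):
    # Per-example squared loss of a linear predictor (illustrative).
    return (X @ w - y) ** 2
```

Since the bounds depend on the risk (convex case) or curvature (non-convex case) at the initialization point, choosing the lowest-risk candidate is a cheap way to favor initializations with smaller generalization-error bounds.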