Out-Of-Domain Unlabeled Data Improves Generalization
We propose a novel framework for incorporating unlabeled data into
semi-supervised classification problems, covering the minimization of either
i) adversarially robust or ii) non-robust loss functions. Notably, we allow the
unlabeled samples to deviate
slightly (in total variation sense) from the in-domain distribution. The core
idea behind our framework is to combine Distributionally Robust Optimization
(DRO) with self-supervised training. As a result, we also leverage efficient
polynomial-time algorithms for the training stage. From a theoretical
standpoint, we apply our framework to the classification problem of a mixture
of two Gaussians in $\mathbb{R}^d$, where in addition to the $m$ independent
and labeled samples from the true distribution, a set of $n$ (usually with
$n \gg m$) out-of-domain and unlabeled samples are given as well. Using only the
labeled data, it is known that the generalization error can be bounded by
$\propto (d/m)^{1/2}$. However, using our method on both isotropic
and non-isotropic Gaussian mixture models, one can derive a new set of
analytically explicit and non-asymptotic bounds which show substantial
improvement on the generalization error compared to ERM. Our results underscore
two significant insights: 1) out-of-domain samples, even when unlabeled, can be
harnessed to narrow the generalization gap, provided that the true data
distribution adheres to a form of the "cluster assumption", and 2) the
semi-supervised learning paradigm can be regarded as a special case of our
framework when there are no distributional shifts. We validate our claims
through experiments conducted on a variety of synthetic and real-world
datasets.
Comment: Published at ICLR 2024 (Spotlight), 29 pages, no figures
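The role of the cluster assumption can be illustrated with a small sketch. The code below is plain self-training (pseudo-labelling) on a symmetric two-Gaussian mixture with means +mu and -mu; it is not the DRO-based procedure of the abstract, and the function names, the number of rounds, and the unit-variance noise are all assumptions made for illustration.

```python
import numpy as np

def sample_mixture(rng, mean, n):
    """Draw n points from the mixture 0.5*N(+mean, I) + 0.5*N(-mean, I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mean + rng.standard_normal((n, mean.shape[0]))
    return X, y

def self_train_direction(X_lab, y_lab, X_unl, n_rounds=5):
    """Estimate the unit direction separating the two mixture components.

    X_lab: (m, d) labeled points, y_lab: (m,) labels in {-1, +1},
    X_unl: (n, d) unlabeled points, possibly from a slightly shifted mixture.
    """
    # Labeled-only estimate of the class mean.
    w = (y_lab[:, None] * X_lab).mean(axis=0)
    for _ in range(n_rounds):
        # Pseudo-label the unlabeled points with the current direction.
        pseudo = np.where(X_unl @ w >= 0, 1.0, -1.0)
        # Re-estimate the mean from the pseudo-labeled points.
        w = (pseudo[:, None] * X_unl).mean(axis=0)
    return w / np.linalg.norm(w)
```

A typical call would draw a handful of labeled points with `sample_mixture(rng, mu, m)`, a much larger unlabeled set from a slightly shifted mean, and compare the returned direction with `mu`.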
PAC-Bayes and Domain Adaptation
We provide two main contributions in PAC-Bayesian theory for domain
adaptation where the objective is to learn, from a source distribution, a
well-performing majority vote on a different, but related, target distribution.
First, we improve on the approach we previously proposed in
Germain et al. (2013), which relies on a novel distribution pseudodistance
based on a disagreement averaging, allowing us to derive a new tighter domain
adaptation bound for the target risk. While this bound stands in the spirit of
common domain adaptation works, we derive a second bound (introduced in Germain
et al., 2016) that brings a new perspective on domain adaptation: an
upper bound on the target risk where the distributions' divergence, expressed as
a ratio, controls the trade-off between a source error measure and the target
voters' disagreement. We discuss and compare both results, from which we obtain
PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian
specialization to linear classifiers, we infer two learning algorithms, and we
evaluate them on real data.
Comment: Neurocomputing, Elsevier, 2019. arXiv admin note: substantial text
overlap with arXiv:1503.0694
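One way to read "a pseudodistance based on a disagreement averaging" is sketched below for a finite set of binary voters with posterior weights `rho`. The function names and the threshold voters in the usage note are hypothetical, and the formula is an illustrative reading of the abstract, not the paper's exact definition.

```python
def expected_disagreement(voters, rho, sample):
    """Average probability that two voters drawn independently from rho
    disagree on a point of the sample (voters return -1 or +1)."""
    total = 0.0
    for x in sample:
        # Posterior mass of the voters predicting +1 at x.
        p_pos = sum(r for h, r in zip(voters, rho) if h(x) == 1)
        # Two independent draws disagree with probability 2 * p * (1 - p).
        total += 2.0 * p_pos * (1.0 - p_pos)
    return total / len(sample)

def domain_disagreement(voters, rho, source, target):
    """Absolute gap between target and source expected disagreement."""
    return abs(expected_disagreement(voters, rho, target)
               - expected_disagreement(voters, rho, source))
```

For example, with two threshold voters at 0.0 and 0.5 weighted equally, a source sample below 0.5 and a target sample above 0.5 give a gap of 0.5, since the voters always disagree on the source and never on the target.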
A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound
In this work, we develop a simple algorithm for semi-supervised regression.
The key idea is to use the top eigenfunctions of an integral operator derived from
both labeled and unlabeled examples as the basis functions and learn the
prediction function by a simple linear regression. We show that under
appropriate assumptions about the integral operator, this approach is able to
achieve a regression error bound better than existing bounds of
supervised learning. We also verify the effectiveness of the proposed algorithm
by an empirical study.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012)
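The key idea described above can be sketched in a few lines. The sketch assumes an RBF kernel as the integral operator's kernel and approximates its eigenfunctions by the top eigenvectors of the kernel matrix built on labeled and unlabeled points together; the kernel choice, bandwidth, number of basis functions, and function names are assumptions, not details taken from the paper. Predictions are returned only at the given points, so no out-of-sample extension is attempted.

```python
import numpy as np

def rbf_kernel(X, bandwidth):
    """Gaussian kernel matrix for the rows of X (shape (n, d))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def eigen_basis_regression(X_all, labeled_idx, y_lab, n_basis=8, bandwidth=0.2):
    """Fit a linear regression on the top kernel eigenvectors.

    X_all: (n, d) labeled and unlabeled inputs stacked together.
    labeled_idx: indices into X_all of the labeled points.
    y_lab: targets of the labeled points, in the same order.
    Returns predictions at every row of X_all.
    """
    n = X_all.shape[0]
    # Empirical integral operator: kernel matrix scaled by 1/n.
    K = rbf_kernel(X_all, bandwidth) / n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    _, eigvecs = np.linalg.eigh(K)
    basis = eigvecs[:, -n_basis:]
    # Ordinary least squares using only the labeled rows of the basis.
    coef, *_ = np.linalg.lstsq(basis[labeled_idx], y_lab, rcond=None)
    return basis @ coef
```

The unlabeled points enter only through the basis: they shape the eigenvectors, while the labels determine the handful of regression coefficients.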
Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms
Inductive learning is based on inferring a general rule from a finite data
set and using it to label new data. In transduction one attempts to solve the
problem of using a labeled training set to label a set of unlabeled points,
which are given to the learner prior to learning. Although transduction seems
at the outset to be an easier task than induction, there have not been many
provably useful algorithms for transduction. Moreover, the precise relation
between induction and transduction has not yet been determined. The main
theoretical developments related to transduction were presented by Vapnik more
than twenty years ago. One of Vapnik's basic results is a rather tight error
bound for transductive classification based on an exact computation of the
hypergeometric tail. While tight, this bound is given implicitly via a
computational routine. Our first contribution is a somewhat looser but explicit
characterization of a slightly extended PAC-Bayesian version of Vapnik's
transductive bound. This characterization is obtained using concentration
inequalities for the tail of sums of random variables obtained by sampling
without replacement. We then derive error bounds for compression schemes such
as (transductive) support vector machines and for transduction algorithms based
on clustering. The main observation used for deriving these new error bounds
and algorithms is that the unlabeled test points, which in the transductive
setting are known in advance, can be used in order to construct useful
data-dependent prior distributions over the hypothesis space.
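The phrase "given implicitly via a computational routine" can be made concrete with a simplified sketch for a single fixed hypothesis (no union bound or prior over hypotheses, so this is not Vapnik's full bound). It assumes the training set is a uniformly random size-`m` subset of `N` points, so the number of training errors is hypergeometric; the function names are hypothetical.

```python
from math import comb

def hypergeom_cdf(k, N, K, m):
    """P[X <= k], where X counts how many of the K erroneous points fall
    into a size-m sample drawn without replacement from N points."""
    total = comb(N, m)
    # comb returns 0 when the lower index exceeds the upper, so
    # impossible configurations contribute nothing to the sum.
    return sum(comb(K, j) * comb(N - K, m - j)
               for j in range(0, min(k, K, m) + 1)) / total

def max_total_errors(k_train, N, m, delta):
    """Largest total error count K that still makes observing at most
    k_train training errors have probability at least delta."""
    best = k_train
    for K in range(k_train, k_train + (N - m) + 1):
        if hypergeom_cdf(k_train, N, K, m) >= delta:
            best = K
    return best
```

For instance, `hypergeom_cdf(0, 4, 1, 2)` is 0.5, and dividing `max_total_errors(k_train, N, m, delta) - k_train` by the number of test points `N - m` gives the corresponding bound on the test error rate. The bound has no closed form here, which is the gap the explicit characterization in the abstract addresses.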
Domain Adaptation: Learning Bounds and Algorithms
This paper addresses the general problem of domain adaptation which arises in
a variety of applications where the distribution of the labeled sample
available somewhat differs from that of the test data. Building on previous
work by Ben-David et al. (2007), we introduce a novel distance between
distributions, discrepancy distance, that is tailored to adaptation problems
with arbitrary loss functions. We give Rademacher complexity bounds for
estimating the discrepancy distance from finite samples for different loss
functions. Using this distance, we derive novel generalization bounds for
domain adaptation for a wide family of loss functions. We also present a series
of novel adaptation bounds for large classes of regularization-based
algorithms, including support vector machines and kernel ridge regression based
on the empirical discrepancy. This motivates our analysis of the problem of
minimizing the empirical discrepancy for various loss functions for which we
also give novel algorithms. We report the results of preliminary experiments
that demonstrate the benefits of our discrepancy minimization algorithms for
domain adaptation.
Comment: 12 pages, 4 figures
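For a finite hypothesis set and the 0-1 loss, an empirical discrepancy can be computed by brute force as the largest gap, over pairs of hypotheses, between their disagreement rates on the two samples. The sketch below assumes that setting; the function names and the threshold classifiers in the usage note are hypothetical, and it does not reproduce the paper's minimization algorithms.

```python
from itertools import combinations

def disagreement_rate(h1, h2, sample):
    """Fraction of sample points on which h1 and h2 predict differently."""
    return sum(h1(x) != h2(x) for x in sample) / len(sample)

def empirical_discrepancy(hypotheses, source, target):
    """Maximum over hypothesis pairs of the absolute difference between
    their 0-1 disagreement on the source and on the target sample."""
    return max(
        abs(disagreement_rate(h1, h2, source) - disagreement_rate(h1, h2, target))
        for h1, h2 in combinations(hypotheses, 2)
    )
```

With threshold classifiers at 0.0, 0.5 and 1.0, a source sample lying entirely below 0.5 and a target sample entirely above it give a discrepancy of 1.0, while two identical samples give 0.0.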