Implicitly Constrained Semi-Supervised Linear Discriminant Analysis
Semi-supervised learning is an important and active topic of research in
pattern recognition. For classification using linear discriminant analysis
specifically, several semi-supervised variants have been proposed. Using any
one of these methods is not guaranteed to outperform the supervised classifier
which does not take the additional unlabeled data into account. In this work we
compare traditional Expectation Maximization type approaches for
semi-supervised linear discriminant analysis with approaches based on intrinsic
constraints and propose a new principled approach for semi-supervised linear
discriminant analysis, using so-called implicit constraints. We explore the
relationships between these methods and consider whether, and in what
sense, we can expect improvement in performance over the supervised procedure.
The constraint based approaches are more robust to misspecification of the
model, and may outperform alternatives that make more assumptions on the data,
in terms of the log-likelihood of unseen objects.
Comment: 6 pages, 3 figures and 3 tables. International Conference on Pattern
Recognition (ICPR) 2014, Stockholm, Sweden
Projected Estimators for Robust Semi-supervised Classification
For semi-supervised techniques to be applied safely in practice we at least
want methods to outperform their supervised counterparts. We study this
question for classification using the well-known quadratic surrogate loss
function. Using a projection of the supervised estimate onto a set of
constraints imposed by the unlabeled data, we find we can safely improve over
the supervised solution in terms of this quadratic loss. Unlike other
approaches to semi-supervised learning, the procedure does not rely on
assumptions that are not intrinsic to the classifier at hand. It is
theoretically demonstrated that, measured on the labeled and unlabeled training
data, this semi-supervised procedure never gives a higher quadratic loss than
the supervised alternative. To our knowledge this is the first approach that
offers such strong, albeit conservative, guarantees for improvement over the
supervised solution. The characteristics of our approach are explicated using
benchmark datasets to further understand the similarities and differences
between the quadratic loss criterion used in the theoretical results and the
classification accuracy often considered in practice.
Comment: 13 pages, 2 figures, 1 table
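The projection idea in this abstract can be sketched for a least-squares classifier. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are encoded as 0/1, the constraint set is taken to be all least-squares solutions reachable by assigning soft labels q in [0, 1] to the unlabeled points, and distance is measured in the metric induced by the combined design matrix. The function name `projected_estimator` is hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear

def projected_estimator(X_l, y_l, X_u):
    """Project the supervised least-squares solution onto the set of
    solutions consistent with some soft labeling of the unlabeled data.
    Assumes 0/1 labels and an invertible combined Gram matrix."""
    X_e = np.vstack([X_l, X_u])              # labeled + unlabeled design matrix
    G_inv = np.linalg.inv(X_e.T @ X_e)

    # Supervised estimate: uses the labeled data only.
    w_sup = np.linalg.lstsq(X_l, y_l, rcond=None)[0]

    # Any soft labeling q gives w(q) = w0 + G_inv @ X_u.T @ q.
    w0 = G_inv @ (X_l.T @ y_l)

    # Squared distance ||X_e (w(q) - w_sup)||^2 written as ||A q - b||^2,
    # minimized subject to the box constraint 0 <= q <= 1.
    A = X_e @ G_inv @ X_u.T
    b = X_e @ (w_sup - w0)
    q = lsq_linear(A, b, bounds=(0.0, 1.0)).x

    w_proj = w0 + G_inv @ (X_u.T @ q)
    return w_sup, w_proj
```

Because the solution obtained with the true (unknown) labels lies in the same convex set, projecting onto that set cannot move the estimate further from it in this metric, which is why the quadratic loss on the combined training data does not increase in this sketch.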
Robust importance-weighted cross-validation under sample selection bias
Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk. However, the importance-weighted risk estimator produces suboptimal hyperparameter estimates in problem settings where large weights arise with high probability. We study its sampling variance as a function of the training data distribution and introduce a control variate to increase its robustness to problematically large weights.
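A minimal sketch of the two estimators mentioned here, under assumptions: the abstract does not say which control variate is used, so this example takes the importance weights themselves, whose expectation under the training distribution is known to be 1. The function names are hypothetical, and the coefficient is estimated from the same sample, which introduces a small bias.

```python
import numpy as np

def iw_risk(losses, weights):
    """Plain importance-weighted risk estimate on held-out training data."""
    return np.mean(weights * losses)

def iw_risk_control_variate(losses, weights):
    """Importance-weighted risk with the weights as control variate.
    Assumption: weights are exact density ratios, so E[w] = 1."""
    wl = weights * losses
    # Regression coefficient of (w * loss) on w, estimated from the sample.
    c = np.cov(wl, weights)[0, 1] / np.var(weights, ddof=1)
    # Subtract the zero-mean term c * (w - 1) to cancel weight-driven noise.
    return np.mean(wl - c * (weights - 1.0))
```

When a few very large weights dominate the sample, the weighted losses and the weights tend to move together, so subtracting the correlated zero-mean term removes much of that fluctuation.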
A Brief Prehistory of Double Descent
In their thought-provoking paper [1], Belkin et al. illustrate and discuss
the shape of risk curves in the context of modern high-complexity learners.
Given a fixed training sample size n, such curves show the risk of a learner
as a function of some (approximate) measure of its complexity N. With N the
number of features, these curves are also referred to as feature curves. A
salient observation in [1] is that these curves can display what they call
double descent: with increasing N, the risk initially decreases, attains a
minimum, and then increases until N equals n, where the training data is
fitted perfectly. Increasing N even further, the risk decreases a second and
final time, creating a peak at N = n. This twofold descent may come as a
surprise, but as opposed to what [1] reports, it has not been overlooked
historically. Our letter draws attention to some original, earlier findings, of
interest to contemporary machine learning.
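The shape of such a feature curve can be reproduced in a small simulation. The sketch below is an illustration under assumptions that are not from the letter: a minimum-norm least-squares learner, isotropic Gaussian features, and a linear target that depends on all D available features. It varies the number of features used by the learner while the training sample size stays fixed, and computes the risk in closed form.

```python
import numpy as np

def feature_curve(n=20, D=60, sigma=0.5, trials=50, seed=0):
    """Average risk of minimum-norm least squares using the first
    1..D features, for a fixed training sample size n."""
    rng = np.random.default_rng(seed)
    beta = np.ones(D) / np.sqrt(D)            # true coefficients (assumed)
    risks = np.zeros(D)
    for _ in range(trials):
        X = rng.standard_normal((n, D))
        y = X @ beta + sigma * rng.standard_normal(n)
        for k in range(1, D + 1):
            # pinv gives ordinary least squares for k <= n and the
            # minimum-norm interpolating solution for k > n.
            b = np.linalg.pinv(X[:, :k]) @ y
            # Exact risk for isotropic Gaussian inputs: estimation error
            # + signal in the unused features + noise variance.
            risks[k - 1] += (np.sum((b - beta[:k]) ** 2)
                             + np.sum(beta[k:] ** 2) + sigma ** 2)
    return risks / trials

risks = feature_curve()
```

Plotting `risks` against the number of features should show the peak where the feature count equals the sample size (index 19 here), with lower risk on either side.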
Feature-level domain adaptation
Domain adaptation is the supervised learning setting in which the training
and test data are sampled from different distributions: training data is
sampled from a source domain, whilst test data is sampled from a target domain.
This paper proposes and studies an approach, called feature-level domain
adaptation (FLDA), that models the dependence between the two domains by means
of a feature-level transfer model that is trained to describe the transfer from
source to target domain. Subsequently, we train a domain-adapted classifier by
minimizing the expected loss under the resulting transfer model. For linear
classifiers and a large family of loss functions and transfer models, this
expected loss can be computed or approximated analytically, and minimized
efficiently. Our empirical evaluation of FLDA focuses on problems comprising
binary and count data in which the transfer can be naturally modeled via a
dropout distribution, which allows the classifier to adapt to differences in
the marginal probability of features in the source and the target domain. Our
experiments on several real-world problems show that FLDA performs on par with
state-of-the-art domain-adaptation techniques.
Comment: 32 pages, 13 figures, 9 tables
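The claim that the expected loss can be computed analytically can be illustrated for one simple case. This sketch assumes a quadratic loss and a plain dropout transfer model in which feature j of a source sample is independently set to zero with probability theta[j]; the paper may use a rescaled dropout variant and other losses, and the function names here are hypothetical.

```python
import numpy as np

def expected_quadratic_loss(w, X, y, theta):
    """Mean expected squared error of a linear model w when each source
    feature j is independently zeroed with probability theta[j]."""
    mean = X * (1.0 - theta)                  # E[x_tilde] per sample
    var = X ** 2 * theta * (1.0 - theta)      # Var[x_tilde] per feature
    # E[(y - w.x_tilde)^2] = (y - w.E[x_tilde])^2 + sum_j w_j^2 Var[x_tilde_j]
    return np.mean((y - mean @ w) ** 2 + var @ w ** 2)

def fit_dropout_least_squares(X, y, theta):
    """Closed-form minimizer of the expected quadratic loss above."""
    mean = X * (1.0 - theta)
    var = X ** 2 * theta * (1.0 - theta)
    # Normal equations with a data-dependent diagonal penalty.
    A = mean.T @ mean + np.diag(var.sum(axis=0))
    return np.linalg.solve(A, mean.T @ y)
```

In this special case the variance term acts like a feature-specific ridge penalty, so features that are expected to vanish more often in the target domain receive smaller weights.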