Le Cam meets LeCun: Deficiency and Generic Feature Learning
"Deep Learning" methods attempt to learn generic features in an unsupervised
fashion from a large unlabelled data set. These generic features should perform
as well as the best hand-crafted features for any learning problem that makes
use of this data. We provide a definition of generic features, characterize
when it is possible to learn them and provide methods closely related to the
autoencoder and deep belief network of deep learning. In order to do so we use
the notion of deficiency and illustrate its value in studying certain general
learning problems.
Comment: 25 pages, 2 figures
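A note for readers unfamiliar with the term: one common textbook form of the one-sided Le Cam deficiency between experiments E = (P_theta)_{theta in Theta} on a space X and F = (Q_theta)_{theta in Theta} on a space Y is

    \delta(E, F) = \inf_{M} \sup_{\theta \in \Theta} \| M P_\theta - Q_\theta \|_{TV},

where the infimum ranges over Markov kernels (transitions) M from X to Y, and E is said to be epsilon-deficient relative to F when \delta(E, F) \le \epsilon. The normalisation constant varies between texts, and the paper's exact convention may differ.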
Learning with Symmetric Label Noise: The Importance of Being Unhinged
Convex potential minimisation is the de facto approach to binary
classification. However, Long and Servedio [2010] proved that under symmetric
label noise (SLN), minimisation of any convex potential over a linear function
class can result in classification performance equivalent to random guessing.
This ostensibly shows that convex losses are not SLN-robust. In this paper, we
propose a convex, classification-calibrated loss and prove that it is
SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of
being negatively unbounded. The loss is a modification of the hinge loss, where
one does not clamp at zero; hence, we call it the unhinged loss. We show that
the optimal unhinged solution is equivalent to that of a strongly regularised
SVM, and is the limiting solution for any convex potential; this implies that
strong l2 regularisation makes most standard learners SLN-robust. Experiments
confirm the SLN-robustness of the unhinged loss.
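A minimal sketch of the loss described above (the hinge loss without the clamp at zero); the function names and the example margins are illustrative, not taken from the paper:

    import numpy as np

    def hinge_loss(margin):
        # Standard hinge loss: linear penalty, clamped at zero.
        return np.maximum(0.0, 1.0 - margin)

    def unhinged_loss(margin):
        # "Unhinged" loss: the same linear penalty without the clamp,
        # hence convex but negatively unbounded for large margins.
        return 1.0 - margin

    margins = np.array([-2.0, 0.0, 0.5, 3.0])  # margins y * f(x)
    print(hinge_loss(margins))     # [3.  1.  0.5 0. ]
    print(unhinged_loss(margins))  # [ 3.   1.   0.5 -2. ]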
Machine learning via transitions
This thesis presents a clear conceptual basis for theoretically studying machine learning
problems. Machine learning methods afford means to automate the discovery of
relationships in data sets. A relationship between quantities X and Y allows the prediction
of one quantity given information of the other. It is these relationships that
we make the central object of study. We call these relationships transitions.
A transition from a set X to a set Y is a function from X into the probability distributions
on Y. Beginning with this simple notion, the thesis proceeds as follows:
Utilizing tools from statistical decision theory, we develop an abstract language
for quantifying the information present in a transition.
We attack the problem of generalized supervision. Generalized supervision
is the learning of classifiers from non-ideal data. An important example of
this is the learning of classifiers from noisily labelled data. We demonstrate
the virtues of our abstract treatment by producing generic methods for solving
these problems, together with generic upper bounds for our methods and lower
bounds for any method that attempts to solve them.
As a result of our study in generalized supervision, we produce means to define
procedures that are robust to certain forms of corruption. We explore, in detail,
procedures for learning classifiers that are robust to the effects of symmetric
label noise. The result is a classification algorithm that is easier to understand,
implement and parallelize than standard kernel-based classification schemes,
such as the support vector machine and logistic regression. Furthermore, we
demonstrate the uniqueness of this method.
Finally, we show how many feature learning schemes can be understood via
our language. We present well motivated objectives for the task of learning features
from unlabelled data, before showing how many standard feature learning
methods (such as PCA, sparse coding, auto-encoders and so on) can be seen
as minimizing surrogates to our objective functions.
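To make the central definition concrete, here is a small illustrative sketch (not taken from the thesis) of a transition from a finite set X to probability distributions on a finite set Y, represented as a row-stochastic matrix:

    import numpy as np

    # A transition from X = {0, 1} to Y = {0, 1, 2}: each element of X is
    # assigned a probability distribution over Y (each row sums to 1).
    T = np.array([
        [0.7, 0.2, 0.1],  # distribution over Y given x = 0
        [0.1, 0.3, 0.6],  # distribution over Y given x = 1
    ])

    def apply_transition(p_x, T):
        # Push a distribution on X through the transition to obtain one on Y.
        return p_x @ T

    p_x = np.array([0.5, 0.5])
    print(apply_transition(p_x, T))  # [0.4  0.25 0.35]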
Learning from Corrupted Binary Labels via Class-Probability Estimation
Abstract: Many supervised learning problems involve learning from samples whose labels are corrupted in some way. For example, each label may be flipped with some constant probability (learning with label noise), or one may have a pool of unlabelled samples in lieu of negative samples (learning from positive and unlabelled data). This paper uses class-probability estimation to study these and other corruption processes belonging to the mutually contaminated distributions framework.

Learning from corrupted binary labels
In many practical scenarios involving learning from binary labels, one observes samples whose labels are corrupted versions of the actual ground truth. For example, in learning from class-conditional label noise (CCN learning), the labels are flipped with some constant probability. A fundamental question is whether one can minimise a given performance measure with respect to the clean distribution D, given access only to samples from the corrupted distribution D_corr. Intuitively, in general this requires knowledge of the parameters of the corruption process that determines D_corr. This yields two further questions: are there measures for which knowledge of these corruption parameters is unnecessary, and for other measures, can we estimate these parameters? In this paper, we consider corruption problems belonging to the mutually contaminated distributions framework. While some of our results are known for the special cases of CCN and PU learning, our interest is in determining to what extent they generalise to other label corruption problems. This is a step towards a unified treatment of these problems. We now fix notation and formalise the problem.
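As a toy illustration of the CCN corruption process described above (illustrative code only; the parameter names rho_pos and rho_neg are not taken from the paper), each label is flipped with a class-dependent constant probability:

    import numpy as np

    def corrupt_ccn(labels, rho_pos, rho_neg, rng=None):
        # Class-conditional label noise: flip each positive label (+1) with
        # probability rho_pos and each negative label (-1) with probability rho_neg.
        rng = np.random.default_rng() if rng is None else rng
        flip_prob = np.where(labels == 1, rho_pos, rho_neg)
        flips = rng.random(labels.shape) < flip_prob
        return np.where(flips, -labels, labels)

    # Symmetric label noise is the special case rho_pos == rho_neg.
    y_clean = np.array([1, 1, -1, -1, 1, -1])
    y_corr = corrupt_ccn(y_clean, rho_pos=0.2, rho_neg=0.2,
                         rng=np.random.default_rng(0))
    print(y_corr)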