The Utility of Abstaining in Binary Classification
We explore the problem of binary classification in machine learning, with a
twist - the classifier is allowed to abstain on any datum, professing ignorance
about the true class label without committing to any prediction. This is
directly motivated by applications like medical diagnosis and fraud risk
assessment, in which incorrect predictions have potentially calamitous
consequences. We focus on a recent spate of theoretically driven work in this
area that characterizes how allowing abstentions can lead to fewer errors in
very general settings. Two areas are highlighted: the surprising possibility of
zero-error learning, and the fundamental tradeoff between predicting
sufficiently often and avoiding incorrect predictions. We review efficient
algorithms with provable guarantees for each of these areas. We also discuss
connections to other scenarios, notably active learning, as they suggest
promising directions of further inquiry in this emerging field.
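To make the abstention setting concrete, here is a minimal sketch (not drawn from the surveyed work) of the classical confidence-threshold rule: predict only when the estimated class probability is confident enough, where the threshold tau is an assumed free parameter.

    import numpy as np

    def predict_or_abstain(prob_positive, tau=0.8):
        # Predict +1/-1 when the classifier's confidence clears the
        # threshold tau (a hypothetical free parameter); return 0 to abstain.
        p = np.asarray(prob_positive, dtype=float)
        confidence = np.maximum(p, 1.0 - p)
        preds = np.where(p >= 0.5, 1, -1)
        return np.where(confidence >= tau, preds, 0)

    # Abstains on the uncertain middle example: [ 1  0 -1]
    print(predict_or_abstain([0.95, 0.55, 0.10]))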
PAC-Bayes Iterated Logarithm Bounds for Martingale Mixtures
We give tight concentration bounds for mixtures of martingales that are
simultaneously uniform over (a) mixture distributions, in a PAC-Bayes sense;
and (b) all finite times. These bounds are proved in terms of the martingale
variance, extending classical Bernstein inequalities, and sharpening and
simplifying prior work.
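For orientation, such bounds typically take the following shape (an illustrative rendering, not the paper's exact statement): for a martingale M_t with cumulative variance V_t, an absolute constant C, and all times t past a small burn-in,

    \Pr\Big[\, \exists t : |M_t| \ge C \sqrt{ V_t \big( \log\log V_t + \log(1/\delta) \big) } \,\Big] \le \delta.

The PAC-Bayes refinement holds simultaneously for posterior mixtures of martingales, at the price of a KL(posterior || prior) term alongside the log(1/delta).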
Learning to Abstain from Binary Prediction
A binary classifier capable of abstaining from making a label prediction has
two goals in tension: minimizing errors, and avoiding abstaining unnecessarily
often. In this work, we exactly characterize the best achievable tradeoff
between these two goals in a general semi-supervised setting, given an ensemble
of predictors of varying competence as well as unlabeled data on which we wish
to predict or abstain. We give an algorithm for learning a classifier in this
setting which trades off its errors with abstentions in a minimax optimal
manner, is as efficient as linear learning and prediction, and is demonstrably
practical. Our analysis extends to a large class of loss functions and other
scenarios, including ensembles composed of specialists that can themselves abstain.
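As a toy illustration of the tradeoff being characterized (a hedged sketch, not the paper's minimax-optimal algorithm), sweeping a threshold on the ensemble margin traces out one curve of abstention rate against error rate:

    import numpy as np

    def tradeoff_curve(votes, labels, thresholds):
        # votes: (n, p) matrix of +1/-1 predictions from p ensemble members;
        # labels: (n,) true labels in {-1, +1}. For each margin threshold,
        # report (abstention rate, error rate among non-abstentions).
        margin = votes.mean(axis=1)  # ensemble margin in [-1, 1]
        curve = []
        for t in thresholds:
            predicted = np.abs(margin) >= t
            errors = 0.0
            if predicted.any():
                errors = float((np.sign(margin[predicted]) != labels[predicted]).mean())
            curve.append((1.0 - float(predicted.mean()), errors))
        return curve

Raising the threshold abstains more often but errs less on the examples it does label; the paper resolves this tension in a minimax-optimal manner rather than by a fixed threshold.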
Sharp Finite-Time Iterated-Logarithm Martingale Concentration
We give concentration bounds for martingales that are uniform over finite
times and extend classical Hoeffding and Bernstein inequalities. We also
demonstrate our concentration bounds to be optimal with a matching
anti-concentration inequality, proved using the same method. Together these
constitute a finite-time version of the law of the iterated logarithm, and shed
light on the relationship between it and the central limit theorem.
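For reference, the classical asymptotic statement for sums S_t of i.i.d. zero-mean, unit-variance increments is

    \limsup_{t \to \infty} \frac{S_t}{\sqrt{2 t \log\log t}} = 1 \quad \text{a.s.},

and the bounds above can be read as a non-asymptotic counterpart holding simultaneously over all finite times with high probability.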
Linking Generative Adversarial Learning and Binary Classification
In this note, we point out a basic link between generative adversarial (GA)
training and binary classification -- any powerful discriminator essentially
computes an (f-)divergence between real and generated samples. The result,
repeatedly re-derived in decision theory, has implications for GA Networks
(GANs), providing an alternative perspective on training f-GANs by designing
the discriminator loss function.
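The link runs through the variational representation of f-divergences: for convex f with f(1) = 0 and convex conjugate f^*, and with the supremum over a sufficiently rich class of functions T,

    D_f(P \,\|\, Q) = \sup_{T} \; \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))],

so a discriminator trained to optimality against real samples (P) and generated samples (Q) attains the divergence; this is the correspondence that f-GANs exploit.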
Sequential Nonparametric Testing with the Law of the Iterated Logarithm
We propose a new algorithmic framework for sequential hypothesis testing with
i.i.d. data, which includes A/B testing, nonparametric two-sample testing, and
independence testing as special cases. It is novel in several ways: (a) it
takes linear time and constant space to compute on the fly, (b) it has the same
power guarantee as a non-sequential version of the test with the same
computational constraints up to a small factor, and (c) it accesses only as
many samples as are required - its stopping time adapts to the unknown
difficulty of the problem. All our test statistics are constructed to be
zero-mean martingales under the null hypothesis, and the rejection threshold is
governed by a uniform non-asymptotic law of the iterated logarithm (LIL). For
the case of nonparametric two-sample mean testing, we also provide a finite
sample power analysis, and the first non-asymptotic stopping time calculations
for this class of problems. We verify our predictions for type I and II errors
and stopping times using simulations.
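A minimal sketch of the recipe (with an illustrative boundary constant c, not the paper's calibrated one): accumulate a statistic that is a zero-mean martingale under the null and stop as soon as it crosses a LIL-scaled boundary.

    import numpy as np

    def sequential_mean_test(stream_a, stream_b, delta=0.05, c=3.0):
        # Sequential two-sample mean test in constant space and linear time.
        # Under H0: E[a] = E[b], with bounded samples, S_t = sum(a_i - b_i)
        # is a zero-mean martingale. The constant c is an assumed stand-in
        # for a properly calibrated boundary.
        s, t = 0.0, 0
        for a, b in zip(stream_a, stream_b):
            t += 1
            s += a - b
            if t >= 3:  # ensure log(log(t)) > 0
                boundary = c * np.sqrt(t * (np.log(np.log(t)) + np.log(1.0 / delta)))
                if abs(s) > boundary:
                    return "reject", t  # stop once evidence suffices
        return "fail to reject", t

The stopping time adapts to the problem: streams with a large mean gap cross the boundary early, while null streams stay below it with probability at least 1 - delta (under a correctly calibrated c).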
Muffled Semi-Supervised Learning
We explore a novel approach to semi-supervised learning, one that runs contrary to common practice: the unlabeled examples serve to
"muffle," rather than enhance, the guidance provided by the labeled examples.
We provide several variants of the basic algorithm and show experimentally that
they can achieve significantly higher AUC than boosted trees, random forests
and logistic regression when unlabeled examples are available.
Optimally Combining Classifiers Using Unlabeled Data
We develop a worst-case analysis of aggregation of classifier ensembles for
binary classification. The task of predicting to minimize error is formulated
as a game played over a given set of unlabeled data (a transductive setting),
where prior label information is encoded as constraints on the game. The
minimax solution of this game identifies cases where a weighted combination of
the classifiers can perform significantly better than any single classifier.
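Schematically (a hedged rendering, up to normalization and constraint details), with F the p x n matrix of ensemble predictions on the n unlabeled points, b a vector of label-correlation bounds estimated from the labeled data, and z the unknown true labels, the game is

    \min_{g \in [-1,1]^n} \;\; \max_{z \in [-1,1]^n,\; \frac{1}{n} F z \ge b} \;\; \frac{1}{2n} \sum_{i=1}^{n} \big( 1 - g_i z_i \big),

where g_i is the (possibly randomized) prediction on point i and the objective is worst-case expected error.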
Scalable Semi-Supervised Aggregation of Classifiers
We present and empirically evaluate an efficient algorithm that learns to
aggregate the predictions of an ensemble of binary classifiers. The algorithm
uses the structure of the ensemble predictions on unlabeled data to yield
significant performance improvements. It does this without making assumptions
on the structure or origin of the ensemble, without parameters, and as scalably
as linear learning. We empirically demonstrate these performance gains with
random forests.
Optimal Binary Classifier Aggregation for General Losses
We address the problem of aggregating an ensemble of predictors with known
loss bounds in a semi-supervised binary classification setting, to minimize
prediction loss incurred on the unlabeled data. We find the minimax optimal
predictions for a very general class of loss functions including all convex and
many non-convex losses, extending a recent analysis of the problem for
misclassification error. The result is a family of semi-supervised ensemble
aggregation algorithms which are as efficient as linear learning by convex
optimization, but are minimax optimal without any relaxations. Their decision
rules take a form familiar in decision theory -- applying sigmoid functions to
a notion of ensemble margin -- without the assumptions typically made in
margin-based learning.
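The flavor of those decision rules can be sketched as follows (a toy rendering: the weights sigma are assumed given here, whereas the paper learns them by convex optimization, and the exact link function depends on the loss):

    import numpy as np

    def aggregate_predict(H, sigma):
        # H: (n, p) matrix of ensemble predictions in {-1, +1};
        # sigma: (p,) nonnegative classifier weights (assumed given).
        # Returns randomized predictions in [-1, 1]: a sigmoid of the
        # ensemble margin, the general form such decision rules take.
        margin = H @ sigma
        return 2.0 / (1.0 + np.exp(-margin)) - 1.0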