Beyond Disagreement-based Agnostic Active Learning
We study agnostic active learning, where the goal is to learn a classifier in
a pre-specified hypothesis class interactively with as few label queries as
possible, while making no assumptions on the true function generating the
labels. The main algorithms for this problem are disagreement-based active
learning, which has a high label requirement, and margin-based active
learning, which applies only to fairly restricted settings. A major challenge
is to find an algorithm which achieves better label complexity, is consistent
in an agnostic setting, and applies to general classification problems.
In this paper, we provide such an algorithm. Our solution is based on two
novel contributions: a reduction from consistent active learning to
confidence-rated prediction with guaranteed error, and a novel confidence-rated
predictor.
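To make the idea of disagreement-based active learning concrete, here is a minimal sketch of the classic CAL-style rule on one-dimensional thresholds h_t(x) = 1 if x >= t else 0. It assumes the realizable (noise-free) case, so it is only a toy illustration of the disagreement region, not the agnostic algorithm described above; the function name `cal_threshold` is hypothetical. A label is requested only when the hypotheses still consistent with past labels disagree on the point; every other label is inferred for free.

```python
import math
from typing import Callable, Iterable, Tuple


def cal_threshold(stream: Iterable[float],
                  oracle: Callable[[float], int]) -> Tuple[float, float, int]:
    """Hypothetical toy: disagreement-based querying for 1-D thresholds.

    Assumes realizable labels from h_t(x) = 1 if x >= t else 0.
    The version space is every t with lo < t <= hi.
    Returns (lo, hi, number_of_label_queries).
    """
    lo, hi = -math.inf, math.inf
    queries = 0
    for x in stream:
        if lo < x < hi:
            # x lies in the disagreement region: consistent thresholds
            # give it different labels, so we must ask the oracle.
            queries += 1
            if oracle(x) == 1:
                hi = x  # positive at x, so t <= x
            else:
                lo = x  # negative at x, so t > x
        # Otherwise every consistent threshold agrees on x and its
        # label is inferred without a query.
    return lo, hi, queries


stream = [5, 2, 8, 3, 7, 4, 6, 1, 9, 0]
print(cal_threshold(stream, lambda x: 1 if x >= 5 else 0))  # (4, 5, 4)
```

On this stream only 4 of the 10 labels are requested; the remaining points fall outside the shrinking interval (lo, hi) and are labeled by inference.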
Robust Interactive Learning
In this paper, we propose and study a generalization of the standard
active-learning model in which a more general type of query, the
class-conditional query, is allowed. Such queries have been quite useful in
applications but have lacked a theoretical understanding. In this work, we characterize the
power of such queries under two well-known noise models. We give nearly tight
upper and lower bounds on the number of queries needed to learn both for the
general agnostic setting and for the bounded noise model. We further show that
our methods can be made adaptive to the (unknown) noise rate, with only
negligible loss in query complexity.
Rates of convergence in active learning
We study the rates of convergence in generalization error achievable by
active learning under various types of label noise. Additionally, we study the
general problem of model selection for active learning with a nested hierarchy
of hypothesis classes and propose an algorithm whose error rate provably
converges to the best achievable error among classifiers in the hierarchy at a
rate adaptive to both the complexity of the optimal classifier and the noise
conditions. In particular, we state sufficient conditions for these rates to be
dramatically faster than those achievable by passive learning.
Comment: Published at http://dx.doi.org/10.1214/10-AOS843 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Auditing: Active Learning with Outcome-Dependent Query Costs
We propose a learning setting in which unlabeled data is free, and the cost
of a label depends on its value, which is not known in advance. We study binary
classification in an extreme case, where the algorithm only pays for negative
labels. Our motivation comes from applications such as fraud detection, in which
investigating an honest transaction should be avoided if possible. We term the
setting auditing, and consider the auditing complexity of an algorithm: the
number of negative labels the algorithm requires in order to learn a hypothesis
with low relative error. We design auditing algorithms for simple hypothesis
classes (thresholds and rectangles), and show that with these algorithms, the
auditing complexity can be significantly lower than the active label
complexity. We also discuss a general competitive approach for auditing and
possible modifications to the framework.
Comment: Corrections in section
- …
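As a rough illustration of why auditing complexity can fall below active label complexity, consider thresholds h_t(x) = 1 if x >= t else 0 under the simplifying assumption of noise-free labels. This is a hypothetical toy strategy, not the algorithm from the paper (which targets low relative error): scan the pool from the largest point downward, where positive labels are free, and stop at the first negative. The function name `audit_threshold` is an assumption for illustration.

```python
from typing import Callable, List, Tuple


def audit_threshold(xs: List[float],
                    oracle: Callable[[float], int]) -> Tuple[float, int, int]:
    """Hypothetical toy auditing for h_t(x) = 1 if x >= t else 0.

    Assumes realizable labels. Only negative labels cost anything.
    Returns (learned_threshold, negatives_paid, total_queries).
    """
    negatives_paid = 0
    queries = 0
    threshold = float("inf")  # no positive seen yet: predict all negative
    for x in sorted(xs, reverse=True):
        queries += 1
        if oracle(x) == 1:
            threshold = x  # free positive label; t is at most x
        else:
            negatives_paid += 1  # the only paid label
            break  # realizable: every smaller point is also negative
    return threshold, negatives_paid, queries


print(audit_threshold(list(range(10)), lambda x: 1 if x >= 5 else 0))  # (5, 1, 6)
```

Here the learner pays for a single negative label, whereas binary search, a standard active strategy, would typically request about log2(10) labels of either kind. In this toy the saving comes entirely from the asymmetric cost model.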