Classifier Risk Estimation under Limited Labeling Resources
In this paper we propose strategies for estimating the performance of a
classifier when labels cannot be obtained for the whole test set and the
number of test instances that can be labeled is very small compared to the
size of the test data. The goal is then to obtain a precise estimate of
classifier performance using as few labeling resources as possible.
Specifically, we ask how to select a subset of the large test set for
labeling such that the performance of a classifier estimated on this subset
is as close as possible to its performance on the whole test set. We propose
strategies based on
stratified sampling for selecting this subset. We show that, compared to
simple random sampling, these strategies can reduce the variance of the
accuracy estimate by a significant amount (over 65% in several cases). Our
proposed methods are therefore much more precise than random sampling for
accuracy estimation under restricted labeling resources. The reduction in the
number of samples required (relative to random sampling) to estimate the
classifier accuracy to within 1% error is as high as 60% in some cases.
Comment: PAKDD 201
Active Testing: Sample-Efficient Model Evaluation
We introduce a new framework for sample-efficient model evaluation that we
call active testing. While approaches like active learning reduce the number of
labels needed for model training, existing literature largely ignores the cost
of labeling test data, typically unrealistically assuming large test sets for
model evaluation. This creates a disconnect from real applications, where
test labels are important and just as expensive, e.g., for optimizing
hyperparameters. Active testing addresses this by carefully selecting the test
points to label, ensuring model evaluation is sample-efficient. To this end, we
derive theoretically-grounded and intuitive acquisition strategies that are
specifically tailored to the goals of active testing, noting that these are
distinct from those of active learning. As actively selecting labels
introduces a bias, we further show how to remove this bias while reducing the
variance of the
estimator at the same time. Active testing is easy to implement and can be
applied to any supervised machine learning method. We demonstrate its
effectiveness on models including WideResNets and Gaussian processes on
datasets including Fashion-MNIST and CIFAR-100.
Comment: Published at the 38th International Conference on Machine Learning (ICML 2021)
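The bias-removal step can be illustrated with a generic importance-sampling sketch in Python/NumPy. This is a simplified stand-in, not the paper's estimator or its derived acquisition strategies: points are drawn with replacement from an assumed acquisition distribution q (e.g., proportional to a surrogate's predicted loss), and each observed loss is reweighted by 1/(N q(i)), which keeps the estimate of the mean test loss unbiased despite the non-uniform selection.

```python
import numpy as np

def active_test_loss(proposal_probs, loss_of, budget=100, seed=0):
    """Estimate mean test loss over a pool of N points from `budget` labels.

    proposal_probs -- (N,) unnormalized acquisition scores q over the pool
                      (an assumed surrogate, e.g. predicted loss per point)
    loss_of        -- callable idx -> loss on the newly labeled point
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(proposal_probs, dtype=float)
    q = q / q.sum()                      # normalize to a distribution
    N = len(q)
    picks = rng.choice(N, size=budget, replace=True, p=q)
    # Importance weights: E_q[L(i) / (N q(i))] = (1/N) * sum_i L(i),
    # so the weighted average is unbiased for the true mean test loss.
    weights = 1.0 / (N * q[picks])
    losses = np.array([loss_of(i) for i in picks])
    return float(np.mean(weights * losses))
```

Concentrating q on points the surrogate expects to be hard lowers the variance of this estimator relative to uniform sampling, while the reweighting preserves unbiasedness.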