Fast rates for a kNN classifier robust to unknown asymmetric label noise
We consider classification in the presence of class-dependent asymmetric
label noise with unknown noise probabilities. In this setting, identifiability
conditions are known, but additional assumptions were shown to be required for
finite sample rates, and so far only the parametric rate has been obtained.
Assuming these identifiability conditions, together with a measure-smoothness
condition on the regression function and Tsybakov's margin condition, we show
that the Robust kNN classifier of Gao et al. attains the minimax optimal rates
of the noise-free setting, up to a log factor, even when trained on data with
unknown asymmetric label noise. Hence, our results provide a solid theoretical
backing for this empirically successful algorithm. By contrast, the standard kNN
is not even consistent in the setting of asymmetric label noise. A key idea in
our analysis is a simple kNN-based method for estimating the maximum of a
function that requires far fewer assumptions than existing mode estimators do,
and which may be of independent interest for noise proportion estimation and
randomised optimisation problems.
Comment: ICML 201
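The corrected-threshold idea behind such robust kNN rules can be sketched concretely. The following is a minimal numpy illustration, assuming the standard class-conditional flipping model with flip probabilities rho_0 (a true 0 observed as 1) and rho_1 (a true 1 observed as 0); the function names and the crude plug-in estimate of the flip rates from the extrema of the corrupted regression function are illustrative choices for this sketch, not the exact construction of Gao et al.

    import numpy as np

    def knn_regress(X_train, y_train, x, k):
        """Plain kNN estimate of P(observed label = 1 | x)."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        return y_train[nearest].mean()

    def estimate_noise_rates(X_train, y_train, k):
        """Crude plug-in estimate of the flip rates from the extrema of the
        corrupted regression function: under the flipping model its supremum
        is 1 - rho_1 and its infimum is rho_0 whenever the clean regression
        function attains 1 and 0."""
        eta_tilde = np.array([knn_regress(X_train, y_train, x, k) for x in X_train])
        return eta_tilde.min(), 1.0 - eta_tilde.max()   # (rho_0, rho_1)

    def robust_knn_predict(X_train, y_train, x, k, rho_0, rho_1):
        """Threshold the corrupted kNN estimate at the noise-corrected level
        (1 + rho_0 - rho_1) / 2 instead of 1/2."""
        threshold = 0.5 * (1.0 + rho_0 - rho_1)
        return int(knn_regress(X_train, y_train, x, k) > threshold)

    # Toy usage: 1-d data with asymmetric flipping (rho_0 = 0.1, rho_1 = 0.6).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 1))
    y_clean = (X[:, 0] > 0).astype(int)
    flip = rng.uniform(size=500) < np.where(y_clean == 1, 0.6, 0.1)
    y_noisy = np.where(flip, 1 - y_clean, y_clean)

    rho_0_hat, rho_1_hat = estimate_noise_rates(X, y_noisy, k=25)
    x_test = np.array([0.4])
    print("robust:", robust_knn_predict(X, y_noisy, x_test, 25, rho_0_hat, rho_1_hat))
    # With rho_1 = 0.6 the corrupted regression function is about 0.4 on the
    # positive class, so the naive 0.5 threshold typically mislabels it:
    print("naive :", int(knn_regress(X, y_noisy, x_test, 25) > 0.5))

With these flip rates the corrupted vote sits below 1/2 on the positive class, which is exactly the regime in which the standard kNN fails to be consistent, while the corrected threshold still separates the two classes.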
Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks
Data poisoning attacks and backdoor attacks aim to corrupt a machine learning
classifier via modifying, adding, and/or removing some carefully selected
training examples, such that the corrupted classifier makes incorrect
predictions as the attacker desires. The key idea of state-of-the-art certified
defenses against data poisoning attacks and backdoor attacks is to create a
majority vote mechanism to predict the label of a testing example. Specifically,
each voter is a base classifier trained on a subset of the training dataset.
Classical simple learning algorithms such as k nearest neighbors (kNN) and
radius nearest neighbors (rNN) have intrinsic majority vote mechanisms. In this
work, we show that the intrinsic majority vote mechanisms in kNN and rNN
already provide certified robustness guarantees against data poisoning attacks
and backdoor attacks. Moreover, our evaluation results on MNIST and CIFAR10
show that the intrinsic certified robustness guarantees of kNN and rNN
outperform those provided by state-of-the-art certified defenses. Our results
serve as standard baselines for future certified defenses against data
poisoning attacks and backdoor attacks.
Comment: To appear in AAAI Conference on Artificial Intelligence, 202
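The intrinsic certificate can be illustrated with a deliberately simplified bound: each poisoned training example (modified, added, or removed) can swap at most one of a test point's k nearest neighbours, so the majority vote cannot flip as long as the vote margin exceeds twice the number of poisoned examples. The numpy sketch below implements only this conservative intuition, with illustrative names; the certification procedure in the paper itself is more refined (for instance around tie-breaking, and it also covers rNN).

    import numpy as np
    from collections import Counter

    def knn_certified_radius(X_train, y_train, x, k):
        """Predicted label and a conservative certified poisoning size:
        each poisoned training example swaps at most one of the k nearest
        neighbours, changing the vote margin by at most 2, so the prediction
        provably survives any attack of size r whenever margin > 2 * r."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(y_train[nearest].tolist())
        (top_label, top_count), *rest = votes.most_common()
        runner_up = rest[0][1] if rest else 0
        margin = top_count - runner_up
        return top_label, max((margin - 1) // 2, 0)

    # Toy usage: two well-separated Gaussian blobs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) + np.repeat([[0.0, 0.0], [3.0, 3.0]], 100, axis=0)
    y = np.repeat([0, 1], 100)
    label, radius = knn_certified_radius(X, y, np.array([3.0, 3.0]), k=15)
    print(label, radius)   # a confidently classified point certifies a larger radius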
Classification with unknown class-conditional label noise on non-compact feature spaces
We investigate the problem of classification in the presence of unknown
class-conditional label noise in which the labels observed by the learner have
been corrupted with some unknown class dependent probability. In order to
obtain finite sample rates, previous approaches to classification with unknown
class-conditional label noise have required that the regression function is
close to its extrema on sets of large measure. We shall consider this problem
in the setting of non-compact metric spaces, where the regression function need
not attain its extrema.
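To see why the behaviour of the regression function near its extrema matters, consider the standard class-conditional flipping model (notation introduced here for concreteness): a true label $Y=0$ is observed as $1$ with probability $\rho_0$, a true label $Y=1$ is observed as $0$ with probability $\rho_1$, and $\rho_0+\rho_1<1$. The regression function $\tilde{\eta}$ of the observed labels then satisfies

    $\tilde{\eta}(x) \;=\; (1-\rho_1)\,\eta(x) + \rho_0\,(1-\eta(x)) \;=\; \rho_0 + (1-\rho_0-\rho_1)\,\eta(x),$

so $\inf_x \tilde{\eta}(x) \ge \rho_0$ and $\sup_x \tilde{\eta}(x) \le 1-\rho_1$, with equality exactly when $\inf_x \eta(x)=0$ and $\sup_x \eta(x)=1$ respectively. The noise probabilities are thus easy to read off when $\eta$ gets close to its extrema on sets of non-negligible measure, and harder to pin down when it approaches them only slowly, which is the intuition behind the threshold behaviour described below.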
In this setting, we determine the minimax optimal learning rates (up to
logarithmic factors). The rates display an interesting threshold behaviour: when
the regression function approaches its extrema at a sufficient rate, the
optimal learning rates are of the same order as those obtained in the
label-noise-free setting. If the regression function approaches its extrema
more gradually, then classification performance necessarily degrades. In
addition, we present an adaptive algorithm which attains these rates without
prior knowledge of either the distributional parameters or the local density.
This identifies for the first time a scenario in which finite sample rates are
achievable in the label noise setting, but they differ from the optimal rates
without label noise.