Measuring Equality in Machine Learning Security Defenses
The machine learning security community has developed myriad defenses for
evasion attacks over the past decade. An understudied question in that
community is: for whom do these defenses defend? In this work, we consider some
common approaches to defending learned systems and whether those approaches may
offer unexpected performance inequities when used by different sub-populations.
We outline simple parity metrics and a framework for analysis that can begin to
answer this question through empirical results of the fairness implications of
machine learning security methods. Many methods have been proposed that can
cause direct harm, which we describe as biased vulnerability and biased
rejection. Our framework and metric can be applied to robustly trained models,
preprocessing-based methods, and rejection methods to capture behavior over
security budgets. We identify a realistic dataset with a reasonable
computational cost suitable for measuring the equality of defenses. Through a
case study in speech command recognition, we show how such defenses do not
offer equal protection for social subgroups and how to perform such analyses
for robustness training, and we present a comparison of fairness between two
rejection-based defenses: randomized smoothing and neural rejection. We offer
further analysis of factors that correlate to equitable defenses to stimulate
the future investigation of how to assist in building such defenses. To the
best of our knowledge, this is the first work that examines the fairness
disparity in the accuracy-robustness trade-off in speech data and addresses
fairness evaluation for rejection-based defenses.
Comment: In Submission
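The parity metrics the abstract alludes to can be illustrated with a minimal sketch. The gap-based definitions below, the function name `subgroup_gap`, and the toy data are illustrative assumptions, not the paper's exact formulas: "biased vulnerability" is modeled as an unequal robust-accuracy rate across subgroups, and "biased rejection" as an unequal rejection rate.

```python
import numpy as np

def subgroup_gap(values, groups):
    """Max minus min of a per-sample statistic averaged within each subgroup.

    A gap of 0 means the defense behaves identically across subgroups;
    larger gaps indicate a less equitable defense (an assumed definition).
    """
    means = [np.mean(values[groups == g]) for g in np.unique(groups)]
    return max(means) - min(means)

# Toy data: 1 = correct under attack (top row) / rejected (middle row), 0 = not.
robust_correct = np.array([1, 1, 0, 1, 0, 0, 1, 0])
rejected       = np.array([0, 0, 1, 0, 1, 1, 0, 1])
groups         = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # subgroup label per sample

vulnerability_gap = subgroup_gap(robust_correct, groups)  # "biased vulnerability"
rejection_gap     = subgroup_gap(rejected, groups)        # "biased rejection"
print(vulnerability_gap, rejection_gap)
```

In this toy example both gaps equal 0.5: subgroup 0 keeps 75% robust accuracy while subgroup 1 keeps only 25%, and subgroup 1 is rejected three times as often. Sweeping such gaps over a range of attack budgets gives the budget-dependent view of equity the abstract describes.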
Boosting Randomized Smoothing with Variance Reduced Classifiers
Randomized Smoothing (RS) is a promising method for obtaining robustness
certificates by evaluating a base model under noise. In this work, we: (i)
theoretically motivate why ensembles are a particularly suitable choice as base
models for RS, and (ii) empirically confirm this choice, obtaining
state-of-the-art results in multiple settings. The key insight of our work is
that the reduced variance of ensembles over the perturbations introduced in RS
leads to significantly more consistent classifications for a given input. This,
in turn, leads to substantially increased certifiable radii for samples close
to the decision boundary. Additionally, we introduce key optimizations which
enable an up to 55-fold decrease in sample complexity of RS, thus drastically
reducing its computational overhead. Experimentally, we show that ensembles of
only 3 to 10 classifiers consistently improve on their strongest constituent
model with respect to their average certified radius (ACR) by 5% to 21% on both
CIFAR10 and ImageNet, achieving a new state-of-the-art ACR of 0.86 and 1.11,
respectively. We release all code and models required to reproduce our results
upon publication.
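The core Randomized Smoothing procedure with an ensemble base model can be sketched as follows. This is a simplified Monte Carlo illustration under stated assumptions: the two toy linear models, `sigma`, and the sample count `n` are invented for the example, and a rigorous certificate would use a lower confidence bound on the top-class probability (and abstain below 1/2) rather than the point estimate used here.

```python
from statistics import NormalDist
import numpy as np

def smoothed_predict(members, x, sigma=0.25, n=1000, rng=None):
    """Majority class under Gaussian noise plus an (optimistic) L2 radius.

    Averaging member logits reduces the variance of the base prediction
    under noise, which tends to raise the top-class vote fraction p and
    hence the radius sigma * Phi^{-1}(p) -- the effect the abstract exploits.
    """
    rng = rng or np.random.default_rng(0)
    votes = np.zeros(2, dtype=int)
    for _ in range(n):
        z = x + rng.normal(0.0, sigma, size=x.shape)        # perturbed input
        logits = np.mean([w @ z for w in members], axis=0)  # ensemble average
        votes[int(np.argmax(logits))] += 1
    top = int(np.argmax(votes))
    p = votes[top] / n                     # empirical top-class probability
    radius = sigma * NormalDist().inv_cdf(min(p, 1 - 1e-9))
    return top, radius

# Two toy linear 2-class models on a 2-d input (illustrative only).
members = [np.array([[1.0, 0.0], [0.0, 1.0]]),
           np.array([[0.9, 0.1], [0.1, 0.9]])]
cls, radius = smoothed_predict(members, np.array([2.0, 0.0]))
print(cls, radius)
```

The 55-fold sample-complexity reductions reported above would enter this sketch by shrinking `n`; the variance-reduction argument is that a more consistent ensemble reaches the same vote fraction with fewer noisy evaluations.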