Efficient Two-Step Adversarial Defense for Deep Neural Networks
In recent years, deep neural networks have demonstrated outstanding
performance in many machine learning tasks. However, researchers have
discovered that these state-of-the-art models are vulnerable to adversarial
examples: legitimate examples altered by small perturbations that are imperceptible to the human eye. Adversarial training, which augments the training data with adversarial examples during the training process, is a well-known defense that improves the robustness of the model against adversarial attacks. However, this robustness is only effective against the same attack method used during adversarial training. Madry et al. (2017) argue that iterative multi-step adversarial attacks are effective, that projected gradient descent (PGD) in particular may be considered the universal first-order adversary, and that adversarial training with PGD therefore confers resistance against many other first-order attacks. However, the computational cost of adversarial training with PGD and other multi-step adversarial examples is much higher than that of adversarial training with simpler attack techniques. In this paper, we show how strong adversarial examples can be generated at a cost similar to that of only two runs of the fast gradient sign method (FGSM), enabling a defense against adversarial attacks whose robustness is comparable to that of adversarial training with multi-step adversarial examples. We
empirically demonstrate the effectiveness of the proposed two-step defense
approach against different attack methods and its improvements over existing
defense strategies.
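The abstract does not spell out the two-step procedure itself, so the following PyTorch sketch only illustrates the cost comparison it refers to: a standard one-step FGSM perturbation next to a hypothetical two-step variant that takes two half-sized gradient-sign steps and projects back into the epsilon-ball. The model, loss, and epsilon are placeholders, and the paper's actual method may differ.

```python
import torch
import torch.nn.functional as F

def fgsm_step(model, x, y, eps):
    """One fast gradient sign method (FGSM) step: x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

def two_step_attack(model, x, y, eps):
    """Hypothetical two-step variant (roughly the cost of two FGSM runs):
    two half-sized FGSM steps, then projection onto the eps-ball around x.
    This illustrates the cost argument only, not the paper's exact algorithm."""
    x_adv = fgsm_step(model, x, y, eps / 2)
    x_adv = fgsm_step(model, x_adv, y, eps / 2)
    x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # L-infinity projection onto the eps-ball
    return torch.clamp(x_adv, 0.0, 1.0)                    # keep inputs in the valid pixel range
```

Adversarial training would then mix such examples into each minibatch during training, just as with FGSM- or PGD-generated examples.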
The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
We investigate conditions under which test statistics exist that can reliably detect examples that have been adversarially manipulated in a white-box
attack. These statistics can be easily computed and calibrated by randomly
corrupting inputs. They exploit certain anomalies that adversarial attacks
introduce, in particular if they follow the paradigm of choosing perturbations
optimally under p-norm constraints. Access to the log-odds is the only
requirement to defend models. We justify our approach empirically, but also
provide conditions under which detectability via the suggested test statistics
is guaranteed to be effective. In our experiments, we show that it is even
possible to correct test-time predictions for adversarial attacks with high accuracy.
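The abstract does not give the exact form of the test statistic, so the sketch below only illustrates the general idea it describes: measure how the log-odds between the predicted class and the other classes shift when the input is randomly corrupted with noise, and flag inputs whose shift is anomalously large. The model, noise level, and threshold are placeholders here and would need to be calibrated on clean data.

```python
import torch

def log_odds_shift(model, x, n_samples=64, sigma=0.05):
    """Average shift of the log-odds of every class against the predicted class
    when the input is corrupted with random Gaussian noise (a sketch, not the
    paper's exact statistic)."""
    with torch.no_grad():
        clean_logits = model(x)                               # x has shape (1, C, H, W)
        pred = clean_logits.argmax(dim=1).item()              # class predicted on the clean input
        clean_odds = clean_logits - clean_logits[:, pred:pred + 1]
        noisy = x + sigma * torch.randn(n_samples, *x.shape[1:])
        noisy_logits = model(noisy)
        noisy_odds = noisy_logits - noisy_logits[:, pred:pred + 1]
        return (noisy_odds - clean_odds).mean(dim=0)          # positive: noise favors that class

def looks_adversarial(model, x, threshold):
    """Flag x if noise shifts the odds toward some other class by more than a
    threshold calibrated on clean inputs (threshold is a placeholder)."""
    shift = log_odds_shift(model, x)
    return bool((shift > threshold).any())
```

Correcting a test-time prediction, as the abstract mentions, could then amount to re-predicting the class with the largest noise-induced shift whenever the detector fires.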