Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples
Adversarial examples have become one of the biggest challenges facing machine
learning models, especially neural network classifiers. These adversarial
examples break the assumption of an attack-free scenario and fool state-of-the-art
(SOTA) classifiers with perturbations that are imperceptible to humans. So far,
researchers have made great progress in using adversarial training as a defense.
However, its overwhelming computational cost limits its applicability, and
little has been done to overcome this issue. Single-step adversarial training
methods have been proposed as computationally viable solutions; however, they
still fail to defend against iterative adversarial examples. In this work, we
first experimentally analyze several SOTA defense methods against adversarial
examples. Then, based on observations from these experiments, we propose a
novel single-step adversarial training method that can defend against both
single-step and iterative adversarial examples. Finally, through extensive
evaluations, we demonstrate that our proposed method outperforms SOTA
single-step and iterative adversarial training defenses. Compared with ATDA (a
single-step method) on the CIFAR10 dataset, our method achieves a 35.67%
improvement in test accuracy and a 19.14% reduction in training time. Compared
with methods that use BIM or Madry examples (iterative methods) on CIFAR10, it
saves up to 76.03% of training time with less than 3.78% degradation in test
accuracy.
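
For context, the distinction between single-step and iterative adversarial
examples that this abstract relies on can be illustrated with a minimal
PyTorch-style sketch (a generic illustration of FGSM versus BIM/PGD, not the
method proposed above; model, x, y, and the step sizes are placeholder
assumptions):

    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        # Single-step attack: one signed-gradient step of size eps.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

    def pgd(model, x, y, eps, alpha, steps):
        # Iterative attack (BIM/PGD): repeat a smaller step of size alpha,
        # projecting back into the L-infinity ball of radius eps each time.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball
            x_adv = x_adv.clamp(0, 1)                 # keep pixels in a valid range
        return x_adv.detach()

Training only on single-step examples such as the FGSM output is what keeps
such defenses cheap, and it is also why they typically remain vulnerable to the
iterative variant.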
Bag of Tricks for Adversarial Training
Adversarial training (AT) is one of the most effective strategies for
promoting model robustness. However, recent benchmarks show that most of the
proposed improvements on AT are less effective than simply early stopping the
training procedure. This counter-intuitive fact motivates us to investigate the
implementation details of tens of AT methods. Surprisingly, we find that the
basic settings (e.g., weight decay, training schedule, etc.) used in these
methods are highly inconsistent. In this work, we provide comprehensive
evaluations on CIFAR-10, focusing on the effects of mostly overlooked training
tricks and hyperparameters for adversarially trained models. Our empirical
observations suggest that adversarial robustness is much more sensitive to some
basic training settings than we thought. For example, a slightly different
value of weight decay can reduce a model's robust accuracy by more than 7%,
which is likely to outweigh the potential gains brought by the proposed
methods. We distill a baseline training setting and re-implement previous
defenses to achieve new state-of-the-art results. These findings also call for
more attention to the overlooked confounders when benchmarking defenses.
Comment: ICLR 2021
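
To make the role of these basic settings concrete, here is a minimal sketch of
where the hyperparameters the abstract highlights (weight decay and the
training schedule) enter a standard training run; all values and names are
illustrative assumptions, not the baseline recommended by the paper:

    import torch

    model = torch.nn.Linear(3072, 10)  # stand-in for a CIFAR-10 classifier
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,
        momentum=0.9,
        weight_decay=5e-4,  # small changes here can shift robust accuracy by several percent
    )
    # Piecewise-constant learning-rate schedule; simply early stopping the run
    # near the first decay point is the alternative that recent benchmarks
    # found hard to beat.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1
    )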
Boosting Adversarial Training with Hypersphere Embedding
Adversarial training (AT) is one of the most effective defenses against
adversarial attacks for deep learning models. In this work, we advocate
incorporating the hypersphere embedding (HE) mechanism into the AT procedure by
regularizing the features onto compact manifolds, which constitutes a
lightweight yet effective module to blend in the strength of representation
learning. Our extensive analyses reveal that AT and HE are well coupled to
benefit the robustness of adversarially trained models in several respects.
We validate the effectiveness and adaptability of HE by embedding it
into the popular AT frameworks including PGD-AT, ALP, and TRADES, as well as
the FreeAT and FastAT strategies. In the experiments, we evaluate our methods
under a wide range of adversarial attacks on the CIFAR-10 and ImageNet
datasets, which verifies that integrating HE can consistently enhance the model
robustness for each AT framework with little extra computation.
Comment: NeurIPS 2020
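
As a rough intuition for the hypersphere embedding idea (regularizing features
onto a compact manifold), the sketch below normalizes both features and
classifier weights onto the unit sphere so that logits become scaled cosine
similarities; this is a simplified illustration under assumed names and a
placeholder scale, not the exact HE formulation used in the paper:

    import torch
    import torch.nn.functional as F

    class CosineClassifier(torch.nn.Module):
        # Toy head: unit-norm features and class weights make the logits
        # scaled cosine similarities, i.e. points on a hypersphere.
        def __init__(self, feat_dim, num_classes, scale=15.0):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
            self.scale = scale  # temperature-like scale; value is illustrative

        def forward(self, features):
            f = F.normalize(features, dim=1)     # project features onto the unit sphere
            w = F.normalize(self.weight, dim=1)  # project class weights onto the unit sphere
            return self.scale * (f @ w.t())      # cosine-similarity logits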
Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart
Correctly classifying adversarial examples is an essential but challenging
requirement for safely deploying machine learning models. As reported in
RobustBench, even the state-of-the-art adversarially trained models struggle to
exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A
complementary way towards robustness is to introduce a rejection option,
allowing the model to not return predictions on uncertain inputs, where
confidence is a commonly used certainty proxy. Following this routine, we find
that confidence and a rectified confidence (R-Con) can form two coupled
rejection metrics, which could provably distinguish wrongly classified inputs
from correctly classified ones. This intriguing property sheds light on using
coupling strategies to better detect and reject adversarial examples. We
evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and
CIFAR-100 under several attacks including adaptive ones, and demonstrate that
the RR module is compatible with different adversarial training frameworks on
improving robustness, with little extra computation. The code is available at
https://github.com/P2333/Rectified-Rejection
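
To make the rejection option concrete, the sketch below thresholds plain
softmax confidence, the commonly used certainty proxy mentioned above; it does
not reproduce the paper's coupled R-Con metric, and the threshold and names are
illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def predict_with_rejection(model, x, threshold=0.9):
        # Return class predictions, marking rejected (uncertain) inputs with -1.
        probs = F.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        pred = pred.clone()
        pred[conf < threshold] = -1  # abstain instead of returning a prediction
        return pred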