Adversarial Training for Free!
Adversarial training, in which a network is trained on adversarial examples,
is one of the few defenses against adversarial attacks that withstands strong
attacks. Unfortunately, the high cost of generating strong adversarial examples
makes standard adversarial training impractical on large-scale problems like
ImageNet. We present an algorithm that eliminates the overhead cost of
generating adversarial examples by recycling the gradient information computed
when updating model parameters. Our "free" adversarial training algorithm
achieves comparable robustness to PGD adversarial training on the CIFAR-10 and
CIFAR-100 datasets at negligible additional cost compared to natural training,
and can be 7 to 30 times faster than other strong adversarial training methods.
Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train
a robust model for the large-scale ImageNet classification task that maintains
40% accuracy against PGD attacks. The code is available at
https://github.com/ashafahi/free_adv_train.
Comment: Accepted to NeurIPS 2019
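As a reading aid, here is a minimal PyTorch sketch of the recycling idea described above: each minibatch is replayed m times, and the single backward pass of each replay supplies both the parameter gradient (the descent step on the model) and the input gradient (reused as the ascent step on the perturbation). The function name and the hyperparameters epsilon and m are illustrative assumptions, not the authors' released implementation; see the repository above for that.

import torch

def free_adv_train_epoch(model, loader, optimizer, epsilon=4/255, m=4,
                         device="cuda"):
    # Hypothetical sketch of "free" adversarial training; defaults are
    # illustrative, not the paper's settings.
    criterion = torch.nn.CrossEntropyLoss()
    delta = None  # perturbation persists across minibatches
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)
        for _ in range(m):  # replay the same minibatch m times
            delta = delta.detach().requires_grad_(True)
            loss = criterion(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()   # one backward pass yields gradients for both
            optimizer.step()  # the parameters (descent) and delta (ascent)
            with torch.no_grad():
                # ascent step recycled from the same backward pass
                delta = (delta + epsilon * delta.grad.sign()).clamp(-epsilon,
                                                                    epsilon)

Because the perturbation update reuses a gradient that had to be computed anyway, each replay costs roughly as much as one natural training step; running m replays over 1/m as many epochs keeps the total cost close to that of natural training.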
Certifiable Black-Box Attack: Ensuring Provably Successful Attack for Adversarial Examples
Black-box adversarial attacks have shown strong potential to subvert machine
learning models. Existing black-box adversarial attacks craft the adversarial
examples by iteratively querying the target model and/or leveraging the
transferability of a local surrogate model. Whether such an attack will succeed
remains unknown to the adversary when designing it empirically. In this paper,
to the best of our knowledge, we take the first step toward a new paradigm of
adversarial attacks -- certifiable black-box attack that can guarantee the
attack success rate of the crafted adversarial examples. Specifically, we
revise randomized smoothing to establish a novel theory that guarantees the
attack success rate of the adversarial examples. To craft adversarial
examples with a certifiable attack success rate (CASR) guarantee, we design
several novel techniques, including a randomized query method to query the
target model, an initialization method with smoothed self-supervised
perturbation to derive certifiable adversarial examples, and a geometric
shifting method to reduce the perturbation size of the certifiable adversarial
examples for better imperceptibility. We comprehensively evaluate the
performance of the certifiable black-box attack on the CIFAR-10 and ImageNet
datasets against different levels of defenses. Both theoretical and
experimental results validate the effectiveness of the proposed
certifiable attack.
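The abstract does not spell out the CASR construction, but one way to read the randomized-query idea is as a confidence bound estimated from noisy queries around the crafted example. A minimal NumPy/SciPy sketch under that assumption follows; query_model, sigma, n_samples, and the use of a Clopper-Pearson bound are illustrative choices, not the paper's actual theory.

import numpy as np
from scipy.stats import beta

def certified_attack_success_rate(query_model, x_adv, y_true, sigma=0.25,
                                  n_samples=1000, alpha=0.001):
    # Hypothetical sketch: lower-bound the probability that the black-box
    # model misclassifies Gaussian-smoothed copies of x_adv.
    # query_model maps a batch of inputs to predicted labels (black-box).
    noise = np.random.randn(n_samples, *x_adv.shape) * sigma
    preds = query_model(x_adv[None, ...] + noise)  # randomized queries
    misses = int(np.sum(preds != y_true))          # attack-success count
    if misses == 0:
        return 0.0
    # one-sided (1 - alpha) Clopper-Pearson lower confidence bound
    return float(beta.ppf(alpha, misses, n_samples - misses + 1))

With n_samples = 1000 queries and alpha = 0.001, the returned bound holds with 99.9% confidence under the Gaussian smoothing distribution; tightening the gap between the bound and the empirical miss rate costs more queries.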