Curriculum Adversarial Training
Recently, deep learning has been applied to many security-sensitive
applications, such as facial authentication. The existence of adversarial
examples hinders such applications. The state-of-the-art result on defense
shows that adversarial training can be applied to train a robust model on MNIST
against adversarial examples, but it fails to achieve high empirical
worst-case accuracy on more complex tasks such as CIFAR-10 and SVHN. In our
work, we propose curriculum adversarial training (CAT) to resolve this issue.
The basic idea is to develop a curriculum of adversarial examples generated by
attacks with a wide range of strengths. With two techniques to mitigate the
forgetting and the generalization issues, we demonstrate that CAT can improve
the prior art's empirical worst-case accuracy by a large margin of 25% on
CIFAR-10 and 35% on SVHN. At the same time, the model's performance on
non-adversarial inputs is comparable to that of state-of-the-art models.
Comment: IJCAI 2018
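The curriculum idea lends itself to a compact sketch. Below is a minimal, hedged illustration in PyTorch (not the authors' code): attack strength is taken to be the number of PGD steps, and each batch samples a strength from the curriculum so far, a simple stand-in for the paper's forgetting mitigation; the schedule and helper names are illustrative.

    import random
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        # Standard L-inf PGD: ascend the loss, then project onto the eps-ball.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()

    def cat_train(model, loader, optimizer, epochs, max_strength=10):
        k = 0  # current curriculum strength (number of PGD steps)
        for epoch in range(epochs):
            for x, y in loader:
                # Sample from {0, ..., k} so weaker attacks keep appearing,
                # an illustrative stand-in for the forgetting mitigation.
                steps = random.randint(0, k)
                x_in = pgd_attack(model, x, y, steps=steps) if steps > 0 else x
                optimizer.zero_grad()
                F.cross_entropy(model(x_in), y).backward()
                optimizer.step()
            k = min(k + 1, max_strength)  # illustrative: +1 strength per epoch
        return model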
Defending Against Physically Realizable Attacks on Image Classification
We study the problem of defending deep neural network approaches for image
classification from physically realizable attacks. First, we demonstrate that
the two most scalable and effective methods for learning robust models,
adversarial training with PGD attacks and randomized smoothing, exhibit very
limited effectiveness against three of the highest profile physical attacks.
Next, we propose a new abstract adversarial model, rectangular occlusion
attacks, in which an adversary places a small adversarially crafted rectangle
in an image, and develop two approaches for efficiently computing the resulting
adversarial examples. Finally, we demonstrate that adversarial training using
our new attack yields image classification models that exhibit high robustness
against the physically realizable attacks we study, offering the first
effective generic defense against such attacks. Comment: camera-ready
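As a rough illustration of the rectangular occlusion idea (a hedged sketch, not the paper's implementation; rectangle size, grid stride, and the inner optimizer are placeholder choices), one can first grid-search the rectangle position that most increases the loss, then optimize the pixels inside it:

    import torch
    import torch.nn.functional as F

    def roa_attack(model, x, y, h=20, w=20, stride=10, pgd_steps=30, lr=0.1):
        B, C, H, W = x.shape
        best_loss, best_pos = None, (0, 0)
        # Stage 1: exhaustive grid search for the most damaging position,
        # probing with a gray rectangle.
        with torch.no_grad():
            for top in range(0, H - h + 1, stride):
                for left in range(0, W - w + 1, stride):
                    x_try = x.clone()
                    x_try[:, :, top:top + h, left:left + w] = 0.5
                    loss = F.cross_entropy(model(x_try), y)
                    if best_loss is None or loss > best_loss:
                        best_loss, best_pos = loss, (top, left)
        # Stage 2: gradient ascent on the rectangle's contents only.
        top, left = best_pos
        patch = torch.full((B, C, h, w), 0.5, device=x.device,
                           requires_grad=True)
        for _ in range(pgd_steps):
            x_adv = x.clone()
            x_adv[:, :, top:top + h, left:left + w] = patch
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, patch)[0]
            patch = (patch.detach() + lr * grad.sign()).clamp(0, 1)
            patch.requires_grad_(True)
        x_adv = x.clone()
        x_adv[:, :, top:top + h, left:left + w] = patch.detach()
        return x_adv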
An Optimal Control View of Adversarial Machine Learning
I describe an optimal control view of adversarial machine learning, where the
dynamical system is the machine learner, the inputs are adversarial actions, and
the control costs are defined by the adversary's goals to do harm and be hard
to detect. This view encompasses many types of adversarial machine learning,
including test-item attacks, training-data poisoning, and adversarial reward
shaping. The view encourages adversarial machine learning researchers to
utilize advances in control theory and reinforcement learning.
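Concretely, the description suggests a finite-horizon formulation along these lines (notation illustrative, not lifted from the paper):

    \min_{u_0, \dots, u_{T-1}} \; g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, u_t)
    \quad \text{subject to} \quad x_{t+1} = f(x_t, u_t),

where x_t is the learner's state (e.g., its parameters), u_t is the adversary's action at step t (a test item, a poisoned training point, or a shaped reward), f is the learning update rule, and the costs g_t encode the twin goals of doing harm and remaining hard to detect.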
SAT: Improving Adversarial Training via Curriculum-Based Loss Smoothing
Adversarial training (AT) has become a popular choice for training robust
networks. However, it tends to sacrifice clean accuracy heavily in favor of
robustness and suffers from a large generalization error. To address these
concerns, we propose Smooth Adversarial Training (SAT), guided by our analysis
on the eigenspectrum of the loss Hessian. We find that curriculum learning, a
scheme that emphasizes starting "easy" and gradually ramping up the
"difficulty" of training, smooths the adversarial loss landscape for a suitably
chosen difficulty metric. We present a general formulation for curriculum
learning in the adversarial setting and propose two difficulty metrics based on
the maximal Hessian eigenvalue (H-SAT) and the softmax probability (P-SAT). We
demonstrate that SAT stabilizes network training even for a large perturbation
norm and allows the network to operate at a better clean accuracy versus
robustness trade-off curve compared to AT. This leads to a significant
improvement in both clean accuracy and robustness compared to AT, TRADES, and
other baselines. To highlight a few results, our best model improves normal and
robust accuracy by 6% and 1%, respectively, on CIFAR-100 compared to AT. On
Imagenette, a ten-class subset of ImageNet, our model outperforms AT by 23% and
3% on normal and robust accuracy, respectively. Comment: Published at AISec '21: Proceedings of the 14th ACM Workshop on
Artificial Intelligence and Security. ACM DL link:
https://dl.acm.org/doi/abs/10.1145/3474369.348687
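A hedged sketch of how a softmax-probability difficulty metric can gate attack strength (simplified here to a batch-level early stop; the paper's exact P-SAT criterion and its annealing schedule may differ):

    import torch
    import torch.nn.functional as F

    def curriculum_pgd(model, x, y, eps=8/255, alpha=2/255, steps=10, rho=0.5):
        # Run PGD, but stop early once the mean softmax probability of the
        # true class drops below the curriculum threshold rho. Annealing rho
        # toward 0 over training makes attacks start "easy" and end at
        # full-strength PGD.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            logits = model(x_adv)
            p_true = F.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).mean()
            if p_true < rho:  # batch already "hard enough" for this stage
                break
            loss = F.cross_entropy(logits, y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()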
Recent Advances in Adversarial Training for Adversarial Robustness
Adversarial training is one of the most effective approaches for defending
deep learning models against adversarial examples. Unlike other defense
strategies, adversarial training aims to promote the robustness of models
intrinsically. During the last few years, adversarial training has been studied
and discussed from various aspects. A variety of improvements and developments
of adversarial training have been proposed, yet they have been neglected in
existing surveys. In this survey, we systematically review, for the first time,
the recent progress on adversarial training for adversarial robustness with a
novel taxonomy. Then we discuss the generalization problems in adversarial
training from three perspectives. Finally, we highlight the challenges which
are not fully tackled and present potential future directions.
Improving the affordability of robustness training for DNNs
Projected Gradient Descent (PGD) based adversarial training has become one of
the most prominent methods for building robust deep neural network models.
However, the computational complexity associated with this approach, due to the
maximization of the loss function when finding adversaries, is a longstanding
problem and may be prohibitive when using larger and more complex models. In
this paper, we show that the initial phase of adversarial training is redundant
and can be replaced with natural training, which significantly improves the
computational efficiency. We demonstrate that this efficiency gain can be
achieved without any loss in accuracy on natural and adversarial test samples.
We support our argument with insights on the nature of the adversaries and
their relative strength during the training process. We show that our proposed
method can reduce the training time by a factor of up to 2.5 with comparable or
better model test accuracy and generalization on various strengths of
adversarial attacks.
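The proposed schedule reduces, in essence, to a switch in the training loop. A minimal sketch, assuming a pgd_attack helper like the one sketched earlier; the switch point is a tunable, not a value from the paper:

    import torch.nn.functional as F

    def delayed_adv_train(model, loader, optimizer, epochs, switch_epoch):
        for epoch in range(epochs):
            for x, y in loader:
                # Natural training during the (argued redundant) initial
                # phase, then standard PGD adversarial training afterwards.
                if epoch >= switch_epoch:
                    x = pgd_attack(model, x, y)
                optimizer.zero_grad()
                F.cross_entropy(model(x), y).backward()
                optimizer.step()
        return model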
Monge blunts Bayes: Hardness Results for Adversarial Training
The last few years have seen a staggering number of empirical studies of the
robustness of neural networks in a model of adversarial perturbations of their
inputs. Most rely on an adversary that carries out local modifications within
prescribed balls. None, however, has so far questioned the broader picture: how
to frame a resource-bounded adversary so that it can be severely detrimental to
learning, a non-trivial problem which entails, at a minimum, the choice of loss
and classifiers.
We suggest a formal answer for losses that satisfy the minimal statistical
requirement of being proper. We pin down a simple sufficient property for any
given class of adversaries to be detrimental to learning, involving a central
measure of "harmfulness" which generalizes the well-known class of integral
probability metrics. A key feature of our result is that it holds for all
proper losses, and for a popular subset of these, the optimisation of this
central measure appears to be independent of the loss. When classifiers are
Lipschitz (a now popular approach in adversarial training), this
optimisation resorts to optimal transport to make a low-budget compression of
class marginals. Toy experiments reveal a finding also observed in recent,
independent work: training against a sufficiently budgeted adversary of this
kind improves generalization.
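For reference, the integral probability metrics that this measure of "harmfulness" generalizes have the standard textbook form

    d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \big| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{X \sim Q}[f(X)] \big|,

where \mathcal{F} is a class of witness functions; taking \mathcal{F} to be the 1-Lipschitz functions recovers the Wasserstein-1 distance, which is the optimal-transport connection alluded to above. (The paper's own measure is a generalization; this is only the classical definition.)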
Bag of Tricks for Adversarial Training
Adversarial training (AT) is one of the most effective strategies for
promoting model robustness. However, recent benchmarks show that most of the
proposed improvements on AT are less effective than simply early stopping the
training procedure. This counter-intuitive fact motivates us to investigate the
implementation details of tens of AT methods. Surprisingly, we find that the
basic settings (e.g., weight decay and training schedule) used in these
methods are highly inconsistent. In this work, we provide comprehensive
evaluations on CIFAR-10, focusing on the effects of mostly overlooked training
tricks and hyperparameters for adversarially trained models. Our empirical
observations suggest that adversarial robustness is much more sensitive to some
basic training settings than we thought. For example, a slightly different
value of weight decay can reduce a model's robust accuracy by more than 7%,
which is likely to outweigh the potential improvement induced by the proposed
methods. We distill a baseline training setting and re-implement previous
defenses to achieve new state-of-the-art results. These facts also call for
greater attention to the overlooked confounders when benchmarking defenses.
Comment: ICLR 2021
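To make the sensitivity concrete, this is the kind of hyperparameter bundle such an evaluation pins down. The values below follow common public CIFAR-10 adversarial-training recipes and are illustrative, not the paper's exact table:

    # Illustrative CIFAR-10 adversarial-training settings of the kind the
    # paper audits; values follow common public recipes, not the paper's table.
    baseline = {
        "architecture": "PreActResNet-18",
        "epochs": 110,
        "batch_size": 128,
        "optimizer": "SGD",
        "lr": 0.1,               # stepped down by 10x late in training
        "momentum": 0.9,
        "weight_decay": 5e-4,    # robustness is reported to be highly
                                 # sensitive to this value
        "attack": "PGD-10",
        "epsilon": 8 / 255,
        "step_size": 2 / 255,
        "early_stopping": True,  # checkpoint on robust validation accuracy
    }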
Towards Robust General Medical Image Segmentation
The reliability of Deep Learning systems depends on their accuracy but also
on their robustness against adversarial perturbations to the input data.
Several attacks and defenses have been proposed to improve the performance of
Deep Neural Networks under the presence of adversarial noise in the natural
image domain. However, robustness in computer-aided diagnosis for volumetric
data has only been explored for specific tasks and with limited attacks. We
propose a new framework to assess the robustness of general medical image
segmentation systems. Our contributions are two-fold: (i) we propose a new
benchmark to evaluate robustness in the context of the Medical Segmentation
Decathlon (MSD) by extending the recent AutoAttack natural image classification
framework to the domain of volumetric data segmentation, and (ii) we present a
novel lattice architecture for RObust Generic medical image segmentation (ROG).
Our results show that ROG is capable of generalizing across different tasks of
the MSD and largely surpasses the state-of-the-art under sophisticated
adversarial attacks. Comment: Accepted at MICCAI 2021
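Extending a classification attack to volumetric segmentation mainly means aggregating the loss over voxels. A hedged sketch of the PGD-style core only (AutoAttack proper is an ensemble of parameter-free attacks, none of whose specifics are reproduced here):

    import torch
    import torch.nn.functional as F

    def pgd_segmentation_3d(model, vol, mask, eps=8/255, alpha=2/255, steps=10):
        # vol: (B, C, D, H, W) input volume; mask: (B, D, H, W) voxel labels.
        # The attack loss is cross-entropy averaged over every voxel, so the
        # perturbation degrades the segmentation as a whole.
        adv = vol.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            logits = model(adv)                   # (B, num_classes, D, H, W)
            loss = F.cross_entropy(logits, mask)  # mean over all voxels
            grad = torch.autograd.grad(loss, adv)[0]
            adv = adv.detach() + alpha * grad.sign()
            adv = torch.min(torch.max(adv, vol - eps), vol + eps)
        return adv.detach()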
Robust Face Verification via Disentangled Representations
We introduce a robust algorithm for face verification, i.e., deciding whether
two images are of the same person or not. Our approach is a novel take on the
idea of using deep generative networks for adversarial robustness. We use the
generative model during training as an online augmentation method instead of a
test-time purifier that removes adversarial noise. Our architecture uses a
contrastive loss term and a disentangled generative model to sample negative
pairs. Instead of randomly pairing two real images, we pair an image with its
class-modified counterpart while keeping its content (pose, head tilt, hair,
etc.) intact. This enables us to efficiently sample hard negative pairs for the
contrastive loss. We experimentally show that, when coupled with adversarial
training, the proposed scheme converges with a weak inner solver and has a
higher clean and robust accuracy than state-of-the-art methods when evaluated
against white-box physical attacks. Comment: Preprint
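The training signal combines a standard contrastive loss with generator-made hard negatives. A hedged sketch; the generator and code-extraction helpers named in the comment are hypothetical stand-ins for the paper's disentangled model:

    import torch
    import torch.nn.functional as F

    def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
        # Classic contrastive loss: pull matching pairs together, push
        # non-matching pairs at least `margin` apart in embedding space.
        d = F.pairwise_distance(emb_a, emb_b)
        pos = same_person * d.pow(2)
        neg = (1.0 - same_person) * F.relu(margin - d).pow(2)
        return (pos + neg).mean()

    # Hard-negative sampling, schematically: rather than pairing two random
    # identities, pair an image with a generated counterpart whose identity
    # is swapped while content (pose, head tilt, hair) is kept, e.g.
    #   x_neg = generator(content_code(x), identity_code(x_other))
    # (generator, content_code, identity_code are hypothetical names.)
    # Such negatives differ only in identity, which makes them "hard".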