Efficient Two-Step Adversarial Defense for Deep Neural Networks
In recent years, deep neural networks have demonstrated outstanding
performance in many machine learning tasks. However, researchers have
discovered that these state-of-the-art models are vulnerable to adversarial
examples: legitimate examples altered by small perturbations that are
imperceptible to the human eye. Adversarial training, which augments the training
data with adversarial examples during the training process, is a well-known
defense for improving the robustness of a model against adversarial attacks.
However, this robustness is effective only against the same attack method used
during adversarial training. Madry et al. (2017) argue for the effectiveness of
iterative multi-step adversarial attacks, and in particular that projected
gradient descent (PGD) may be considered the universal first-order adversary,
so that adversarial training with PGD confers resistance against many other
first-order attacks. However, the computational cost of adversarial training
with PGD and other multi-step adversarial examples is much higher than that of
adversarial training with simpler attack techniques. In this
paper, we show how strong adversarial examples can be generated at a cost
similar to that of only two runs of the fast gradient sign method (FGSM),
enabling a defense against adversarial attacks with a robustness level
comparable to that of adversarial training with multi-step adversarial examples. We
empirically demonstrate the effectiveness of the proposed two-step defense
approach against different attack methods and its improvements over existing
defense strategies.
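The abstract describes the method only at a high level. As a minimal sketch of what an attack costing roughly two FGSM gradient evaluations could look like in PyTorch, consider the following; the function name, step sizes, and projection here are illustrative assumptions, not the authors' exact two-step procedure.

```python
import torch

def two_step_attack(model, loss_fn, x, y, eps=8/255, alpha=4/255):
    """Generic two-gradient-step attack, costing roughly two FGSM runs.
    NOTE: an illustrative sketch, not the paper's exact procedure;
    eps and alpha are placeholder hyperparameters."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(2):  # two gradient evaluations, like two FGSM runs
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # stay within the eps-ball around the clean input and valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0).requires_grad_(True)
    return x_adv.detach()
```

Adversarial training would then mix examples generated this way into each training batch, at roughly the per-batch cost of two FGSM runs.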
Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness
Adversarial robustness, which mainly comprises sensitivity-based robustness
and spatial robustness, plays an integral part in robust generalization. In
this paper, we endeavor to design strategies to achieve universal adversarial
robustness. To this end, we first investigate the less-studied spatial
robustness and then integrate existing spatial robustness methods by
incorporating both local and global spatial vulnerability into a single spatial
attack and adversarial training scheme. Based on this exploration, we further present
a comprehensive relationship between natural accuracy, sensitivity-based
robustness, and different forms of spatial robustness, supported by strong
evidence from the perspective of robust representation. More importantly, in order to balance
the mutual impacts of these different forms of robustness within one unified
framework, we incorporate the \textit{Pareto criterion} into the adversarial
robustness analysis, yielding a novel strategy called \textit{Pareto Adversarial Training} towards
universal robustness. The resulting Pareto front, the set of optimal solutions,
provides the optimal balance among natural accuracy and the different forms of
adversarial robustness, shedding light on future solutions towards universal
robustness. To the best of our knowledge, we are the first to consider
universal adversarial robustness via multi-objective optimization.
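The paper's exact Pareto Adversarial Training algorithm is not given in the abstract; a common way to expose the trade-off it describes is to scalarize the competing objectives with convex weights and sweep the weights to trace a Pareto front. The sketch below assumes three placeholder losses (natural, sensitivity-based, spatial) and is only illustrative, not the paper's procedure.

```python
import torch

def scalarized_robust_loss(model, loss_fn, x, y, x_sens, x_spat, w):
    """Convex combination of natural, sensitivity-based, and spatial
    adversarial losses. Sweeping w over the probability simplex traces
    an (illustrative) Pareto front; this is not the paper's algorithm.

    x_sens: sensitivity-based adversarial examples (e.g., l_inf PGD)
    x_spat: spatially transformed adversarial examples (e.g., rotations)
    w:      three non-negative weights summing to one
    """
    l_nat = loss_fn(model(x), y)
    l_sens = loss_fn(model(x_sens), y)
    l_spat = loss_fn(model(x_spat), y)
    return w[0] * l_nat + w[1] * l_sens + w[2] * l_spat
```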
Universal adversarial robustness of texture and shape-biased models
Increasing shape bias in deep neural networks has been shown to improve robustness to common corruptions and noise. In this paper, we analyze the adversarial robustness of texture- and shape-biased models to Universal Adversarial Perturbations (UAPs). We use UAPs to evaluate the robustness of DNN models with varying degrees of shape-based training. We find that shape-biased models do not markedly improve adversarial robustness, and we show that ensembles of texture- and shape-biased models can improve universal adversarial robustness while maintaining strong performance.
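The abstract does not spell out how the UAPs are computed. One standard construction, assumed here purely for illustration and not necessarily the one used in the paper, accumulates signed gradients of a single shared perturbation over the whole dataset:

```python
import torch

def simple_uap(model, loss_fn, loader, eps=10/255, alpha=1/255, epochs=1):
    """Minimal sign-gradient universal adversarial perturbation.
    A single perturbation delta is optimized to raise the loss across
    the whole dataset; hyperparameters are placeholder assumptions."""
    x0, _ = next(iter(loader))
    delta = torch.zeros_like(x0[0])           # one shared perturbation
    for _ in range(epochs):
        for x, y in loader:
            d = delta.clone().requires_grad_(True)
            loss = loss_fn(model(x + d), y)   # broadcasts over the batch
            grad = torch.autograd.grad(loss, d)[0]
            # ascend the loss and keep delta inside the l_inf eps-ball
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
    return delta
```

Evaluating universal robustness then amounts to measuring accuracy on x + delta over a held-out set, for models with varying degrees of shape bias.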
On the existence of solutions to adversarial training in multiclass classification
We study three models of the problem of adversarial training in multiclass
classification designed to construct robust classifiers against adversarial
perturbations of data in the agnostic-classifier setting. We prove the
existence of Borel measurable robust classifiers in each model and provide a
unified perspective of the adversarial training problem, expanding the
connections with optimal transport initiated by the authors in previous work
and developing new connections between adversarial training in the multiclass
setting and total variation regularization. As a corollary of our results, we
prove the existence of Borel measurable solutions to the agnostic adversarial
training problem in the binary classification setting, a result that improves
on prior results in the adversarial training literature, where robust
classifiers were only known to exist within the enlarged universal
$\sigma$-algebra of the feature space.
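For readers unfamiliar with the setting, the adversarial training problem studied here can be written, in one standard formulation (notation assumed here, not quoted from the paper), as a worst-case risk minimization:

```latex
% Standard worst-case (adversarial) risk minimization; a sketch of the
% setting in common notation, not a verbatim statement from the paper.
\[
  \inf_{f \in \mathcal{F}} \;
  \mathbb{E}_{(x,y)\sim\mu}
  \Big[ \sup_{\tilde{x}\,:\,\|\tilde{x}-x\|\le\varepsilon}
        \ell\big(f(\tilde{x}),\, y\big) \Big]
\]
```

where $\mu$ is the data distribution, $\varepsilon$ the adversarial budget, $\ell$ a classification loss, and $\mathcal{F}$ a class of classifiers; the existence results concern when the infimum is attained by a Borel measurable $f$.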