Evaluating and Understanding the Robustness of Adversarial Logit Pairing
We evaluate the robustness of Adversarial Logit Pairing, a recently proposed
defense against adversarial examples. We find that a network trained with
Adversarial Logit Pairing achieves 0.6% accuracy in the threat model in which
the defense is considered. We provide a brief overview of the defense and the
threat models/claims considered, as well as a discussion of the methodology and
results of our attack, which may offer insights into the reasons underlying the
vulnerability of ALP to adversarial attack. Comment: NeurIPS SECML 2018. Source code at
https://github.com/labsix/adversarial-logit-pairing-analysi
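As a point of reference for the kind of attack such evaluations rely on, the following is a minimal L-infinity PGD loop in PyTorch; the model interface and the eps/alpha/step values are illustrative placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=100):
    """Minimal L-infinity PGD: repeatedly step along the sign of the loss gradient
    and project back into the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # keep valid pixel range
    return x_adv.detach()
```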
Improving Adversarial Robustness via Attention and Adversarial Logit Pairing
Though deep neural networks have achieved state-of-the-art performance in
visual classification, recent studies have shown that they are vulnerable to
adversarial examples. In this paper, we develop improved techniques for
defending against adversarial examples. First, we introduce an enhanced defense
using a technique we call \textbf{Attention and Adversarial Logit Pairing
(AT+ALP)}, which encourages both the attention maps and the logits of paired
examples to be similar. When applied to clean examples and their adversarial
counterparts, \textbf{AT+ALP} improves accuracy on adversarial examples over
adversarial training. Next, we show that \textbf{AT+ALP} effectively increases
the average activations of adversarial examples in the key areas and
demonstrate that it focuses on more discriminative features, improving the
robustness of the model. Finally, we conduct extensive experiments on a wide
range of datasets, and the results show that \textbf{AT+ALP} achieves
\textbf{state-of-the-art} defense. For example, on the \textbf{17 Flower
Category Database}, under strong 200-iteration \textbf{PGD} gray-box and
black-box attacks where prior art achieves 34\% and 39\% accuracy, our method
achieves \textbf{50\%} and \textbf{51\%}. Compared with previous work, our
method is evaluated under a highly challenging PGD attack: a large maximum
perturbation with 10 to 200 attack iterations. To our knowledge, such a strong
attack has not been previously explored on a wide range of datasets.
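As a rough illustration of the pairing idea this abstract describes, here is a minimal PyTorch-style sketch of a combined objective; the assumption that the model returns both logits and an intermediate feature map, the way the attention map is computed, and the weighting coefficients are illustrative rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def at_alp_loss(model, x_clean, x_adv, y, w_at=1.0, w_alp=0.5):
    """Sketch of an AT+ALP-style loss: adversarial cross-entropy plus pairing terms
    that pull the attention maps and logits of clean/adversarial pairs together.
    `model` is assumed to return (logits, feature_map)."""
    logits_c, feat_c = model(x_clean)
    logits_a, feat_a = model(x_adv)

    # Spatial attention map: channel-wise mean of absolute activations.
    att_c = feat_c.abs().mean(dim=1)
    att_a = feat_a.abs().mean(dim=1)

    ce = F.cross_entropy(logits_a, y)       # adversarial training term
    at_pair = F.mse_loss(att_a, att_c)      # attention pairing
    alp = F.mse_loss(logits_a, logits_c)    # adversarial logit pairing
    return ce + w_at * at_pair + w_alp * alp
```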
GanDef: A GAN based Adversarial Training Defense for Neural Network Classifier
Machine learning models, especially neural network (NN) classifiers, are
widely used in many applications including natural language processing,
computer vision and cybersecurity. They provide high accuracy under the
assumption of attack-free scenarios. However, this assumption has been defied
by the introduction of adversarial examples -- carefully perturbed samples of
input that are usually misclassified. Many researchers have tried to develop a
defense against adversarial examples; however, we are still far from achieving
that goal. In this paper, we design a Generative Adversarial Net (GAN) based
adversarial training defense, dubbed GanDef, which utilizes a competition game
to regulate the feature selection during the training. We analytically show
that GanDef can train a classifier so it can defend against adversarial
examples. Through extensive evaluation on different white-box adversarial
examples, the classifier trained by GanDef shows the same level of test
accuracy as those trained by state-of-the-art adversarial training defenses.
More importantly, GanDef-Comb, a variant of GanDef, could utilize the
discriminator to achieve a dynamic trade-off between correctly classifying
original and adversarial examples. As a result, it achieves the highest overall
test accuracy when the ratio of adversarial examples exceeds 41.7%.
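The following is a hedged PyTorch-style sketch of the kind of competition game the abstract describes; the interfaces (a discriminator that scores the classifier's logits) and the loss weighting are assumptions for illustration, not GanDef's exact formulation.

```python
import torch
import torch.nn.functional as F

def gandef_step(classifier, discriminator, opt_c, opt_d, x_clean, x_adv, y, w_game=1.0):
    """One illustrative training step of a GAN-style adversarial-training defense:
    the discriminator tries to tell clean from adversarial inputs using the
    classifier's logits, while the classifier is trained to classify correctly
    and to make its logits indistinguishable to the discriminator."""
    logits_c = classifier(x_clean)
    logits_a = classifier(x_adv)

    # Discriminator update: clean logits labeled 1, adversarial logits labeled 0.
    d_in = torch.cat([logits_c.detach(), logits_a.detach()])
    d_target = torch.cat([torch.ones(len(logits_c)), torch.zeros(len(logits_a))]).to(d_in.device)
    d_loss = F.binary_cross_entropy_with_logits(discriminator(d_in).squeeze(1), d_target)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Classifier update: classify both batches and fool the discriminator on the adversarial one.
    fool_target = torch.ones(len(logits_a), device=d_in.device)
    fool = F.binary_cross_entropy_with_logits(discriminator(logits_a).squeeze(1), fool_target)
    c_loss = F.cross_entropy(logits_c, y) + F.cross_entropy(logits_a, y) + w_game * fool
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
    return float(c_loss), float(d_loss)
```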
Using Videos to Evaluate Image Model Robustness
Human visual systems are robust to a wide range of image transformations that
are challenging for artificial networks. We present the first study of image
model robustness to the minute transformations found across video frames, which
we term "natural robustness". Compared to previous studies on adversarial
examples and synthetic distortions, natural robustness captures a more diverse
set of common image transformations that occur in the natural environment. Our
study across a dozen model architectures shows that more accurate models are
more robust to natural transformations, and that robustness to synthetic color
distortions is a good proxy for natural robustness. In examining brittleness in
videos, we find that the majority of the brittleness found in videos (99.9\%)
lies outside the typical definition of adversarial examples. Finally, we
investigate training techniques to reduce brittleness and find that no single
technique systematically improves natural robustness across the twelve tested
architectures. Comment: Video Robustness Dataset included in director
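A minimal sketch of how such "natural robustness" can be probed, assuming an iterable of nearby-frame pairs; the metric shown (fraction of adjacent-frame prediction flips) is an illustrative proxy, not necessarily the paper's exact measure.

```python
import torch

@torch.no_grad()
def natural_robustness_flip_rate(model, frame_pairs):
    """Fraction of nearby-frame pairs on which the model's prediction flips.
    `frame_pairs` is assumed to yield (anchor_frame_batch, nearby_frame_batch)."""
    flips, total = 0, 0
    model.eval()
    for x_a, x_b in frame_pairs:
        pred_a = model(x_a).argmax(dim=1)
        pred_b = model(x_b).argmax(dim=1)
        flips += (pred_a != pred_b).sum().item()
        total += x_a.size(0)
    return flips / max(total, 1)   # lower is more "naturally robust"
```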
Feature Prioritization and Regularization Improve Standard Accuracy and Adversarial Robustness
Adversarial training has been successfully applied to build robust models at
a certain cost. While the robustness of a model increases, the standard
classification accuracy declines. This phenomenon is suggested to be an
inherent trade-off. We propose a model that employs feature prioritization by a
nonlinear attention module and feature regularization to improve the
adversarial robustness and the standard accuracy relative to adversarial
training. The attention module encourages the model to rely heavily on robust
features by assigning larger weights to them while suppressing non-robust
features. The regularizer encourages the model to extract similar features for
the natural and adversarial images, effectively ignoring the added
perturbation. In addition to evaluating the robustness of our model, we provide
justification for the attention module and propose a novel experimental
strategy that quantitatively demonstrates that our model is almost ideally
aligned with salient data characteristics. Additional experimental results
illustrate the power of our model relative to state-of-the-art methods. Comment: IJCAI 201
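Below is a hedged sketch of the two ingredients the abstract names: a per-channel attention gate that up-weights robust features, and a feature regularizer pulling natural and adversarial features together. The module shape and the loss form are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Hypothetical nonlinear attention module: learns per-channel weights that
    emphasize some features and suppress others."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels // 4), nn.ReLU(),
                                nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, feat):                    # feat: (N, C, H, W)
        w = self.fc(feat.mean(dim=(2, 3)))      # squeeze to per-channel weights
        return feat * w[:, :, None, None]

def prioritized_loss(logits_adv, y, feat_nat, feat_adv, lam=0.1):
    """Adversarial classification loss plus a regularizer that encourages similar
    features for natural and adversarial images (a sketch, not the paper's exact loss)."""
    return F.cross_entropy(logits_adv, y) + lam * F.mse_loss(feat_adv, feat_nat)
```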
Attribution-driven Causal Analysis for Detection of Adversarial Examples
Attribution methods have been developed to explain the decision of a machine
learning model on a given input. We use the Integrated Gradient method for
finding attributions to define the causal neighborhood of an input by
incrementally masking high attribution features. We study the robustness of
machine learning models on benign and adversarial inputs in this neighborhood.
Our study indicates that benign inputs are robust to the masking of high
attribution features but adversarial inputs generated by the state-of-the-art
adversarial attack methods such as DeepFool, FGSM, CW and PGD, are not robust
to such masking. Further, our study demonstrates that this concentration of
high-attribution features responsible for the incorrect decision is more
pronounced in physically realizable adversarial examples. This difference in
attribution of benign and adversarial inputs can be used to detect adversarial
examples. Such a defense approach is independent of training data and attack
method, and we demonstrate its effectiveness on digital and physically
realizable perturbations. Comment: 11 pages, 6 figures
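As an illustration of the detection recipe sketched above, here is a minimal PyTorch implementation of Integrated Gradients with a zero baseline, followed by a routine that masks the top-attributed features and checks whether the prediction flips; the masking fraction and baseline are illustrative choices, not the paper's exact settings.

```python
import torch

def integrated_gradients(model, x, target, steps=32):
    """Plain Integrated Gradients with a zero baseline (a minimal sketch)."""
    baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        xi = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        score = model(xi).gather(1, target.unsqueeze(1)).sum()
        total_grad += torch.autograd.grad(score, xi)[0]
    return (x - baseline) * total_grad / steps

@torch.no_grad()
def masked_prediction_flips(model, x, attributions, frac=0.05):
    """Zero out the top-`frac` highest-attribution input features and report which
    predictions flip; benign inputs tend to keep their label, adversarial ones tend not to."""
    pred = model(x).argmax(dim=1)
    flat_attr = attributions.abs().flatten(1)
    k = max(1, int(frac * flat_attr.size(1)))
    top_idx = flat_attr.topk(k, dim=1).indices
    x_masked = x.flatten(1).clone()
    x_masked.scatter_(1, top_idx, 0.0)            # mask high-attribution features
    pred_masked = model(x_masked.view_as(x)).argmax(dim=1)
    return pred != pred_masked                    # True where the decision changed
```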
Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models
Neural networks are vulnerable to adversarial attacks -- small, visually
imperceptible crafted noise which, when added to the input, drastically changes
the output. The most effective method of defending against these adversarial
attacks is to use the methodology of adversarial training. We analyze the
adversarially trained robust models to study their vulnerability against
adversarial attacks at the level of the latent layers. Our analysis reveals
that, in contrast to the input layer, which is robust to adversarial attack,
the latent layers of these robust models are highly susceptible to adversarial
perturbations of small magnitude. Leveraging this information, we introduce a
new technique, Latent Adversarial Training (LAT), which consists of fine-tuning
adversarially trained models to ensure robustness at the feature layers. We
also propose Latent Attack (LA), a novel algorithm for constructing adversarial
examples. LAT results in a minor improvement in test accuracy and leads to
state-of-the-art adversarial accuracy against the universal first-order PGD
attack, which we show on the MNIST, CIFAR-10, and CIFAR-100
datasets. Comment: Accepted at IJCAI 201
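A hedged sketch of the LAT idea: run a PGD-style perturbation directly on latent activations and train against it. The encoder/head split, the step sizes, and training only the head on the perturbed latents are simplifying assumptions; LAT itself fine-tunes the full adversarially trained model.

```python
import torch
import torch.nn.functional as F

def latent_adversarial_loss(encoder, head, x, y, eps=0.1, alpha=0.02, steps=5):
    """Perturb latent activations with a small PGD-style attack and return the loss
    on the perturbed latents (a sketch of latent adversarial training)."""
    with torch.no_grad():
        z = encoder(x)                           # latent features of the trained model
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):                       # PGD in latent space
        loss = F.cross_entropy(head(z + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return F.cross_entropy(head(z + delta.detach()), y)
```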
Interpreting Adversarial Robustness: A View from Decision Surface in Input Space
One popular hypothesis of neural network generalization is that flat local
minima of the loss surface in parameter space lead to good generalization.
However, we demonstrate that the loss surface in parameter space has no obvious
relationship with generalization, especially under adversarial settings.
Through visualizing decision surfaces in both parameter space and input space,
we instead show that the geometric properties of the decision surface in input
space correlate well with adversarial robustness. We then propose an
adversarial robustness indicator, which can evaluate a neural network's
intrinsic robustness without testing its accuracy under adversarial attacks.
Guided by this indicator, we further propose a robust training method. Without
involving adversarial training, our method can enhance a network's intrinsic
adversarial robustness against various adversarial attacks. Comment: 15 pages, submitted to ICLR 201
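One simple way to get an attack-free geometric signal of the kind this abstract describes is to measure how sharply the loss rises along random input-space directions; the sketch below is an illustrative proxy for such an indicator, not the paper's exact definition.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def decision_surface_flatness(model, x, y, radius=0.03, n_dirs=8):
    """Average loss increase when stepping a fixed radius along random unit
    directions in input space; flatter surfaces (smaller increase) loosely
    correlate with higher intrinsic robustness."""
    base = F.cross_entropy(model(x), y, reduction='none')
    rise = torch.zeros_like(base)
    for _ in range(n_dirs):
        d = torch.randn_like(x)
        d = d / d.flatten(1).norm(dim=1).view(-1, *([1] * (x.dim() - 1)))  # unit direction
        rise += F.cross_entropy(model(x + radius * d), y, reduction='none') - base
    return (rise / n_dirs).mean().item()
```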
Improved Adversarial Robustness via Logit Regularization Methods
While great progress has been made at making neural networks effective across
a wide range of visual tasks, most models are surprisingly vulnerable. This
fragility takes the form of small, carefully chosen perturbations of their
input, known as adversarial examples, which represent a security threat for
learned vision models in the wild -- a threat which should be responsibly
defended against in safety-critical applications of computer vision. In this
paper, we advocate for and experimentally investigate the use of a family of
logit regularization techniques as an adversarial defense, which can be used in
conjunction with other methods for creating adversarial robustness at little to
no marginal cost. We also demonstrate that much of the effectiveness of one
recent adversarial defense mechanism can in fact be attributed to logit
regularization, and show how to improve its defense against both white-box and
black-box attacks, in the process creating a stronger black-box attack against
PGD-based models. We validate our methods on three datasets and include results
on both gradient-free attacks and strong gradient-based iterative attacks with
as many as 1,000 steps.
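As a concrete example of this family of regularizers, the sketch below combines label smoothing with "logit squeezing" (an L2 penalty on logit magnitudes); the coefficients are illustrative, and the paper's precise combination may differ.

```python
import torch.nn.functional as F

def logit_regularized_loss(logits, y, squeeze=0.05, smoothing=0.1):
    """Cross-entropy with label smoothing plus an L2 penalty on logit magnitudes
    (logit squeezing). Coefficient values here are illustrative."""
    ce = F.cross_entropy(logits, y, label_smoothing=smoothing)
    squeezing = squeeze * logits.pow(2).mean()
    return ce + squeezing
```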
Using Pre-Training Can Improve Model Robustness and Uncertainty
He et al. (2018) have called into question the utility of pre-training by
showing that training from scratch can often yield similar performance to
pre-training. We show that although pre-training may not improve performance on
traditional classification metrics, it improves model robustness and
uncertainty estimates. Through extensive experiments on adversarial examples,
label corruption, class imbalance, out-of-distribution detection, and
confidence calibration, we demonstrate large gains from pre-training and
complementary effects with task-specific methods. We introduce adversarial
pre-training and show approximately a 10% absolute improvement over the
previous state-of-the-art in adversarial robustness. In some cases, using
pre-training without task-specific methods also surpasses the state-of-the-art,
highlighting the need for pre-training when evaluating future methods on
robustness and uncertainty tasks. Comment: ICML 2019. PyTorch code here:
https://github.com/hendrycks/pre-training Figure 3 update
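A high-level sketch of adversarial pre-training as described: adversarial training on a large source dataset followed by continued adversarial training on the target task. The `attack(model, x, y)` callable (e.g., PGD), the schedule, and the optimizer settings are assumptions for illustration, not the authors' exact recipe.

```python
import torch

def adversarially_pretrain_then_finetune(model, pretrain_loader, finetune_loader,
                                         attack, epochs_pre=10, epochs_ft=5, lr=0.1):
    """Adversarial pre-training on a source dataset, then adversarial fine-tuning
    on the target dataset. `attack(model, x, y)` is assumed to return adversarial
    examples for the current model."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    ce = torch.nn.CrossEntropyLoss()
    for loader, epochs in [(pretrain_loader, epochs_pre), (finetune_loader, epochs_ft)]:
        for _ in range(epochs):
            for x, y in loader:
                x_adv = attack(model, x, y)      # craft adversarial examples
                loss = ce(model(x_adv), y)       # train on them
                opt.zero_grad(); loss.backward(); opt.step()
    return model
```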