308 research outputs found

    Evaluating and Understanding the Robustness of Adversarial Logit Pairing

    Full text link
    We evaluate the robustness of Adversarial Logit Pairing, a recently proposed defense against adversarial examples. We find that a network trained with Adversarial Logit Pairing achieves 0.6% accuracy in the threat model in which the defense is considered. We provide a brief overview of the defense and the threat models/claims considered, as well as a discussion of the methodology and results of our attack, which may offer insights into the reasons underlying the vulnerability of ALP to adversarial attack.Comment: NeurIPS SECML 2018. Source code at https://github.com/labsix/adversarial-logit-pairing-analysi

    Improving Adversarial Robustness via Attention and Adversarial Logit Pairing

    Full text link
    Though deep neural networks have achieved the state of the art performance in visual classification, recent studies have shown that they are all vulnerable to the attack of adversarial examples. In this paper, we develop improved techniques for defending against adversarial examples.First, we introduce enhanced defense using a technique we call \textbf{Attention and Adversarial Logit Pairing(AT+ALP)}, a method that encourages both attention map and logit for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, \textbf{AT+ALP} improves accuracy on adversarial examples over adversarial training.Next,We show that our \textbf{AT+ALP} can effectively increase the average activations of adversarial examples in the key area and demonstrate that it focuse on more discriminate features to improve the robustness of the model.Finally,we conducte extensive experiments using a wide range of datasets and the experiment results show that our \textbf{AT+ALP} achieves \textbf{the state of the art} defense.For example,on \textbf{17 Flower Category Database}, under strong 200-iteration \textbf{PGD} gray-box and black-box attacks where prior art has 34\% and 39\% accuracy, our method achieves \textbf{50\%} and \textbf{51\%}.Compared with previous work,our work is evaluated under highly challenging PGD attack:the maximum perturbation ϵ∈{0.25,0.5}\epsilon \in \{0.25,0.5\} i.e. L∞∈{0.25,0.5}L_\infty \in \{0.25,0.5\} with 10 to 200 attack iterations.To our knowledge, such a strong attack has not been previously explored on a wide range of datasets

    GanDef: A GAN based Adversarial Training Defense for Neural Network Classifier

    Full text link
    Machine learning models, especially neural network (NN) classifiers, are widely used in many applications including natural language processing, computer vision and cybersecurity. They provide high accuracy under the assumption of attack-free scenarios. However, this assumption has been defied by the introduction of adversarial examples -- carefully perturbed samples of input that are usually misclassified. Many researchers have tried to develop a defense against adversarial examples; however, we are still far from achieving that goal. In this paper, we design a Generative Adversarial Net (GAN) based adversarial training defense, dubbed GanDef, which utilizes a competition game to regulate the feature selection during the training. We analytically show that GanDef can train a classifier so it can defend against adversarial examples. Through extensive evaluation on different white-box adversarial examples, the classifier trained by GanDef shows the same level of test accuracy as those trained by state-of-the-art adversarial training defenses. More importantly, GanDef-Comb, a variant of GanDef, could utilize the discriminator to achieve a dynamic trade-off between correctly classifying original and adversarial examples. As a result, it achieves the highest overall test accuracy when the ratio of adversarial examples exceeds 41.7%

    Using Videos to Evaluate Image Model Robustness

    Full text link
    Human visual systems are robust to a wide range of image transformations that are challenging for artificial networks. We present the first study of image model robustness to the minute transformations found across video frames, which we term "natural robustness". Compared to previous studies on adversarial examples and synthetic distortions, natural robustness captures a more diverse set of common image transformations that occur in the natural environment. Our study across a dozen model architectures shows that more accurate models are more robust to natural transformations, and that robustness to synthetic color distortions is a good proxy for natural robustness. In examining brittleness in videos, we find that majority of the brittleness found in videos lies outside the typical definition of adversarial examples (99.9\%). Finally, we investigate training techniques to reduce brittleness and find that no single technique systematically improves natural robustness across twelve tested architectures.Comment: Video Robustness Dataset included in director

    Feature Prioritization and Regularization Improve Standard Accuracy and Adversarial Robustness

    Full text link
    Adversarial training has been successfully applied to build robust models at a certain cost. While the robustness of a model increases, the standard classification accuracy declines. This phenomenon is suggested to be an inherent trade-off. We propose a model that employs feature prioritization by a nonlinear attention module and L2L_2 feature regularization to improve the adversarial robustness and the standard accuracy relative to adversarial training. The attention module encourages the model to rely heavily on robust features by assigning larger weights to them while suppressing non-robust features. The regularizer encourages the model to extract similar features for the natural and adversarial images, effectively ignoring the added perturbation. In addition to evaluating the robustness of our model, we provide justification for the attention module and propose a novel experimental strategy that quantitatively demonstrates that our model is almost ideally aligned with salient data characteristics. Additional experimental results illustrate the power of our model relative to the state of the art methods.Comment: IJCAI 201

    Attribution-driven Causal Analysis for Detection of Adversarial Examples

    Full text link
    Attribution methods have been developed to explain the decision of a machine learning model on a given input. We use the Integrated Gradient method for finding attributions to define the causal neighborhood of an input by incrementally masking high attribution features. We study the robustness of machine learning models on benign and adversarial inputs in this neighborhood. Our study indicates that benign inputs are robust to the masking of high attribution features but adversarial inputs generated by the state-of-the-art adversarial attack methods such as DeepFool, FGSM, CW and PGD, are not robust to such masking. Further, our study demonstrates that this concentration of high-attribution features responsible for the incorrect decision is more pronounced in physically realizable adversarial examples. This difference in attribution of benign and adversarial inputs can be used to detect adversarial examples. Such a defense approach is independent of training data and attack method, and we demonstrate its effectiveness on digital and physically realizable perturbations.Comment: 11 pages, 6 figure

    Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models

    Full text link
    Neural networks are vulnerable to adversarial attacks -- small visually imperceptible crafted noise which when added to the input drastically changes the output. The most effective method of defending against these adversarial attacks is to use the methodology of adversarial training. We analyze the adversarially trained robust models to study their vulnerability against adversarial attacks at the level of the latent layers. Our analysis reveals that contrary to the input layer which is robust to adversarial attack, the latent layer of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this information, we introduce a new technique Latent Adversarial Training (LAT) which comprises of fine-tuning the adversarially trained models to ensure the robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for construction of adversarial examples. LAT results in minor improvement in test accuracy and leads to a state-of-the-art adversarial accuracy against the universal first-order adversarial PGD attack which is shown for the MNIST, CIFAR-10, CIFAR-100 datasets.Comment: Accepted at IJCAI 201

    Interpreting Adversarial Robustness: A View from Decision Surface in Input Space

    Full text link
    One popular hypothesis of neural network generalization is that the flat local minima of loss surface in parameter space leads to good generalization. However, we demonstrate that loss surface in parameter space has no obvious relationship with generalization, especially under adversarial settings. Through visualizing decision surfaces in both parameter space and input space, we instead show that the geometry property of decision surface in input space correlates well with the adversarial robustness. We then propose an adversarial robustness indicator, which can evaluate a neural network's intrinsic robustness property without testing its accuracy under adversarial attacks. Guided by it, we further propose our robust training method. Without involving adversarial training, our method could enhance network's intrinsic adversarial robustness against various adversarial attacks.Comment: 15 pages, submitted to ICLR 201

    Improved Adversarial Robustness via Logit Regularization Methods

    Full text link
    While great progress has been made at making neural networks effective across a wide range of visual tasks, most models are surprisingly vulnerable. This frailness takes the form of small, carefully chosen perturbations of their input, known as adversarial examples, which represent a security threat for learned vision models in the wild -- a threat which should be responsibly defended against in safety-critical applications of computer vision. In this paper, we advocate for and experimentally investigate the use of a family of logit regularization techniques as an adversarial defense, which can be used in conjunction with other methods for creating adversarial robustness at little to no marginal cost. We also demonstrate that much of the effectiveness of one recent adversarial defense mechanism can in fact be attributed to logit regularization, and show how to improve its defense against both white-box and black-box attacks, in the process creating a stronger black-box attack against PGD-based models. We validate our methods on three datasets and include results on both gradient-free attacks and strong gradient-based iterative attacks with as many as 1,000 steps

    Using Pre-Training Can Improve Model Robustness and Uncertainty

    Full text link
    He et al. (2018) have called into question the utility of pre-training by showing that training from scratch can often yield similar performance to pre-training. We show that although pre-training may not improve performance on traditional classification metrics, it improves model robustness and uncertainty estimates. Through extensive experiments on adversarial examples, label corruption, class imbalance, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. We introduce adversarial pre-training and show approximately a 10% absolute improvement over the previous state-of-the-art in adversarial robustness. In some cases, using pre-training without task-specific methods also surpasses the state-of-the-art, highlighting the need for pre-training when evaluating future methods on robustness and uncertainty tasks.Comment: ICML 2019. PyTorch code here: https://github.com/hendrycks/pre-training Figure 3 update
    • …
    corecore