Attention, Please! Adversarial Defense via Attention Rectification and Preservation
This study provides a new understanding of the adversarial attack problem by
examining the correlation between adversarial attack and visual attention
change. In particular, we observed that: (1) images with incomplete attention
regions are more vulnerable to adversarial attacks; and (2) successful
adversarial attacks lead to deviated and scattered attention maps. Accordingly,
an attention-based adversarial defense framework is designed to simultaneously
rectify the attention map for prediction and preserve the attention area
between adversarial and original images. The problem of adding iteratively
attacked samples is also discussed in the context of visual attention change.
We hope the attention-related data analysis and defense solution in this study
will shed some light on the mechanisms behind adversarial attacks and also
facilitate future adversarial defense/attack model design.
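To make the framework concrete, here is a minimal sketch of such an
attention-preserving training loss, assuming a Grad-CAM-style attention
extractor and a model split into backbone and head; these names, the loss
weight, and the extractor itself are illustrative assumptions, not the
authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def gradcam_attention(model, x, target):
        # Grad-CAM-style attention over the last convolutional feature map.
        # Assumes the model exposes `backbone` (conv features) and `head`.
        feats = model.backbone(x)                       # (B, C, H, W)
        logits = model.head(feats.mean(dim=(2, 3)))     # GAP + linear head
        score = logits.gather(1, target.unsqueeze(1)).sum()
        grads, = torch.autograd.grad(score, feats, create_graph=True)
        weights = grads.mean(dim=(2, 3), keepdim=True)  # channel importance
        attn = F.relu((weights * feats).sum(dim=1))     # (B, H, W)
        peak = attn.flatten(1).max(dim=1).values.view(-1, 1, 1)
        return attn / (peak + 1e-8)                     # normalize to [0, 1]

    def attention_defense_loss(model, x, x_adv, y, lambda_pres=1.0):
        # Cross-entropy on the adversarial input plus a preservation term
        # that ties the adversarial attention map to the clean one.
        attn_clean = gradcam_attention(model, x, y)
        attn_adv = gradcam_attention(model, x_adv, y)
        ce = F.cross_entropy(model(x_adv), y)
        preservation = F.mse_loss(attn_adv, attn_clean.detach())
        return ce + lambda_pres * preservation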
Stylized Adversarial Defense
Deep Convolutional Neural Networks (CNNs) can easily be fooled by subtle,
imperceptible changes to the input images. To address this vulnerability,
adversarial training creates perturbation patterns and includes them in the
training set to robustify the model. In contrast to existing adversarial
training methods that only use class-boundary information (e.g., a
cross-entropy loss), we propose to exploit additional information from the feature
space to craft stronger adversaries that are in turn used to learn a robust
model. Specifically, we use the style and content information of the target
sample from another class, alongside its class boundary information to create
adversarial perturbations. We apply our proposed multi-task objective in a
deeply supervised manner, extracting multi-scale feature knowledge to create
maximally separating adversaries. Subsequently, we propose a max-margin
adversarial training approach that minimizes the distance between the source
image and its adversary and maximizes the distance between the adversary and the
target image. Our adversarial training approach demonstrates strong robustness
compared to state-of-the-art defenses, generalizes well to naturally occurring
corruptions and data distributional shifts, and retains the model accuracy on
clean examples.
Comment: Code is available at https://github.com/Muzammal-Naseer/SA
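A rough sketch of what crafting such a stylized adversary could look like,
assuming a model.features hook for intermediate activations; the Gram-matrix
style summary, weights, and step sizes are illustrative choices, not the
paper's exact recipe.

    import torch
    import torch.nn.functional as F

    def gram(feat):
        # Gram matrix of a (B, C, H, W) feature map: a standard style summary.
        b, c, h, w = feat.shape
        f = feat.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def stylized_attack(model, x, x_target, y_target, eps=8/255, steps=10,
                        alpha=2/255, w_style=1.0, w_content=1.0):
        # PGD-style loop whose objective mixes class-boundary information
        # (cross-entropy toward the target class) with content and style
        # matching against a target sample from another class.
        with torch.no_grad():
            f_tgt = model.features(x_target)    # assumed feature extractor
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            f_adv = model.features(x_adv)
            loss = (F.cross_entropy(model(x_adv), y_target)
                    + w_content * F.mse_loss(f_adv, f_tgt)
                    + w_style * F.mse_loss(gram(f_adv), gram(f_tgt)))
            grad, = torch.autograd.grad(loss, x_adv)
            x_adv = x_adv.detach() - alpha * grad.sign()  # move toward target
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
        return x_adv

The max-margin training step would then minimize the source-adversary distance
while maximizing the adversary-target distance in the same feature space.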
Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization
Adversarial examples cause neural networks to produce incorrect outputs with
high confidence. Although adversarial training is one of the most effective
forms of defense against adversarial examples, unfortunately, a large gap
exists between test accuracy and training accuracy in adversarial training. In
this paper, we identify Adversarial Feature Overfitting (AFO), which may cause
poor adversarially robust generalization, and we show that adversarial training
can overshoot the optimal point in terms of robust generalization, leading to
AFO in our simple Gaussian model. Considering these theoretical results, we
present soft labeling as a solution to the AFO problem. Furthermore, we propose
Adversarial Vertex mixup (AVmixup), a soft-labeled data augmentation approach
for improving adversarially robust generalization. We complement our
theoretical analysis with experiments on CIFAR10, CIFAR100, SVHN, and Tiny
ImageNet, and show that AVmixup significantly improves robust
generalization performance and reduces the trade-off between standard
accuracy and adversarial robustness.
Comment: To appear in CVPR 2020 (Oral).
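Going by the abstract, the core of AVmixup can be sketched as follows: scale
the adversarial perturbation by a factor gamma to define an "adversarial
vertex", then linearly interpolate both the inputs and label-smoothed soft
labels between the clean sample and that vertex. The default gamma and
smoothing factors below are placeholders, not the paper's tuned values.

    import torch
    import torch.nn.functional as F

    def label_smooth(y_onehot, factor):
        # Keep `factor` probability mass on the true class, spread the rest.
        n_cls = y_onehot.size(1)
        return y_onehot * factor + (1 - y_onehot) * (1 - factor) / (n_cls - 1)

    def avmixup(x, y_onehot, delta, gamma=2.0, smooth_clean=1.0, smooth_adv=0.1):
        # delta: adversarial perturbation for x (e.g., produced by PGD).
        x_av = x + gamma * delta                   # adversarial vertex
        alpha = torch.rand(x.size(0), 1, 1, 1, device=x.device)
        x_mix = alpha * x + (1 - alpha) * x_av     # interpolate inputs
        y_clean = label_smooth(y_onehot, smooth_clean)
        y_adv = label_smooth(y_onehot, smooth_adv)
        a = alpha.view(-1, 1)
        y_mix = a * y_clean + (1 - a) * y_adv      # interpolate soft labels
        return x_mix, y_mix

    def soft_ce(logits, y_soft):
        # Cross-entropy against the interpolated soft labels.
        return -(y_soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()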
Improving Model Robustness with Latent Distribution Locally and Globally
In this work, we consider model robustness of deep neural networks against
adversarial attacks from a global manifold perspective. Leveraging both the
local and global latent information, we propose a novel adversarial training
method through robust optimization, and a tractable way to generate Latent
Manifold Adversarial Examples (LMAEs) via an adversarial game between a
discriminator and a classifier. The proposed adversarial training with latent
distribution (ATLD) method defends against adversarial attacks by crafting
LMAEs with the latent manifold in an unsupervised manner. ATLD preserves the
local and global information of the latent manifold and promises improved
robustness against adversarial attacks. To verify the effectiveness of our
proposed method, we conduct extensive experiments over different datasets
(e.g., CIFAR-10, CIFAR-100, SVHN) with different adversarial attacks (e.g.,
PGD, CW), and show that our method outperforms the
state-of-the-art (e.g., Feature Scattering) in adversarial robustness by a
large accuracy margin. The source code is available at
https://github.com/LitterQ/ATLD-pytorch
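A speculative sketch of the LMAE generation game, under assumptions: a binary
discriminator scores whether a latent code looks drawn from the clean latent
distribution, and the label-free attack perturbs the input to drive that
score down. The encoder/discriminator interface and the objective below are
illustrative; the paper's exact formulation may differ.

    import torch

    def craft_lmae(encoder, discriminator, x, eps=8/255, steps=5, alpha=2/255):
        # Unsupervised perturbation: no labels are used; the input is moved
        # so its latent code is maximally flagged as off-manifold.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            z = encoder(x_adv)
            # Probability (per the discriminator) that z comes from the
            # clean latent distribution; the attack drives it down.
            p_clean = torch.sigmoid(discriminator(z))
            loss = -torch.log(p_clean + 1e-8).mean()
            grad, = torch.autograd.grad(loss, x_adv)
            x_adv = x_adv.detach() + alpha * grad.sign()  # ascend the loss
            x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
        return x_adv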
MIXPGD: Hybrid Adversarial Training for Speech Recognition Systems
Automatic speech recognition (ASR) systems based on deep neural networks are
vulnerable to adversarial perturbations. We propose the mixPGD adversarial
training method to improve the robustness of models for ASR systems. In standard
adversarial training, adversarial samples are generated by leveraging
supervised or unsupervised methods. We merge the capabilities of both
supervised and unsupervised approaches in our method to generate new
adversarial samples that aid in improving model robustness. Extensive
experiments and comparisons across various state-of-the-art defense methods and
adversarial attacks show that mixPGD achieves a 4.1% better word error rate
(WER) than the previous best-performing models in the white-box
adversarial attack setting. We tested our proposed defense method in both
white-box and transfer-based black-box attack settings to ensure that our
defense strategy is robust against various types of attacks. Empirical results
on several adversarial attacks validate the effectiveness of our proposed
approach.
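One plausible reading of the supervised/unsupervised mix, purely as a sketch
under assumptions (the model interface, the choice of losses, and the mixing
weight below are not from the paper): a single PGD step whose objective blends
a supervised CTC loss against the reference transcript with a label-free
feature-drift term.

    import torch
    import torch.nn.functional as F

    def mix_pgd_step(asr_model, audio, audio_adv, transcript, transcript_lens,
                     alpha=1e-3, mix=0.5):
        # asr_model is assumed to return (log_probs, output_lengths) with
        # log_probs shaped (T, B, V), and to expose an `encoder` for features.
        audio_adv = audio_adv.clone().detach().requires_grad_(True)
        log_probs, out_lens = asr_model(audio_adv)
        # Supervised signal: CTC loss against the reference transcript.
        sup = F.ctc_loss(log_probs, transcript, out_lens, transcript_lens)
        # Unsupervised signal: drift of encoder features from clean audio.
        with torch.no_grad():
            feat_clean = asr_model.encoder(audio)
        unsup = F.mse_loss(asr_model.encoder(audio_adv), feat_clean)
        loss = mix * sup + (1 - mix) * unsup
        grad, = torch.autograd.grad(loss, audio_adv)
        return (audio_adv + alpha * grad.sign()).detach()  # ascend both terms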