2,114 research outputs found
Defense against Universal Adversarial Perturbations
Recent advances in Deep Learning show the existence of image-agnostic
quasi-imperceptible perturbations that, when applied to `any' image, can fool a
state-of-the-art network classifier into changing its prediction of the image
label. These `Universal Adversarial Perturbations' pose a serious threat to the
success of Deep Learning in practice. We present the first dedicated framework
to effectively defend the networks against such perturbations. Our approach
learns a Perturbation Rectifying Network (PRN) as `pre-input' layers to a
targeted model, such that the targeted model needs no modification. The PRN is
learned from real and synthetic image-agnostic perturbations, where an
efficient method to compute the latter is also proposed. A perturbation
detector is separately trained on the Discrete Cosine Transform of the
input-output difference of the PRN. A query image is first passed through the
PRN and verified by the detector. If a perturbation is detected, the output of
the PRN is used for label prediction instead of the actual image. A rigorous
evaluation shows that our framework can defend network classifiers against
unseen adversarial perturbations in real-world scenarios with up to a 97.5%
success rate. The PRN also generalizes well in the sense that training for one
targeted network defends another network with a comparable success rate.
Comment: Accepted in IEEE CVPR 2018
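As a rough illustration of the defense pipeline described above, the sketch below (PyTorch-style, with scipy for the DCT) routes a query image through a hypothetical rectifier `prn`, feeds the DCT of the input-output difference to a `detector`, and classifies the rectified image only when a perturbation is flagged. The module names and interfaces are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch of a PRN-style defense pipeline (illustrative names, not the
# authors' code). Assumes a pre-trained `classifier`, a rectifier network
# `prn`, and a binary `detector` trained on DCT features of the
# input-output difference, as described in the abstract.
import torch
from scipy.fft import dctn

def defend_and_predict(image, prn, detector, classifier):
    """image: (1, C, H, W) tensor in [0, 1]; returns predicted class index."""
    with torch.no_grad():
        rectified = prn(image)                       # 'pre-input' rectification
        diff = (image - rectified).squeeze(0).cpu().numpy()
        dct_feat = dctn(diff, norm="ortho")          # DCT of input-output difference
        feat = torch.from_numpy(dct_feat).flatten().unsqueeze(0).float()
        perturbed = detector(feat).argmax(dim=1).item() == 1
        # If a perturbation is detected, classify the rectified image instead.
        logits = classifier(rectified if perturbed else image)
    return logits.argmax(dim=1).item()
```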
Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion Models
Deep neural networks (DNNs) have achieved tremendous success in many remote
sensing (RS) applications, yet they remain vulnerable to adversarial
perturbations. Unfortunately, current adversarial defense approaches in RS
studies usually suffer from performance fluctuation and unnecessary re-training
costs due to the need for prior knowledge of the adversarial perturbations
among RS data. To circumvent these challenges, we propose a universal
adversarial defense approach in RS imagery (UAD-RS) using pre-trained diffusion
models to defend common DNNs against multiple unknown adversarial attacks.
Specifically, the generative diffusion models are first pre-trained on
different RS datasets to learn generalized representations in various data
domains. After that, a universal adversarial purification framework is
developed using the forward and reverse process of the pre-trained diffusion
models to purify the perturbations from adversarial samples. Furthermore, an
adaptive noise level selection (ANLS) mechanism is built to choose the noise
level of the diffusion model that yields purification results closest to the
clean samples, as measured by the Fréchet Inception Distance (FID) in deep
feature space. As a result, only a single pre-trained
diffusion model is needed for the universal purification of adversarial samples
on each dataset, which significantly alleviates the re-training efforts and
maintains high performance without prior knowledge of the adversarial
perturbations. Experiments on four heterogeneous RS datasets covering scene
classification and semantic segmentation verify that UAD-RS outperforms
state-of-the-art adversarial purification approaches, providing a universal
defense against seven common adversarial perturbations. Codes and the
pre-trained models are available online (https://github.com/EricYu97/UAD-RS).
Comment: Added the GitHub link to the abstract
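The adaptive noise level selection can be pictured as a small search loop: diffuse the adversarial batch to several candidate timesteps, denoise each, and keep the result whose deep features have the lowest FID to clean references. The sketch below assumes hypothetical `diffusion.q_sample` / `diffusion.denoise_from` interfaces and a user-supplied `fid_fn`; it is not the released UAD-RS API.

```python
def purify_with_anls(adv_batch, diffusion, feature_extractor, clean_ref_feats,
                     fid_fn, candidate_levels=(50, 100, 150, 200)):
    """Return the purified batch and the noise level with the lowest FID.

    `diffusion`, `feature_extractor`, and `fid_fn` are assumed, user-supplied
    components; the interface is illustrative only.
    """
    best_fid, best_out, best_level = float("inf"), None, None
    for t in candidate_levels:
        # Forward process: add noise up to timestep t to wash out the perturbation.
        noised = diffusion.q_sample(adv_batch, t)
        # Reverse process: denoise back toward the clean data manifold.
        purified = diffusion.denoise_from(noised, t)
        # Score this noise level by FID between purified and clean reference features.
        fid = fid_fn(feature_extractor(purified), clean_ref_feats)
        if fid < best_fid:
            best_fid, best_out, best_level = fid, purified, t
    return best_out, best_level
```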
Efficient Two-Step Adversarial Defense for Deep Neural Networks
In recent years, deep neural networks have demonstrated outstanding
performance in many machine learning tasks. However, researchers have
discovered that these state-of-the-art models are vulnerable to adversarial
examples: legitimate examples altered by small perturbations that are
unnoticeable to human eyes. Adversarial training, which augments the training
data with adversarial examples during the training process, is a well known
defense to improve the robustness of the model against adversarial attacks.
However, this robustness is only effective to the same attack method used for
adversarial training. Madry et al. (2017) suggest that iterative multi-step
adversarial attacks, and in particular projected gradient descent (PGD), may be
considered the universal first-order adversary, and that adversarial training
with PGD confers resistance against many other first-order attacks. However,
the computational cost of adversarial
training with PGD and other multi-step adversarial examples is much higher than
that of the adversarial training with other simpler attack techniques. In this
paper, we show how strong adversarial examples can be generated only at a cost
similar to that of two runs of the fast gradient sign method (FGSM), allowing
defense against adversarial attacks with a robustness level comparable to that
of the adversarial training with multi-step adversarial examples. We
empirically demonstrate the effectiveness of the proposed two-step defense
approach against different attack methods and its improvements over existing
defense strategies.
Comment: 12 pages
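For a concrete picture of an attack whose cost is roughly two FGSM evaluations, here is a minimal two-step PyTorch sketch: an FGSM-style step followed by a second gradient step from the perturbed point, projected back into the epsilon-ball. The authors' exact two-step formulation differs in detail; this only illustrates the cost/strength trade-off.

```python
# Minimal two-step attack sketch (two gradient evaluations, like two FGSM runs).
# Not the paper's exact method; an illustrative approximation.
import torch
import torch.nn.functional as F

def two_step_attack(model, x, y, eps=8/255, step=4/255):
    x_adv = x.clone().detach()
    for _ in range(2):                                   # two gradient evaluations
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()      # FGSM-style step
        x_adv = x + (x_adv - x).clamp(-eps, eps)         # project to the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0).detach()           # keep a valid image
    return x_adv
```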
Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses
This paper focuses on learning transferable adversarial examples specifically
against defense models (models designed to defend against adversarial attacks). In particular,
we show that a simple universal perturbation can fool a series of
state-of-the-art defenses.
Adversarial examples generated by existing attacks are generally hard to
transfer to defense models. We observe the property of regional homogeneity in
adversarial perturbations and suggest that the defenses are less robust to
regionally homogeneous perturbations. Therefore, we propose an effective
transforming paradigm and a customized gradient transformer module to transform
existing perturbations into regionally homogeneous ones. Without explicitly
forcing the perturbations to be universal, we observe that a well-trained
gradient transformer module tends to output input-independent gradients (hence
universal) benefiting from the under-fitting phenomenon. Thorough experiments
demonstrate that our work significantly outperforms prior attack
algorithms (whether image-dependent or universal) by an average improvement
of 14.0% when attacking 9 defenses in the black-box setting. In addition to the
cross-model transferability, we also verify that regionally homogeneous
perturbations can well transfer across different vision tasks (attacking with
the semantic segmentation task and testing on the object detection task).
Comment: The code is available here:
https://github.com/LiYingwei/Regional-Homogeneity
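The notion of a regionally homogeneous perturbation can be illustrated without the learned gradient transformer: average the input gradient over fixed spatial blocks and take a sign step, so the perturbation is constant within each region. The sketch below is a hand-crafted stand-in for the trained module described above, not the paper's method.

```python
# Illustrative only: enforce regional homogeneity by block-averaging the
# gradient before a sign step. The actual work trains a gradient transformer
# network; this is a hand-crafted approximation of the output's structure.
import torch
import torch.nn.functional as F

def regionally_homogeneous_step(model, x, y, eps=16/255, block=32):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # Average the gradient inside each block x block region...
    pooled = F.avg_pool2d(grad, kernel_size=block, stride=block)
    # ...and broadcast the average back to full resolution.
    homogeneous = F.interpolate(pooled, size=grad.shape[-2:], mode="nearest")
    return (x + eps * homogeneous.sign()).clamp(0.0, 1.0).detach()
```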
Universal adversarial perturbations for multiple classification tasks with quantum classifiers
Quantum adversarial machine learning is an emerging field that studies the
vulnerability of quantum learning systems against adversarial perturbations and
develops possible defense strategies. Quantum universal adversarial
perturbations are small perturbations, which can make different input samples
into adversarial examples that may deceive a given quantum classifier. This
area has rarely been explored but is worth investigating, because universal
perturbations could greatly simplify malicious attacks and cause unexpected
damage to quantum machine learning models. In this
paper, we take a step forward and explore the quantum universal perturbations
in the context of heterogeneous classification tasks. In particular, we find
that quantum classifiers that achieve almost state-of-the-art accuracy on two
different classification tasks can both be conclusively deceived by one
carefully crafted universal perturbation. This result is explicitly
demonstrated with well-designed quantum continual learning models that use the
elastic weight consolidation method to avoid catastrophic forgetting, as well as
real-life heterogeneous datasets from hand-written digits and medical MRI
images. Our results provide a simple and efficient way to generate universal
perturbations for heterogeneous classification tasks and thus offer valuable
guidance for future quantum learning technologies.
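The cross-task universal attack can be sketched classically (the paper itself operates on quantum circuits): a single perturbation is optimized to raise the loss of two classifiers trained on different tasks. In the sketch below, `model_a`, `model_b`, and their data loaders are assumed inputs with matching image shape; this is a classical analogue of the idea, not the quantum procedure.

```python
# Classical analogue (not a quantum circuit): one shared perturbation is
# optimized to increase the loss of two classifiers from two different tasks.
import torch
import torch.nn.functional as F

def universal_perturbation(model_a, loader_a, model_b, loader_b,
                           shape, eps=0.05, lr=0.01, epochs=5):
    delta = torch.zeros(shape, requires_grad=True)       # shared perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        for (xa, ya), (xb, yb) in zip(loader_a, loader_b):
            # Maximize the loss on both tasks => minimize the negative sum.
            loss = -(F.cross_entropy(model_a(xa + delta), ya) +
                     F.cross_entropy(model_b(xb + delta), yb))
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)                   # keep the perturbation small
    return delta.detach()
```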
Generative Adversarial Perturbations
In this paper, we propose novel generative models for creating adversarial
examples, slightly perturbed images resembling natural images but maliciously
crafted to fool pre-trained models. We present trainable deep neural networks
for transforming images to adversarial perturbations. Our proposed models can
produce image-agnostic and image-dependent perturbations for both targeted and
non-targeted attacks. We also demonstrate that similar architectures can
achieve impressive results in fooling classification and semantic segmentation
models, obviating the need for hand-crafting attack methods for each task.
Using extensive experiments on challenging high-resolution datasets such as
ImageNet and Cityscapes, we show that our perturbations achieve high fooling
rates with small perturbation norms. Moreover, our attacks are considerably
faster than current iterative methods at inference time.
Comment: CVPR 2018, camera-ready version
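The overall training setup for such a perturbation generator against a frozen, pre-trained classifier can be summarized as below (image-dependent, non-targeted variant, PyTorch-style). The generator architecture and loss terms in the paper differ; `generator`, `classifier`, and `loader` are assumed to be provided.

```python
# Minimal sketch of training an image-to-perturbation generator against a
# frozen classifier. Illustrative only; not the paper's exact losses or models.
import torch
import torch.nn.functional as F

def train_perturbation_generator(generator, classifier, loader,
                                 eps=10/255, lr=2e-4, epochs=1):
    classifier.eval()                                     # target model stays frozen
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            # Generator maps images to bounded perturbations.
            delta = eps * torch.tanh(generator(x))
            logits = classifier((x + delta).clamp(0.0, 1.0))
            loss = -F.cross_entropy(logits, y)            # push predictions off the true label
            opt.zero_grad()
            loss.backward()
            opt.step()
    return generator
```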