Defense against Universal Adversarial Perturbations
Recent advances in Deep Learning show the existence of image-agnostic
quasi-imperceptible perturbations that when applied to `any' image can fool a
state-of-the-art network classifier to change its prediction about the image
label. These `Universal Adversarial Perturbations' pose a serious threat to the
success of Deep Learning in practice. We present the first dedicated framework
to effectively defend the networks against such perturbations. Our approach
learns a Perturbation Rectifying Network (PRN) as `pre-input' layers to a
targeted model, such that the targeted model needs no modification. The PRN is
learned from real and synthetic image-agnostic perturbations, where an
efficient method to compute the latter is also proposed. A perturbation
detector is separately trained on the Discrete Cosine Transform of the
input-output difference of the PRN. A query image is first passed through the
PRN and verified by the detector. If a perturbation is detected, the output of
the PRN is used for label prediction instead of the actual image. A rigorous
evaluation shows that our framework can defend the network classifiers against
unseen adversarial perturbations in real-world scenarios with up to a 97.5%
success rate. The PRN also generalizes well in the sense that training for one
targeted network defends another network with a comparable success rate.
Comment: Accepted in IEEE CVPR 201
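The inference pipeline the abstract describes (rectify, inspect the DCT of the input-output difference, then choose which image to classify) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `prn` and `detector` are hypothetical callables standing in for the trained Perturbation Rectifying Network and the separately trained detector.

```python
import numpy as np

def dct2(x):
    """2-D DCT-II of a square array via explicit cosine matrices (numpy only)."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    c = np.cos(np.pi * (np.arange(n)[None, :] + 0.5) * k / n)
    return c @ x @ c.T

def prn_defense(image, prn, detector):
    """Sketch of the defence pipeline from the abstract.

    prn      -- callable that rectifies the input (the trained PRN)
    detector -- callable on the DCT of the input-output difference,
                returning True when a perturbation is flagged
    """
    rectified = prn(image)
    residual_spectrum = dct2(image - rectified)
    if detector(residual_spectrum):
        return rectified  # perturbation detected: classify the rectified image
    return image          # clean input: classify the actual image
```

Note that the targeted classifier itself is untouched; the defence only decides which image (original or rectified) is forwarded to it.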
Disentangling Adversarial Robustness and Generalization
Obtaining deep networks that are robust against adversarial examples and
generalize well is an open problem. A recent hypothesis even states that both
robust and accurate models are impossible, i.e., adversarial robustness and
generalization are conflicting goals. In an effort to clarify the relationship
between robustness and generalization, we assume an underlying, low-dimensional
data manifold and show that: 1. regular adversarial examples leave the
manifold; 2. adversarial examples constrained to the manifold, i.e.,
on-manifold adversarial examples, exist; 3. on-manifold adversarial examples
are generalization errors, and on-manifold adversarial training boosts
generalization; 4. regular robustness and generalization are not necessarily
contradicting goals. These findings imply that both robust and accurate
models are possible. However, different models (architectures, training
strategies etc.) can exhibit different robustness and generalization
characteristics. To confirm our claims, we present extensive experiments on
synthetic data (with known manifold) as well as on EMNIST, Fashion-MNIST and
CelebA.
Comment: Conference on Computer Vision and Pattern Recognition 201
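The idea of an on-manifold adversarial example is that the perturbation is applied in the latent space of a generative model rather than in pixel space, so every candidate remains on the learned data manifold. The sketch below illustrates this with simple random search; `decoder` and `classify` are hypothetical stand-ins for the paper's generative model and classifier.

```python
import numpy as np

def on_manifold_attack(decoder, classify, z, label, eps=0.5, steps=200, seed=0):
    """Random-search sketch of an on-manifold adversarial example.

    Instead of perturbing the image directly, we perturb the latent code z
    and decode, so each candidate decoder(z + dz) lies on the data manifold.
    """
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        dz = rng.normal(scale=eps, size=z.shape)
        x_adv = decoder(z + dz)
        if classify(x_adv) != label:
            return x_adv  # label changed: on-manifold adversarial example
    return None
```

Because such an example is a valid sample from the data distribution that is nonetheless misclassified, it is precisely a generalization error, which is the link the abstract draws between on-manifold robustness and generalization.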
Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks
Deep Convolutional Networks (DCNs) have been shown to be vulnerable to
adversarial examples---perturbed inputs specifically designed to produce
intentional errors in the learning algorithms at test time. Existing
input-agnostic adversarial perturbations exhibit interesting visual patterns
that are currently unexplained. In this paper, we introduce a structured
approach for generating Universal Adversarial Perturbations (UAPs) with
procedural noise functions. Our approach unveils the systemic vulnerability of
popular DCN models like Inception v3 and YOLO v3, with single noise patterns
able to fool a model on up to 90% of the dataset. Procedural noise allows us to
generate a distribution of UAPs with high universal evasion rates using only a
few parameters. Additionally, we propose Bayesian optimization to efficiently
learn procedural noise parameters to construct inexpensive untargeted black-box
attacks. We demonstrate that it can achieve an average of less than 10 queries
per successful attack, a 100-fold improvement on existing methods. We further
motivate the use of input-agnostic defences to increase the stability of models
to adversarial perturbations. The universality of our attacks suggests that DCN
models may be sensitive to aggregations of low-level class-agnostic features.
These findings give insight on the nature of some universal adversarial
perturbations and how they could be generated in other applications.
Comment: 16 pages, 10 figures. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS '19
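The core mechanism (a procedural noise function whose pattern is fully described by a few parameters, plus a black-box search over those parameters for a high universal evasion rate) can be sketched as below. An oriented sinusoidal grating stands in for the paper's Perlin/Gabor noise, and plain random search stands in for its Bayesian optimisation; `model` is a hypothetical callable mapping an image batch to predicted labels.

```python
import numpy as np

def sine_noise(freq, angle, phase, size=32):
    """Toy procedural noise: an oriented sinusoidal grating.

    A simplified stand-in for Perlin/Gabor noise; the whole pattern is
    determined by just three parameters.
    """
    y, x = np.mgrid[0:size, 0:size]
    u = x * np.cos(angle) + y * np.sin(angle)
    return np.sin(2 * np.pi * freq * u / size + phase)

def search_uap(model, images, labels, eps=0.1, trials=100, seed=0):
    """Black-box search for a universal perturbation over noise parameters.

    Random search stands in for the paper's Bayesian optimisation; only the
    model's predicted labels are queried, never its gradients.
    """
    rng = np.random.default_rng(seed)
    best, best_rate = None, -1.0
    for _ in range(trials):
        params = (rng.uniform(1, 8),          # frequency
                  rng.uniform(0, np.pi),      # orientation
                  rng.uniform(0, 2 * np.pi))  # phase
        uap = eps * sine_noise(*params, size=images.shape[1])
        rate = np.mean(model(np.clip(images + uap, 0, 1)) != labels)
        if rate > best_rate:
            best, best_rate = uap, rate       # keep highest universal evasion rate
    return best, best_rate
```

Because the search space is only a handful of parameters instead of a full image, each attack needs very few model queries, which is the source of the query efficiency the abstract reports.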