Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser
Neural networks are vulnerable to adversarial examples, which poses a threat
to their application in security-sensitive systems. We propose the high-level
representation guided denoiser (HGD) as a defense for image classification.
A standard denoiser suffers from the error amplification effect, in which small
residual adversarial noise is progressively amplified and leads to a wrong
classification. HGD overcomes this problem by using a loss function defined as
the difference between the target model's outputs activated by the clean image
and by the denoised image. Compared with ensemble adversarial training, which is
the state-of-the-art defense method on large images, HGD has three advantages.
First, with HGD as a defense, the target model is more robust to either
white-box or black-box adversarial attacks. Second, HGD can be trained on a
small subset of the images and generalizes well to other images and unseen
classes. Third, HGD can be transferred to defend models other than the one
guiding it. In the NIPS competition on defense against adversarial attacks, our HGD
solution won first place and outperformed other models by a large margin.
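A minimal PyTorch-style sketch of the guiding idea: instead of a pixel-level reconstruction loss, the denoiser is trained on the distance between the target model's high-level outputs (e.g., its logits) for the clean and the denoised image. The names denoiser, target_model, x_adv, and x_clean are placeholders rather than identifiers from the paper.

```python
import torch
import torch.nn.functional as F

def hgd_loss(denoiser, target_model, x_adv, x_clean):
    """High-level representation guided loss (sketch).

    The denoiser predicts the adversarial noise, which is subtracted from
    the input; the loss compares the target model's high-level outputs on
    the clean and the denoised image rather than the pixels, so residual
    noise that would be amplified into a wrong prediction is penalized
    directly. The target model is assumed frozen (its parameters are not
    updated).
    """
    noise_hat = denoiser(x_adv)               # predicted adversarial noise
    x_denoised = x_adv - noise_hat            # reconstructed (denoised) image
    with torch.no_grad():
        feat_clean = target_model(x_clean)    # guidance signal only
    feat_denoised = target_model(x_denoised)  # gradients flow back to the denoiser
    return F.l1_loss(feat_denoised, feat_clean)
```

Because only the denoiser is trained and the target model merely supplies the guidance signal, a trained HGD can plausibly be transferred to defend models other than the one guiding it, as the abstract states.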
Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks
Deep Convolutional Networks (DCNs) have been shown to be vulnerable to
adversarial examples---perturbed inputs specifically designed to produce
intentional errors in the learning algorithms at test time. Existing
input-agnostic adversarial perturbations exhibit interesting visual patterns
that are currently unexplained. In this paper, we introduce a structured
approach for generating Universal Adversarial Perturbations (UAPs) with
procedural noise functions. Our approach unveils the systemic vulnerability of
popular DCN models like Inception v3 and YOLO v3, with single noise patterns
able to fool a model on up to 90% of the dataset. Procedural noise allows us to
generate a distribution of UAPs with high universal evasion rates using only a
few parameters. Additionally, we propose Bayesian optimization to efficiently
learn procedural noise parameters to construct inexpensive untargeted black-box
attacks. We demonstrate that it can achieve an average of less than 10 queries
per successful attack, a 100-fold improvement on existing methods. We further
motivate the use of input-agnostic defences to increase the stability of models
to adversarial perturbations. The universality of our attacks suggests that DCN
models may be sensitive to aggregations of low-level class-agnostic features.
These findings give insight on the nature of some universal adversarial
perturbations and how they could be generated in other applications.
Comment: 16 pages, 10 figures. In Proceedings of the 2019 ACM SIGSAC
Conference on Computer and Communications Security (CCS '19).
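As a rough illustration of how few parameters such a perturbation needs, the NumPy sketch below uses a simple sine-grating pattern as a stand-in for the Perlin and Gabor noise functions used in the paper, and measures the universal evasion rate of a single pattern. The names procedural_perturbation, evasion_rate, and predict_fn are hypothetical, and freq, angle, phase, and eps are the handful of scalars one would tune.

```python
import numpy as np

def procedural_perturbation(size, freq, angle, phase, eps):
    """Low-parameter procedural pattern (sketch).

    A sine grating stands in for the Perlin/Gabor noise of the paper: the
    entire input-agnostic perturbation is determined by a few scalars
    (spatial frequency, orientation, phase) and scaled to an l_inf budget.
    """
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float32) / size
    u = np.cos(angle) * xs + np.sin(angle) * ys    # rotated coordinate
    pattern = np.sin(2.0 * np.pi * freq * u + phase)
    return eps * pattern[..., None]                # broadcast over channels

def evasion_rate(predict_fn, images, delta):
    """Universal evasion rate: fraction of inputs whose predicted label
    changes when the single perturbation `delta` is added to all of them."""
    preds_clean = predict_fn(images)
    preds_adv = predict_fn(np.clip(images + delta, 0.0, 1.0))
    return float(np.mean(preds_adv != preds_clean))
```

A black-box attack in this spirit then reduces to querying an evasion or success measure while a Bayesian optimizer proposes new (freq, angle, phase) settings, which is consistent with the small per-attack query budgets reported in the abstract.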
The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
We investigate conditions under which test statistics exist that can reliably
detect examples that have been adversarially manipulated in a white-box
attack. These statistics can be easily computed and calibrated by randomly
corrupting inputs. They exploit certain anomalies that adversarial attacks
introduce, in particular if they follow the paradigm of choosing perturbations
optimally under p-norm constraints. Access to the log-odds is the only
requirement to defend models. We justify our approach empirically, but also
provide conditions under which detectability via the suggested test statistics
is guaranteed to be effective. In our experiments, we show that it is even
possible to correct test-time predictions for adversarial attacks with high
accuracy.
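A minimal NumPy sketch of the kind of statistic involved, under the assumption that logits_fn returns the model's logits for a batch; the noise scale sigma, the sample count, and all names here are illustrative rather than taken from the paper.

```python
import numpy as np

def noise_perturbed_log_odds(logits_fn, x, y_hat, sigma=0.05, n_samples=32):
    """Noise-perturbed log-odds statistic (sketch).

    For an input x predicted as class y_hat, add random Gaussian noise and
    record how the log-odds of every other class against y_hat shift
    relative to the unperturbed input. Adversarially manipulated inputs
    tend to show anomalous shifts, which a test calibrated on randomly
    corrupted clean data can detect.
    """
    base = logits_fn(x[None])[0]
    base_odds = base - base[y_hat]          # log-odds of each class vs. y_hat
    shifts = []
    for _ in range(n_samples):
        eta = (sigma * np.random.randn(*x.shape)).astype(x.dtype)
        noisy = logits_fn((x + eta)[None])[0]
        shifts.append((noisy - noisy[y_hat]) - base_odds)
    return np.mean(shifts, axis=0)          # expected log-odds shift per class
```

Thresholding such standardized shifts yields a detector, and picking the class with the largest shift is one way a statistic of this kind can also be used to correct test-time predictions, in line with the abstract's claim.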