Constructing Unrestricted Adversarial Examples with Generative Models
Adversarial examples are typically constructed by perturbing an existing data
point within a small L_p norm ball, and current defense methods are focused on
guarding against this type of attack. In this paper, we propose unrestricted
adversarial examples, a new threat model where the attackers are not restricted
to small norm-bounded perturbations. Different from perturbation-based attacks,
we propose to synthesize unrestricted adversarial examples entirely from
scratch using conditional generative models. Specifically, we first train an
Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the
class-conditional distribution over data samples. Then, conditioned on a
desired class, we search over the AC-GAN latent space to find images that are
likely under the generative model and are misclassified by a target classifier.
We demonstrate through human evaluation that unrestricted adversarial examples
generated this way are legitimate and belong to the desired class. Our
empirical results on the MNIST, SVHN, and CelebA datasets show that
unrestricted adversarial examples can bypass strong adversarial training and
certified defense methods designed for traditional adversarial attacks.
Comment: Neural Information Processing Systems (NeurIPS 2018).
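As a minimal sketch of the latent-space search described above, the following
assumes a pre-trained class-conditional generator G(z, y) and a target
classifier f; the toy generator, loss weights, and step counts are illustrative
assumptions rather than the paper's setup:

```python
import torch
import torch.nn.functional as F

class ToyConditionalGenerator(torch.nn.Module):
    """Stand-in for a pre-trained AC-GAN generator G(z, y) -> image."""
    def __init__(self, z_dim=64, n_classes=10):
        super().__init__()
        self.embed = torch.nn.Embedding(n_classes, z_dim)
        self.net = torch.nn.Linear(z_dim, 28 * 28)

    def forward(self, z, y):
        return torch.tanh(self.net(z * self.embed(y))).view(-1, 1, 28, 28)

def latent_space_attack(G, f, y_source, y_target, z_dim=64,
                        steps=200, lr=0.05, lam=0.1):
    """Search G's latent space for images generated as class y_source
    but classified as y_target by the target classifier f."""
    z0 = torch.randn(1, z_dim)            # initial latent sample (anchor)
    z = z0.clone().requires_grad_(True)
    y = torch.tensor([y_source])
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        logits = f(G(z, y))
        # Misclassify as y_target; stay near z0 as a soft proxy for
        # "likely under the generative model".
        loss = (F.cross_entropy(logits, torch.tensor([y_target]))
                + lam * (z - z0).pow(2).mean())
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z, y).detach()
```

Keeping z near its initial sample z0 is a soft stand-in for the paper's
requirement that the generated images remain likely under the generative model.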
Adversarial Image Translation: Unrestricted Adversarial Examples in Face Recognition Systems
Thanks to recent advances in deep neural networks (DNNs), face recognition
systems have become highly accurate in classifying a large number of face
images. However, recent studies have found that DNNs could be vulnerable to
adversarial examples, raising concerns about the robustness of such systems.
Adversarial examples that are not restricted to small perturbations could be
more serious since conventional certified defenses might be ineffective against
them. To shed light on the vulnerability to such adversarial examples, we
propose a flexible and efficient method for generating unrestricted adversarial
examples using image translation techniques. Our method enables us to translate
a source image into any desired facial appearance with large perturbations to
deceive target face recognition systems. Our experimental results indicate
that our method achieved high attack success rates under both white- and
black-box settings, and that the translated images are perceptually realistic
and maintain the identifiability of the individual while the perturbations are
large enough to bypass certified defenses.
Comment: Kazuya Kakizaki and Kosuke Yoshida share equal contributions.
Accepted at AAAI Workshop on Artificial Intelligence Safety (2020).
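A hypothetical sketch of the kind of objective such an attack could use,
coupling an image translator G with an attack loss against a face-embedding
network f_emb; both networks and the loss weights are placeholders, not the
paper's architecture:

```python
import torch
import torch.nn.functional as F

def translation_attack_loss(G, f_emb, x_src, target_attr, x_enrolled,
                            w_adv=1.0, w_rec=0.5):
    """Objective sketch: translate x_src toward target_attr with large,
    semantic changes while pushing its face embedding away from the
    enrolled identity (a dodging attack)."""
    x_adv = G(x_src, target_attr)                      # unrestricted change
    e_adv = F.normalize(f_emb(x_adv), dim=-1)
    e_ref = F.normalize(f_emb(x_enrolled), dim=-1)
    dodge = F.cosine_similarity(e_adv, e_ref).mean()   # minimize similarity
    recon = F.l1_loss(x_adv, x_src)                    # keep face identifiable
    return w_adv * dodge + w_rec * recon
```

Minimizing the embedding-space similarity attacks the recognition system
directly, while the reconstruction term keeps the translated face identifiable
to humans, mirroring the trade-off the abstract describes.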
AT-GAN: An Adversarial Generator Model for Non-constrained Adversarial Examples
Despite the rapid development of adversarial machine learning, most
adversarial attack and defense research focuses on perturbation-based
adversarial examples, which are constrained by the input images. In contrast
to existing works, we propose non-constrained adversarial examples, which are
generated entirely from scratch without any constraint on the input. Unlike
perturbation-based attacks, or so-called unrestricted adversarial attacks that
are still constrained by the input noise, we aim to learn the distribution of
adversarial examples in order to generate non-constrained but semantically
meaningful adversarial examples. Following
this spirit, we propose a novel attack framework called AT-GAN (Adversarial
Transfer on Generative Adversarial Net). Specifically, we first develop a
normal GAN model to learn the distribution of benign data, and then transfer
the pre-trained GAN model to estimate the distribution of adversarial examples
for the target model. In this way, AT-GAN can learn the distribution of
adversarial examples that is very close to the distribution of real data. To
our knowledge, this is the first work to build an adversarial generator
model that can produce adversarial examples directly from any input noise.
Extensive experiments and visualizations show that the proposed AT-GAN can
efficiently generate diverse adversarial examples that are realistic to human
perception. In addition, AT-GAN yields higher attack success rates against
adversarially trained models under the white-box attack setting and exhibits
moderate transferability against black-box models.
Comment: 15 pages, 6 figures.
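A minimal sketch of the transfer stage as the abstract describes it: fine-tune
a copy of a pre-trained conditional generator so that its samples fool a target
classifier while staying close to the original generator's outputs. The loss
form, optimizer, and weights below are assumptions, not AT-GAN's exact
objective:

```python
import copy
import torch
import torch.nn.functional as F

def transfer_generator(G_pretrained, f_target, z_dim=64, n_classes=10,
                       steps=1000, lr=1e-4, lam=1.0):
    """Fine-tune a copy of G_pretrained into an adversarial generator."""
    G_adv = copy.deepcopy(G_pretrained)
    opt = torch.optim.Adam(G_adv.parameters(), lr=lr)
    for _ in range(steps):
        z = torch.randn(32, z_dim)
        y = torch.randint(0, n_classes, (32,))
        # Any label other than y serves as an (untargeted) wrong label.
        y_wrong = (y + torch.randint(1, n_classes, (32,))) % n_classes
        x_adv = G_adv(z, y)
        adv_loss = F.cross_entropy(f_target(x_adv), y_wrong)  # fool classifier
        with torch.no_grad():
            x_ref = G_pretrained(z, y)
        dist_loss = F.l1_loss(x_adv, x_ref)   # stay near the benign distribution
        loss = adv_loss + lam * dist_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return G_adv
```

Once transferred, every fresh noise sample z yields an adversarial example in a
single forward pass, which is what makes this family of attacks efficient.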
Semantic Adversarial Attacks: Parametric Transformations That Fool Deep Classifiers
Deep neural networks have been shown to exhibit an intriguing vulnerability
to adversarial input images corrupted with imperceptible perturbations.
However, the majority of adversarial attacks assume global, fine-grained
control over the image pixel space. In this paper, we consider a different
setting: what happens if the adversary could only alter specific attributes of
the input image? Such attacks would generate inputs that might be perceptibly
different, but still natural-looking and sufficient to fool a classifier. We
propose a novel approach to generate such 'semantic' adversarial examples by
optimizing a particular adversarial loss over the range-space of a parametric
conditional generative model. We demonstrate implementations of our attacks on
binary classifiers trained on face images, and show that such natural-looking
semantic adversarial examples exist. We evaluate the effectiveness of our
attack on synthetic and real data, and present detailed comparisons with
existing attack methods. We supplement our empirical results with theoretical
bounds that demonstrate the existence of such parametric adversarial examples.
Comment: Accepted to the International Conference on Computer Vision (ICCV) 2019.
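A minimal sketch of the attack as described: hold the latent code fixed and
optimize only the semantic attribute vector of a conditional generator
G(z, attr) until a binary classifier f flips its decision; the early-stopping
rule and hyperparameters are assumptions:

```python
import torch

def semantic_attack(G, f, z_fixed, attr0, y_true, steps=100, lr=0.01):
    """Optimize only the attribute vector attr so that the binary
    classifier f (one logit; > 0 means class 1) flips its decision."""
    attr = attr0.clone().requires_grad_(True)
    opt = torch.optim.Adam([attr], lr=lr)
    for _ in range(steps):
        logit = f(G(z_fixed, attr)).squeeze()
        if (logit > 0).item() != bool(y_true):      # decision flipped: done
            break
        loss = logit if y_true == 1 else -logit     # push logit across boundary
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z_fixed, attr).detach(), attr.detach()
```

Because only the low-dimensional attribute vector changes, the resulting image
differs along interpretable semantic directions rather than per-pixel noise.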
Excessive Invariance Causes Adversarial Vulnerability
Despite their impressive performance, deep neural networks exhibit striking
failures on out-of-distribution inputs. One core idea of adversarial example
research is to reveal neural network errors under such distribution shifts. We
decompose these errors into two complementary sources: sensitivity and
invariance. We show deep networks are not only too sensitive to task-irrelevant
changes of their input, as is well-known from epsilon-adversarial examples, but
are also too invariant to a wide range of task-relevant changes, thus making
vast regions in input space vulnerable to adversarial attacks. We show such
excessive invariance occurs across various tasks and architecture types. On
MNIST and ImageNet one can manipulate the class-specific content of almost any
image without changing the hidden activations. We identify an insufficiency of
the standard cross-entropy loss as a reason for these failures. Further, we
extend this objective based on an information-theoretic analysis so it
encourages the model to consider all task-dependent features in its decision.
This provides the first approach tailored explicitly to overcome excessive
invariance and the resulting vulnerabilities.
Unrestricted Adversarial Examples via Semantic Manipulation
Machine learning models, especially deep neural networks (DNNs), have been
shown to be vulnerable against adversarial examples which are carefully crafted
samples with a small magnitude of perturbation. Such adversarial
perturbations are usually restricted by bounding their L_p norm
such that they are imperceptible, and thus many current defenses can exploit
this property to reduce their adversarial impact. In this paper, we instead
introduce "unrestricted" perturbations that manipulate semantically meaningful
image-based visual descriptors - color and texture - in order to generate
effective and photorealistic adversarial examples. We show that these
semantically aware perturbations are effective against JPEG compression,
feature squeezing, and adversarially trained models. We also show that the
proposed methods can effectively be applied to both image classification and
image captioning tasks on complex datasets such as ImageNet and MSCOCO. In
addition, we conduct comprehensive user studies to show that our generated
semantic adversarial examples are photorealistic to humans despite large
magnitude perturbations when compared to other attacks.
Comment: Accepted to ICLR 2020. First two authors contributed equally. Code:
https://github.com/aisecure/Big-but-Invisible-Adversarial-Attack and
OpenReview: https://openreview.net/forum?id=Sye_OgHFw
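A simplified sketch in the spirit of the colour-based attack: optimize a large
but photorealistic global recolouring (one gain and bias per channel) instead
of a norm-bounded pixel perturbation. This stand-in is far cruder than the
paper's colorization and texture-transfer models:

```python
import torch
import torch.nn.functional as F

def color_semantic_attack(f, x, y_true, steps=100, lr=0.05):
    """Untargeted attack that only recolours x (shape (B, 3, H, W) in [0, 1])
    via per-channel gain and bias, leaving spatial content untouched."""
    gain = torch.ones(1, 3, 1, 1, requires_grad=True)
    bias = torch.zeros(1, 3, 1, 1, requires_grad=True)
    opt = torch.optim.Adam([gain, bias], lr=lr)
    for _ in range(steps):
        x_adv = torch.clamp(gain * x + bias, 0.0, 1.0)
        # Maximize the true-class loss: an untargeted semantic attack.
        loss = -F.cross_entropy(f(x_adv), y_true)
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.clamp(gain * x + bias, 0.0, 1.0).detach()
```

The perturbation magnitude here can be large in any L_p sense, yet the image
stays plausible because only its global colour statistics move, which is why
norm-based defenses offer little protection against this family of attacks.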
ShapeAdv: Generating Shape-Aware Adversarial 3D Point Clouds
We introduce ShapeAdv, a novel framework to study shape-aware adversarial
perturbations that reflect the underlying shape variations (e.g., geometric
deformations and structural differences) in the 3D point cloud space. We
develop shape-aware adversarial 3D point cloud attacks by leveraging the
learned latent space of a point cloud auto-encoder where the adversarial noise
is applied in the latent space. Specifically, we propose three different
variants including an exemplar-based one by guiding the shape deformation with
auxiliary data, such that the generated point cloud resembles the shape
morphing between objects in the same category. Different from prior works, the
resulting adversarial 3D point clouds reflect the shape variations in the 3D
point cloud space while still being close to the original one. In addition,
experimental evaluations on the ModelNet40 benchmark demonstrate that our
adversarial examples are more difficult to defend against with existing point
cloud defense methods and exhibit higher attack transferability across
classifiers. Our
shape-aware adversarial attacks are orthogonal to existing point cloud based
attacks and shed light on the vulnerability of 3D deep neural networks.
Comment: 3D Point Clouds, Adversarial Learning.
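A minimal sketch of the latent-space attack described above, with stand-in
encoder/decoder/classifier modules and a crude one-sided Chamfer distance as
the shape-closeness penalty; none of these are ShapeAdv's actual components:

```python
import torch
import torch.nn.functional as F

def shape_aware_attack(enc, dec, f, pc, y_true, steps=200, lr=0.01, lam=1.0):
    """Perturb the autoencoder latent code of point cloud pc (shape (1, N, 3))
    so the decoded cloud fools classifier f while staying near the original."""
    with torch.no_grad():
        z0 = enc(pc)                         # clean latent code
    delta = torch.zeros_like(z0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        pc_adv = dec(z0 + delta)             # decoded adversarial cloud
        adv_loss = -F.cross_entropy(f(pc_adv), y_true)
        # One-sided Chamfer distance: each adversarial point stays near pc.
        chamfer = torch.cdist(pc_adv, pc).min(dim=-1).values.mean()
        loss = adv_loss + lam * chamfer
        opt.zero_grad(); loss.backward(); opt.step()
    return dec(z0 + delta).detach()
```

Perturbing in latent space rather than on raw coordinates is what makes the
resulting clouds deform along plausible shape variations instead of scattering
individual points.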
Security Matters: A Survey on Adversarial Machine Learning
Adversarial machine learning is a fast-growing research area that considers
scenarios in which machine learning systems may face potential adversarial
attackers, who intentionally synthesize input data to make a well-trained
model make mistakes. It always involves a defending side, usually a
classifier, and an attacking side that aims to cause incorrect output. The
earliest studies on adversarial examples for machine learning algorithms come
from the information security area, which considers a much wider variety of
attack methods. But the recent research focus popularized by the deep learning
community places strong emphasis on how "imperceptible" perturbations of
normal inputs can cause dramatic mistakes by deep learning models with
supposedly super-human accuracy. This paper serves to give a comprehensive
introduction to a range of aspects of the adversarial deep learning topic,
including its foundations, typical attack and defense strategies, and some
extended studies.
Structure-Preserving Transformation: Generating Diverse and Transferable Adversarial Examples
Adversarial examples are perturbed inputs designed to fool machine learning
models. Most recent works on adversarial examples for image classification
focus on directly modifying pixels with minor perturbations. A common
requirement in all these works is that the malicious perturbations should be
small enough (measured by an L_p norm for some p) so that they are
imperceptible to humans. However, small perturbations can be unnecessarily
restrictive and limit the diversity of adversarial examples generated. Further,
an L_p norm based distance metric ignores structure patterns hidden in images
that are important to human perception. Consequently, even the minor
perturbations introduced in recent works often make the adversarial examples
less natural to humans. More importantly, they often do not transfer well and
are therefore less effective when attacking black-box models, especially those
protected by a defense mechanism. In this paper, we propose a
structure-preserving transformation (SPT) for generating natural and diverse
adversarial examples with extremely high transferability. The key idea of our
approach is to allow perceptible deviation in adversarial examples while
keeping structure patterns that are central to a human classifier. Empirical
results on the MNIST and the fashion-MNIST datasets show that adversarial
examples generated by our approach can easily bypass strong adversarial
training. Further, they transfer well to other target models with little or
no loss in attack success rate.
Comment: The AAAI-2019 Workshop on Artificial Intelligence for Cyber Security
(AICS).
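The abstract does not spell out the transformation itself; one illustrative
reading, sketched below, allows large pixel deviation while penalizing changes
to a Sobel edge map as a crude stand-in for "structure patterns" (the edge
proxy, loss weights, and optimizer are assumptions, not the paper's method):

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

def edge_map(x):
    """Grayscale Sobel edge magnitude, a crude proxy for image structure."""
    g = x.mean(dim=1, keepdim=True)
    ex = F.conv2d(g, SOBEL_X, padding=1)
    ey = F.conv2d(g, SOBEL_X.transpose(2, 3), padding=1)
    return torch.sqrt(ex ** 2 + ey ** 2 + 1e-8)

def structure_preserving_attack(f, x, y_true, steps=200, lr=0.05, lam=10.0):
    """Allow large pixel deviation but penalize changes to the edge structure."""
    x_adv = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    e0 = edge_map(x).detach()
    for _ in range(steps):
        loss = (-F.cross_entropy(f(x_adv.clamp(0, 1)), y_true)
                + lam * F.l1_loss(edge_map(x_adv.clamp(0, 1)), e0))
        opt.zero_grad(); loss.backward(); opt.step()
    return x_adv.clamp(0, 1).detach()
```

Replacing the L_p budget with a structural penalty is exactly the trade the
abstract argues for: perceptible deviation is allowed as long as the patterns
humans classify by are kept intact.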
Adversarial Attack Type I: Cheat Classifiers by Significant Changes
Despite the great success of deep neural networks, adversarial attacks can
cheat some well-trained classifiers with small perturbations. In this paper,
we propose another type of adversarial attack that can cheat classifiers with
significant changes. For example, we can significantly change a face, yet
well-trained neural networks still recognize the adversarial and the original
example as the same person. Statistically, the existing adversarial attack
increases Type II error, while the proposed one aims at Type I error; they are
hence named Type II and Type I adversarial attacks, respectively. The two
types of attack are equally important but essentially different, which we
explain intuitively and evaluate numerically. To implement the proposed
attack, a supervised variational autoencoder is designed, and the classifier
is then attacked by updating the latent variables using gradient information.
Besides, with pre-trained generative models, Type I attacks on latent spaces
are investigated as well. Experimental results show that our method is
practical and effective for generating Type I adversarial examples on
large-scale image datasets. Most of these generated examples can pass
detectors designed for defending against Type II attacks, and the
strengthening strategy is only effective against a specific attack type, both
implying that the underlying reasons for Type I and Type II attacks are
different.
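A minimal sketch of the Type I idea: move through a generator's latent space
so that the image changes substantially while the classifier's label stays
pinned to the original; enc/dec below stand in for the paper's supervised
variational autoencoder, and the loss weights are assumptions:

```python
import torch
import torch.nn.functional as F

def type1_attack(enc, dec, f, x, y_true, steps=300, lr=0.02, lam=5.0):
    """Change the image as much as possible (maximize reconstruction
    distance) while keeping classifier f's prediction fixed at y_true."""
    with torch.no_grad():
        z = enc(x)                    # latent code of the original image
    z = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_new = dec(z)
        keep_label = F.cross_entropy(f(x_new), y_true)  # label must NOT change
        change = -F.mse_loss(x_new, x)                  # but the image should
        loss = lam * keep_label + change
        opt.zero_grad(); loss.backward(); opt.step()
    return dec(z).detach()
```

This inverts the usual (Type II) objective: instead of a small input change
causing a label flip, a large input change is forced to leave the label fixed,
which is why detectors built for Type II attacks tend to miss these examples.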