On The Utility of Conditional Generation Based Mutual Information for Characterizing Adversarial Subspaces
Recent studies have found that deep learning systems are vulnerable to
adversarial examples; e.g., visually unrecognizable adversarial images can
easily be crafted to result in misclassification. The robustness of neural
networks has been studied extensively in the context of adversary detection,
which relies on a metric that exhibits strong discriminative power between natural
and adversarial examples. In this paper, we propose to characterize the
adversarial subspaces through the lens of mutual information (MI) approximated
by conditional generation methods. We use MI as an information-theoretic metric
to strengthen existing defenses and improve the performance of adversary
detection. Experimental results on MagNet defense demonstrate that our proposed
MI detector can strengthen its robustness against powerful adversarial attacks.
Comment: Accepted to IEEE GlobalSIP 2018
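As a rough illustration of using MI as a detection statistic, the sketch below pairs a simple histogram-based MI estimate with a thresholding rule calibrated on natural examples. Both pieces are illustrative stand-ins, not the paper's conditional-generation-based estimator or detection rule.

    import numpy as np

    def mutual_information(x, y, bins=16):
        # Crude histogram-based MI estimate between two 1-D samples; a
        # stand-in for the conditional-generation-based estimator above.
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of x
        py = pxy.sum(axis=0, keepdims=True)   # marginal of y
        nz = pxy > 0                          # avoid log(0)
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    def mi_detect(scores_natural, score, k=3.0):
        # Flag an input as adversarial when its MI-based score deviates
        # from the natural-example distribution by more than k standard
        # deviations (this detection rule is illustrative only).
        mu, sigma = np.mean(scores_natural), np.std(scores_natural)
        return abs(score - mu) > k * sigma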
Attacking the Madry Defense Model with $L_1$-based Adversarial Examples
The Madry Lab recently hosted a competition designed to test the robustness
of their adversarially trained MNIST model. Attacks were constrained to perturb
each pixel of the input image by a scaled maximal $L_\infty$ distortion
$\epsilon = 0.3$. This discourages the use of attacks which are not optimized
on the $L_\infty$ distortion metric. Our experimental results demonstrate that
by relaxing the $L_\infty$ constraint of the competition, the elastic-net
attack to deep neural networks (EAD) can generate transferable adversarial
examples which, despite their high average $L_\infty$ distortion, have minimal
visual distortion. These results call into question the use of $L_\infty$ as a
sole measure for visual distortion, and further demonstrate the power of EAD at
generating robust adversarial examples.
Comment: Accepted to ICLR 2018 Workshop
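For reference, EAD finds an adversarial example $x$ for an input $x_0$ by adding an elastic-net regularizer to the usual C&W-style attack objective; in the notation of the original EAD paper, the targeted attack solves

$$\min_{x \in [0,1]^p} \; c \cdot f(x, t) + \beta \|x - x_0\|_1 + \|x - x_0\|_2^2,$$

where $f(x, t)$ is the targeted attack loss, $\beta$ weights the $L_1$ penalty, and setting $\beta = 0$ recovers the $L_2$-based C&W attack.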
Generating Natural Language Adversarial Examples
Deep neural networks (DNNs) are vulnerable to adversarial examples,
perturbations to correctly classified examples which can cause the model to
misclassify. In the image domain, these perturbations are often virtually
indistinguishable to human perception, causing humans and state-of-the-art
models to disagree. However, in the natural language domain, small
perturbations are clearly perceptible, and the replacement of a single word can
drastically alter the semantics of the document. Given these challenges, we use
a black-box population-based optimization algorithm to generate semantically
and syntactically similar adversarial examples that fool well-trained sentiment
analysis and textual entailment models with success rates of 97% and 70%,
respectively. We additionally demonstrate that 92.3% of the successful
sentiment analysis adversarial examples are classified to their original label
by 20 human annotators, and that the examples are perceptibly quite similar.
Finally, we discuss an attempt to use adversarial training as a defense, but
find that it fails to yield improvement, demonstrating the strength and diversity of our
adversarial examples. We hope our findings encourage researchers to pursue
improving the robustness of DNNs in the natural language domain.
Comment: Accepted in EMNLP 2018 (Conference on Empirical Methods in Natural Language Processing)
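A minimal sketch of the general shape of such a population-based word-substitution search follows. Here prob_fn(words) is assumed to return the target model's probability of the original label (to be driven down) and neighbors_fn(word) to return semantically similar replacements; both are hypothetical stand-ins, not the authors' components.

    import random

    def population_attack(words, prob_fn, neighbors_fn,
                          pop_size=20, generations=50):
        def mutate(cand):
            # Replace one randomly chosen word with a nearby word.
            cand = list(cand)
            i = random.randrange(len(cand))
            options = neighbors_fn(cand[i])
            if options:
                cand[i] = random.choice(options)
            return cand

        def crossover(a, b):
            # Child takes each word from one of the two parents.
            return [random.choice(pair) for pair in zip(a, b)]

        population = [mutate(words) for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=prob_fn)
            if prob_fn(population[0]) < 0.5:  # original label dethroned
                return population[0]          # (binary task assumed)
            parents = population[:pop_size // 2]
            population = [population[0]] + [  # keep the elite member
                mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(pop_size - 1)
            ]
        return None  # no adversarial example found within the budget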
Adversarial Examples as an Input-Fault Tolerance Problem
We analyze the adversarial examples problem in terms of a model's fault
tolerance with respect to its input. Whereas previous work focuses on
arbitrarily strict threat models, i.e., $\epsilon$-perturbations, we consider
arbitrary valid inputs and propose an information-based characteristic for
evaluating tolerance to diverse input faults.
Comment: NIPS 2018 Workshop on Security and Machine Learning. Source available at https://github.com/uoguelph-mlrg/nips18-secml-advex-input-fault
CAAD 2018: Generating Transferable Adversarial Examples
Deep neural networks (DNNs) are vulnerable to adversarial examples,
perturbations carefully crafted to fool the targeted DNN, in both the
non-targeted and targeted case. In the non-targeted case, the attacker simply
aims to induce misclassification. In the targeted case, the attacker aims to
induce classification to a specified target class. In addition, it has been
observed that strong adversarial examples can transfer to unknown models,
yielding a serious security concern. The NIPS 2017 competition was organized to
accelerate research in adversarial attacks and defenses, taking place in the
realistic setting where submitted adversarial attacks attempt to transfer to
submitted defenses. The CAAD 2018 competition took place with nearly identical
rules to the NIPS 2017 one. Given the requirement that the NIPS 2017
submissions were to be open-sourced, participants in the CAAD 2018 competition
were able to directly build upon previous solutions, and thus improve the
state-of-the-art in this setting. Our team participated in the CAAD 2018
competition, and won 1st place in both attack subtracks, non-targeted and
targeted adversarial attacks, and 3rd place in defense. We outline our
solutions and development results in this article. We hope our results can
inform researchers in both generating and defending against adversarial
examples.
Comment: 1st place attack solutions and 3rd place defense in the CAAD 2018 Competition
Transfer of Adversarial Robustness Between Perturbation Types
We study the transfer of adversarial robustness of deep neural networks
between different perturbation types. While most work on adversarial examples
has focused on $L_\infty$- and $L_2$-bounded perturbations, these do not capture
all types of perturbations available to an adversary. The present work
evaluates 32 attacks of 5 different types against models adversarially trained
on a 100-class subset of ImageNet. Our empirical results suggest that
evaluating on a wide range of perturbation sizes is necessary to understand
whether adversarial robustness transfers between perturbation types. We further
demonstrate that robustness against one perturbation type may not always imply
and may sometimes hurt robustness against other perturbation types. In light of
these results, we recommend evaluation of adversarial defenses take place on a
diverse range of perturbation types and sizes.
Comment: 11 pages, 6 figures
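The recommendation above amounts to sweeping both attack type and perturbation size; a minimal sketch, assuming hypothetical attack_fn(model, x, y, eps) callables and an accuracy helper:

    def robustness_grid(model, x, y, attacks, eps_values, accuracy_fn):
        # Evaluate every (perturbation type, perturbation size) pair.
        # `attacks` maps a name to attack_fn(model, x, y, eps) -> x_adv,
        # and accuracy_fn(model, x, y) -> float; both are assumed
        # interfaces, not part of the paper.
        return {
            name: [accuracy_fn(model, attack_fn(model, x, y, eps), y)
                   for eps in eps_values]
            for name, attack_fn in attacks.items()
        }  # one accuracy-vs-size curve per perturbation type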
GenAttack: Practical Black-box Attacks with Gradient-Free Optimization
Deep neural networks are vulnerable to adversarial examples, even in the
black-box setting, where the attacker is restricted solely to query access.
Existing black-box approaches to generating adversarial examples typically
require a significant number of queries, either for training a substitute
network or performing gradient estimation. We introduce GenAttack, a
gradient-free optimization technique that uses genetic algorithms for
synthesizing adversarial examples in the black-box setting. Our experiments on
different datasets (MNIST, CIFAR-10, and ImageNet) show that GenAttack can
successfully generate visually imperceptible adversarial examples against
state-of-the-art image recognition models with orders of magnitude fewer
queries than previous approaches. Against MNIST and CIFAR-10 models, GenAttack
required roughly 2,126 and 2,568 times fewer queries, respectively, than ZOO,
the prior state-of-the-art black-box attack. In order to scale up the attack to
large-scale high-dimensional ImageNet models, we perform a series of
optimizations that further improve the query efficiency of our attack leading
to 237 times fewer queries against the Inception-v3 model than ZOO.
Furthermore, we show that GenAttack can successfully attack some
state-of-the-art ImageNet defenses, including ensemble adversarial training and
non-differentiable or randomized input transformations. Our results suggest
that evolutionary algorithms open up a promising area of research into
effective black-box attacks.
Comment: Accepted in The Genetic and Evolutionary Computation Conference (GECCO) 2019
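A minimal numpy sketch of the gradient-free, population-based idea follows. predict(batch) -> class probabilities is an assumed query-only interface to the target model, and all hyperparameters are illustrative, not GenAttack's.

    import numpy as np

    def gen_attack(x, target, predict, eps=0.05, pop_size=8,
                   iters=1000, p_mut=0.05, seed=0):
        # Evolve additive perturbations inside an L-infinity ball of
        # radius eps using only the model's output probabilities.
        rng = np.random.default_rng(seed)
        pop = rng.uniform(-eps, eps, size=(pop_size,) + x.shape)
        queries = 0
        for _ in range(iters):
            cand = np.clip(x + pop, 0.0, 1.0)
            probs = predict(cand)              # one batched query round
            queries += pop_size
            fitness = np.log(probs[:, target] + 1e-12)
            best = int(fitness.argmax())
            if probs[best].argmax() == target:  # target class is top-1
                return cand[best], queries
            # Fitness-proportional selection, uniform crossover, mutation.
            w = np.exp(fitness - fitness.max())
            w /= w.sum()
            idx = rng.choice(pop_size, size=(pop_size, 2), p=w)
            mask = rng.random((pop_size,) + x.shape) < 0.5
            children = np.where(mask, pop[idx[:, 0]], pop[idx[:, 1]])
            mut = rng.random(children.shape) < p_mut
            children[mut] = rng.uniform(-eps, eps, size=int(mut.sum()))
            children[0] = pop[best]            # elitism
            pop = children
        return None, queries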
Defending Against Multiple and Unforeseen Adversarial Videos
Adversarial robustness of deep neural networks has been actively
investigated. However, most existing defense approaches are limited to a
specific type of adversarial perturbations. Specifically, they often fail to
offer resistance to multiple attack types simultaneously, i.e., they lack
multi-perturbation robustness. Furthermore, compared to image recognition
problems, the adversarial robustness of video recognition models is relatively
unexplored. While several studies have proposed how to generate adversarial
videos, only a handful of defense strategies have been published in the
literature. In this paper, we propose one of the first defense
strategies against multiple types of adversarial videos for video recognition.
The proposed method, referred to as MultiBN, performs adversarial training on
multiple adversarial video types using multiple independent batch normalization
(BN) layers with a learning-based BN selection module. With a multiple-BN
structure, each BN branch is responsible for learning the distribution of a
single perturbation type and thus provides more precise distribution
estimates. This mechanism facilitates dealing with multiple perturbation types.
The BN selection module detects the attack type of an input video and sends it
to the corresponding BN branch, making MultiBN fully automatic and allowing
end-to-end training. Compared to existing adversarial training approaches, the
proposed MultiBN exhibits stronger multi-perturbation robustness against
different and even unforeseen adversarial video types, ranging from $L_p$-bounded
attacks to physically realizable attacks. This holds true on different
datasets and target models. Moreover, we conduct an extensive analysis to study
the properties of the multiple BN structure.
Comment: Accepted in IEEE Transactions on Image Processing (TIP)
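A minimal PyTorch sketch of the multiple-BN idea, not the authors' implementation: one BN branch per perturbation type plus a small learned selector that routes each sample to a branch. All layer shapes and the selector design are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MultiBN2d(nn.Module):
        def __init__(self, num_features, num_types):
            super().__init__()
            # One independent BN branch per perturbation type.
            self.branches = nn.ModuleList(
                nn.BatchNorm2d(num_features) for _ in range(num_types))
            self.selector = nn.Sequential(   # predicts the attack type
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(num_features, num_types))

        def forward(self, x):
            choice = self.selector(x).argmax(dim=1)  # per-sample branch
            out = torch.empty_like(x)
            for i, bn in enumerate(self.branches):
                mask = choice == i
                if mask.any():
                    out[mask] = bn(x[mask])
            return out

Note that the hard argmax routing above is not differentiable; an end-to-end trainable version, as the abstract describes, would need a soft or straight-through selection.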
Towards Robustness against Unsuspicious Adversarial Examples
Despite the remarkable success of deep neural networks, significant concerns
have emerged about their robustness to adversarial perturbations to inputs.
While most attacks aim to ensure that these are imperceptible, physical
perturbation attacks typically aim for being unsuspicious, even if perceptible.
However, there is no universal notion of what it means for adversarial examples
to be unsuspicious. We propose an approach for modeling suspiciousness by
leveraging cognitive salience. Specifically, we split an image into foreground
(salient region) and background (the rest), and allow significantly larger
adversarial perturbations in the background, while ensuring that cognitive
salience of the background remains low. We describe how to compute the
resulting dual-perturbation attacks on classifiers. We then
experimentally demonstrate that our attacks indeed do not significantly change
perceptual salience of the background, but are highly effective against
classifiers robust to conventional attacks. Furthermore, we show that
adversarial training with dual-perturbation attacks yields classifiers that are
more robust to these attacks than state-of-the-art robust learning approaches, and
comparable in terms of robustness to conventional attacks.
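The projection step of such a dual-budget attack might look like the sketch below. Computing the salience mask and the term that keeps background salience low are assumed to happen elsewhere; the function names and budgets are illustrative.

    import numpy as np

    def dual_perturbation_project(x_adv, x, foreground_mask,
                                  eps_fg, eps_bg):
        # Keep the perturbation within a tight budget eps_fg on the
        # salient foreground and a larger budget eps_bg on the
        # background, mirroring the dual-perturbation idea above.
        delta = x_adv - x
        eps = np.where(foreground_mask, eps_fg, eps_bg)  # per-pixel cap
        return np.clip(x + np.clip(delta, -eps, eps), 0.0, 1.0)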
On the Limitation of MagNet Defense against $L_1$-based Adversarial Examples
In recent years, defending against adversarial perturbations to natural
examples in order to build robust machine learning models trained by deep
neural networks (DNNs) has become an emerging research field at the
intersection of deep learning and security. In particular, MagNet, which
consists of an adversary detector and a data reformer, is by far one of the
strongest defenses in the
black-box oblivious attack setting, where the attacker aims to craft
transferable adversarial examples from an undefended DNN model to bypass an
unknown defense module deployed on the same DNN model. Under this setting,
MagNet can successfully defend a variety of attacks in DNNs, including the
high-confidence adversarial examples generated by Carlini and Wagner's
attack based on the $L_2$ distortion metric. However, in this paper, under the
same attack setting we show that adversarial examples crafted based on the
$L_1$ distortion metric can easily bypass MagNet and mislead the target DNN
image classifiers on MNIST and CIFAR-10. We also provide explanations on why
the considered approach can yield adversarial examples with superior attack
performance and conduct extensive experiments on variants of MagNet to verify
its lack of robustness to $L_1$ distortion based attacks. Notably, our results
substantially weaken the assumption that effective threat models on MagNet
require knowing the deployed defense technique when attacking DNNs (i.e., the
gray-box attack setting).
Comment: Accepted to the IEEE/IFIP International Conference on Dependable
Systems and Networks (DSN) 2018 Workshop on Dependable and Secure Machine
Learning
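The oblivious threat model described above can be summarized in a few lines. attack_fn and the two predict callables are assumed interfaces for illustration, not code from the paper.

    import numpy as np

    def oblivious_attack_success(attack_fn, undefended_predict,
                                 defended_predict, x, y):
        # Oblivious black-box setting: adversarial examples are crafted
        # against the undefended classifier only, then replayed against
        # the same classifier with the (unknown) defense in front of it.
        x_adv = attack_fn(undefended_predict, x, y)  # defense never seen
        preds = defended_predict(x_adv)              # defense in place
        return np.mean(preds != y)                   # attack success rate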