Generating Adversarial Examples with Adversarial Networks
Deep neural networks (DNNs) have been found to be vulnerable to adversarial
examples resulting from adding small-magnitude perturbations to inputs. Such
adversarial examples can mislead DNNs to produce adversary-selected results.
Various attack strategies have been proposed to generate adversarial
examples, but producing them with high perceptual quality and at low
computational cost still requires further research. In this paper, we propose AdvGAN to
generate adversarial examples with generative adversarial networks (GANs),
which can learn and approximate the distribution of the original instances.
Once the AdvGAN generator is trained, it can efficiently produce adversarial
perturbations for any input instance, which can potentially accelerate
adversarial training as a defense. We apply AdvGAN in both semi-whitebox and
black-box attack settings. In semi-whitebox attacks, there is no need to access
the original target model after the generator is trained, in contrast to
traditional white-box attacks. In black-box attacks, we dynamically train a
distilled model for the black-box model and optimize the generator accordingly.
Adversarial examples generated by AdvGAN on different target models achieve
high attack success rates under state-of-the-art defenses compared to other
attacks. Our attack placed first with 92.76% accuracy on a public MNIST
black-box attack challenge.
Comment: Accepted to IJCAI 2018
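Since the abstract sketches how the generator is trained against a fixed target model, a minimal PyTorch sketch of an AdvGAN-style training objective may help. Here `G`, `D`, and `f` are assumed to be the generator, discriminator, and target classifier; the three-term loss follows the high-level description above, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def advgan_style_losses(G, D, f, x, target_class, c=0.3):
    """Illustrative sketch of an AdvGAN-style loss (not the authors' code).

    G: generator mapping an input batch x to a perturbation
    D: discriminator scoring how natural a (perturbed) input looks
    f: the fixed target classifier under attack
    c: soft bound on the perturbation's L2 norm
    """
    perturbation = G(x)
    x_adv = torch.clamp(x + perturbation, 0.0, 1.0)

    # GAN term: perturbed inputs should be indistinguishable from real data.
    d_out = D(x_adv)
    loss_gan = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))

    # Attack term: push the target model toward the adversary-selected class.
    loss_adv = F.cross_entropy(f(x_adv), target_class)

    # Hinge term: penalize perturbations whose L2 norm exceeds c.
    norms = perturbation.flatten(1).norm(p=2, dim=1)
    loss_hinge = torch.clamp(norms - c, min=0.0).mean()

    return loss_gan + loss_adv + loss_hinge
```

Once trained, producing an adversarial example is a single forward pass through `G`, which is what makes the semi-whitebox setting (no further access to `f`) and fast data generation for adversarial training plausible.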
A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples
Generating adversarial examples for natural language is hard, as natural
language consists of discrete symbols and inputs vary in length. In this
paper, we propose a geometry-inspired attack for generating
natural language adversarial examples. Our attack generates adversarial
examples by iteratively approximating the decision boundary of Deep Neural
Networks (DNNs). Experiments on two datasets with two different models show
that our attack fools natural language models with high success rates, while
only replacing a few words. Human evaluation shows that adversarial examples
generated by our attack are hard for humans to recognize. Further experiments
show that adversarial training can improve model robustness against our attack.
Comment: COLING 2020 Long Paper
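The iterative boundary-approximation idea can be illustrated with a short sketch. The code below is a hedged simplification, not the paper's algorithm: it replaces the geometric projection step with a greedy search for the word substitution that most reduces the victim model's confidence, and the names `predict_proba` and `candidates` are hypothetical placeholders:

```python
import numpy as np

def word_substitution_attack(predict_proba, candidates, tokens, true_label,
                             max_replacements=5):
    """Greedy word-substitution sketch of a decision-boundary attack.

    predict_proba: callable, tokens -> class-probability vector (victim model)
    candidates:    dict, word -> list of substitute words (e.g., synonyms)
    """
    tokens = list(tokens)
    for _ in range(max_replacements):
        probs = predict_proba(tokens)
        if int(np.argmax(probs)) != true_label:
            return tokens  # the input has crossed the decision boundary
        # Try every substitution and keep the one that moves the input
        # furthest toward the boundary (largest drop in true-label confidence).
        best = None
        for i, word in enumerate(tokens):
            for sub in candidates.get(word, []):
                trial = tokens[:i] + [sub] + tokens[i + 1:]
                drop = probs[true_label] - predict_proba(trial)[true_label]
                if best is None or drop > best[0]:
                    best = (drop, i, sub)
        if best is None or best[0] <= 0:
            break  # no substitution makes further progress
        tokens[best[1]] = best[2]
    return tokens
```

The "only replacing a few words" property reported in the abstract corresponds to keeping `max_replacements` small.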
Adversarial Diversity and Hard Positive Generation
State-of-the-art deep neural networks suffer from a fundamental problem -
they misclassify adversarial examples formed by applying small perturbations to
inputs. In this paper, we present a new psychometric perceptual adversarial
similarity score (PASS) measure for quantifying adversarial images, introduce
the notion of hard positive generation, and use a diverse set of adversarial
perturbations - not just the closest ones - for data augmentation. We introduce
a novel hot/cold approach for adversarial example generation, which provides
multiple possible adversarial perturbations for every single image. The
perturbations generated by our novel approach often correspond to semantically
meaningful image structures, and allow greater flexibility in scaling
perturbation amplitudes, which yields increased diversity of adversarial
images. We present adversarial images on several network topologies and
datasets, including LeNet on the MNIST dataset, and GoogLeNet and ResidualNet
on the ImageNet dataset. Finally, we demonstrate on LeNet and GoogLeNet that
fine-tuning with a diverse set of hard positives improves the robustness of
these networks compared to training with prior methods of generating
adversarial images.
Comment: Accepted to CVPR 2016 DeepVision Workshop
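PASS, as described, scores how perceptually close an adversarial image is to its source; in the paper it is structural similarity computed after aligning the pair with a homography. A minimal sketch, assuming grayscale images as NumPy arrays and omitting the alignment step, so it only approximates the full measure:

```python
import numpy as np
from skimage.metrics import structural_similarity

def approximate_pass(x_adv: np.ndarray, x: np.ndarray) -> float:
    """Approximate PASS score: SSIM between adversarial and original images.

    The full measure first warps x_adv onto x with an estimated homography;
    this sketch skips that alignment, so it is only a rough stand-in.
    """
    data_range = float(x.max() - x.min())
    return structural_similarity(x_adv, x, data_range=data_range)
```

Scores near 1 indicate perceptually indistinguishable images, which is the regime the hard-positive generation described above aims for.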
Boosting the Adversarial Transferability of Surrogate Models with Dark Knowledge
Deep neural networks (DNNs) are vulnerable to adversarial examples. Moreover,
adversarial examples exhibit transferability: an adversarial example crafted
for one DNN model can fool another model with non-trivial probability. This
property gave rise to transfer-based attacks, in which adversarial examples
generated by a surrogate model are used to conduct black-box attacks. Prior
work has studied how to generate adversarial examples with better
transferability from a given surrogate model. However, training a special
surrogate model to generate adversarial examples with better transferability
is relatively under-explored. This paper proposes a method for training a surrogate model
with dark knowledge to boost the transferability of the adversarial examples
generated by the surrogate model. This trained surrogate model is named dark
surrogate model (DSM). The proposed method for training a DSM consists of two
key components: a teacher model that extracts dark knowledge, and a mixing
augmentation technique that enhances the dark knowledge of the training data. We conducted
extensive experiments showing that the proposed method substantially improves
the adversarial transferability of surrogate models across different surrogate
architectures and different optimizers for generating adversarial examples,
and that it can be applied to other transfer-based attack scenarios that
involve dark knowledge, such as face verification. Our code is publicly
available at \url{https://github.com/ydc123/Dark_Surrogate_Model}.
Comment: Accepted at the 2023 International Conference on Tools with Artificial Intelligence (ICTAI)
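The two components named above map naturally onto a standard knowledge-distillation step plus a mixup-style mixing augmentation. A minimal PyTorch sketch under those assumptions; the choice of mixup and all hyperparameters are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def dsm_training_step(student, teacher, x, temperature=4.0, alpha=1.0):
    """Sketch of one dark-surrogate-model training step (illustrative).

    The student (surrogate) is trained to match the teacher's softened
    output distribution, i.e. its "dark knowledge". Mixup stands in for
    the mixing augmentation.
    """
    # Mixing augmentation: blend random pairs of inputs (mixup-style),
    # which enriches the dark knowledge in the teacher's outputs.
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]

    with torch.no_grad():
        teacher_logits = teacher(x_mix)
    student_logits = student(x_mix)

    # Distillation loss: KL divergence between softened distributions,
    # scaled by T^2 as is conventional in knowledge distillation.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return loss
```

Adversarial examples are then crafted from the trained surrogate with any off-the-shelf attack optimizer and transferred to the black-box target.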