Adversarial Out-domain Examples for Generative Models
Deep generative models are rapidly becoming a common tool for researchers and
developers. However, as exhaustively shown for the family of discriminative
models, the test-time inference of deep neural networks cannot be fully
controlled and erroneous behaviors can be induced by an attacker. In the
present work, we show how a malicious user can force a pre-trained generator to
reproduce arbitrary data instances by feeding it suitable adversarial inputs.
Moreover, we show that these adversarial latent vectors can be shaped so as to
be statistically indistinguishable from the set of genuine inputs. The proposed
attack technique is evaluated on various GAN image generators with different
architectures and training processes, in both conditional and unconditional
setups.
Comment: accepted in proceedings of the Workshop on Machine Learning for
Cyber-Crime Investigation and Cybersecurity
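The latent-space attack this abstract describes can be illustrated in miniature. In the sketch below, a fixed random linear map stands in for a pre-trained generator, and plain gradient descent on the latent vector drives the generator's output toward an arbitrary attacker-chosen target; all names, dimensions, and the linear generator itself are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained generator: a fixed linear map G(z) = W z.
# A real attack would backpropagate through a deep GAN generator instead.
latent_dim, data_dim = 8, 4
W = rng.normal(size=(data_dim, latent_dim))

def G(z):
    return W @ z

# Arbitrary "out-domain" instance the attacker wants the generator to emit.
x_target = rng.normal(size=data_dim)

# Gradient descent on the latent vector z to minimise ||G(z) - x_target||^2.
z = rng.normal(size=latent_dim)
lr = 0.01
for _ in range(2000):
    residual = G(z) - x_target
    grad = 2.0 * W.T @ residual   # analytic gradient of the squared error
    z -= lr * grad

print(np.linalg.norm(G(z) - x_target))  # near zero: G reproduces the target
```

Since the toy generator is underdetermined (8 latent dimensions, 4 output dimensions), an exact preimage generically exists and the descent converges to it; the paper's further point, that such adversarial latent vectors can also be shaped to look statistically like genuine inputs, is not modelled here.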
Adversarial Example Detection and Classification With Asymmetrical Adversarial Training
The vulnerabilities of deep neural networks against adversarial examples have
become a significant concern for deploying these models in sensitive domains.
Devising a definitive defense against such attacks has proven challenging,
and methods that rely on detecting adversarial samples are only valid when
the attacker is oblivious to the detection mechanism. In this paper we first
present an adversarial example detection method that provides a performance
guarantee against norm-constrained adversaries. The method is based on the idea of
training adversarially robust subspace detectors using asymmetrical adversarial
training (AAT). The novel AAT objective presents a minimax problem similar to
that of GANs; it has the same convergence property, and consequently supports
the learning of class conditional distributions. We first demonstrate that the
minimax problem can be reasonably solved by a PGD attack, and then use the
learned class conditional generative models to define generative
detection/classification models that are both robust and more interpretable. We
provide comprehensive evaluations of the above methods, and demonstrate their
competitive performances and compelling properties on adversarial detection and
robust classification problems.
Comment: ICLR 202
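The PGD attack invoked above can be illustrated in miniature. The sketch below runs projected gradient ascent on the loss of a toy logistic model under an L-infinity constraint; the model and every parameter are illustrative stand-ins, not the paper's detectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy differentiable model: logistic regression with score s(x) = w . x.
dim = 16
w = rng.normal(size=dim)

def loss(x, y):
    # Binary cross-entropy for a label y in {0, 1}.
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def loss_grad_x(x, y):
    # Gradient of the cross-entropy with respect to the input x.
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return (p - y) * w

def pgd_attack(x, y, eps=0.3, alpha=0.05, steps=20):
    # Projected gradient descent: take signed gradient-ascent steps on the
    # loss, projecting the perturbation back onto the L-inf ball of radius eps.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_x(x_adv, y))
        x_adv = x + np.clip(x_adv - x, -eps, eps)
    return x_adv

x = rng.normal(size=dim)
y = 1
x_adv = pgd_attack(x, y)
print(loss(x, y), loss(x_adv, y))  # the loss is higher at the adversarial point
```

In AAT the same inner PGD step is used adversarially during training rather than only at attack time; this fragment shows only the inner maximisation.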
Generating Adversarial Examples with Adversarial Networks
Deep neural networks (DNNs) have been found to be vulnerable to adversarial
examples resulting from adding small-magnitude perturbations to inputs. Such
adversarial examples can mislead DNNs to produce adversary-selected results.
Different attack strategies have been proposed to generate adversarial
examples, but producing them with high perceptual quality and greater
efficiency requires further research effort. In this paper, we propose AdvGAN to
generate adversarial examples with generative adversarial networks (GANs),
which can learn and approximate the distribution of original instances. For
AdvGAN, once the generator is trained, it can generate adversarial
perturbations efficiently for any instance, potentially accelerating
adversarial training as a defense. We apply AdvGAN in both semi-whitebox and
black-box attack settings. In semi-whitebox attacks, there is no need to access
the original target model after the generator is trained, in contrast to
traditional white-box attacks. In black-box attacks, we dynamically train a
distilled model for the black-box model and optimize the generator accordingly.
Adversarial examples generated by AdvGAN on different target models achieve
high attack success rates under state-of-the-art defenses compared to other
attacks. Our attack placed first, with 92.76% accuracy, on a public MNIST
black-box attack challenge.
Comment: Accepted to IJCAI201
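The amortized idea behind AdvGAN, a generator trained once that then emits a perturbation for any input in a single forward pass, can be sketched with toy linear models. The target model, the generator form, and every parameter below are illustrative assumptions, not the paper's GAN architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, eps, lr = 16, 0.5, 0.1

# Fixed target model under attack: a linear scorer, class = sign(w . x).
w = rng.normal(size=dim)

# Perturbation "generator" g(x) = eps * tanh(U x): one forward pass per input,
# instead of a per-instance optimisation loop.
U = rng.normal(size=(dim, dim)) * 0.1

def perturb(x):
    return eps * np.tanh(U @ x)  # bounded in [-eps, eps] per coordinate

# Train U to push positive-class inputs across the decision boundary,
# i.e. minimise the target score w . (x + g(x)) over the training set.
X = rng.normal(size=(200, dim))
X = X[X @ w > 0]                 # keep inputs the model classifies as positive
for _ in range(300):
    for x in X:
        t = np.tanh(U @ x)
        # d/dU of w . (x + eps * tanh(U x)), via the chain rule
        grad = eps * np.outer(w * (1.0 - t * t), x)
        U -= lr * grad

scores = np.array([w @ (x + perturb(x)) for x in X])
print((scores < 0).mean())  # fraction of inputs now misclassified
```

The trained generator attacks unseen inputs with a single matrix multiply and tanh, which is the efficiency argument the abstract makes; AdvGAN additionally uses a discriminator to keep perturbed images perceptually realistic, which this linear toy omits.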