Adversarial Sample Detection for Deep Neural Network through Model Mutation Testing
Deep neural networks (DNN) have been shown to be useful in a wide range of
applications. However, they are also known to be vulnerable to adversarial
samples. When a normal sample is transformed with carefully crafted,
human-imperceptible perturbations, even highly accurate DNNs make wrong decisions.
Multiple defense mechanisms have been proposed that aim to hinder the
generation of such adversarial samples. However, recent work shows that most
of them are ineffective. In this work, we propose an alternative approach to
detect adversarial samples at runtime. Our main observation is that adversarial
samples are much more sensitive than normal samples if we impose random
mutations on the DNN. We thus first propose a measure of `sensitivity' and show
empirically that normal samples and adversarial samples have distinguishable
sensitivity. We then integrate statistical hypothesis testing and model
mutation testing to check whether an input sample is likely to be normal or
adversarial at runtime by measuring its sensitivity. We evaluated our approach
on the MNIST and CIFAR10 datasets. The results show that our approach detects
adversarial samples generated by state-of-the-art attacking methods efficiently
and accurately.
Comment: Accepted by ICSE 201
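The sensitivity idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: a toy linear classifier stands in for the DNN, Gaussian noise on the weights stands in for the mutation operator, the fraction of mutants that change the predicted label stands in for the sensitivity measure, and a fixed threshold stands in for the statistical hypothesis test. The names `predict`, `label_change_rate`, and `looks_adversarial` are hypothetical.

```python
import numpy as np


def predict(weights, x):
    """Label predicted by a toy linear classifier (a stand-in for a DNN)."""
    return int(np.argmax(weights @ x))


def label_change_rate(weights, x, n_mutants=200, noise_std=0.05, seed=0):
    """Fraction of randomly mutated models whose label for x differs
    from the label given by the original model (assumed sensitivity measure)."""
    rng = np.random.default_rng(seed)
    original_label = predict(weights, x)
    changes = 0
    for _ in range(n_mutants):
        # Assumed mutation operator: add Gaussian noise to every weight.
        mutant = weights + rng.normal(0.0, noise_std, size=weights.shape)
        if predict(mutant, x) != original_label:
            changes += 1
    return changes / n_mutants


def looks_adversarial(weights, x, threshold=0.1, **kwargs):
    """Flag x when its sensitivity exceeds a fixed, hypothetical threshold.
    This simple cut-off replaces the hypothesis test named in the abstract."""
    return label_change_rate(weights, x, **kwargs) > threshold


# Two-class toy model with the identity as its weight matrix.
weights = np.eye(2)
far_from_boundary = np.array([5.0, 0.0])     # confidently class 0
near_boundary = np.array([1.0, 0.999])       # barely class 0

print(label_change_rate(weights, far_from_boundary))
print(label_change_rate(weights, near_boundary))
print(looks_adversarial(weights, near_boundary))
```

In this toy setting, an input that sits close to the decision boundary flips labels under many mutants, while an input far from the boundary does not. Whether adversarial samples behave like the near-boundary input on a real DNN is the empirical claim the abstract makes, not something this sketch establishes.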
Efficient Defenses Against Adversarial Attacks
Following the recent adoption of deep neural networks (DNN) across a wide
range of applications, adversarial attacks against these models have proven to
be an indisputable threat. Adversarial samples are crafted with a deliberate
intention of undermining a system. In the case of DNNs, the lack of a better
understanding of their inner workings has prevented the development of efficient
defenses. In this paper, we propose a new defense method based on practical
observations which is easy to integrate into models and performs better than
state-of-the-art defenses. Our proposed solution is meant to reinforce the
structure of a DNN, making its prediction more stable and less likely to be
fooled by adversarial samples. We conduct an extensive experimental study
proving the efficiency of our method against multiple attacks, comparing it to
numerous defenses, both in white-box and black-box setups. Additionally, the
implementation of our method brings almost no overhead to the training
procedure, while maintaining the prediction performance of the original model
on clean samples.
Comment: 16 page