Deep neural networks (DNNs) have been shown to be useful in a wide range of
applications. However, they are also known to be vulnerable to adversarial
samples: by transforming a normal sample with carefully crafted,
human-imperceptible perturbations, even highly accurate DNNs can be made to
produce wrong decisions.
Multiple defense mechanisms have been proposed which aim to hinder the
generation of such adversarial samples. However, recent work has shown that
most of them are ineffective. In this work, we propose an alternative approach to
detect adversarial samples at runtime. Our main observation is that adversarial
samples are much more sensitive than normal samples if we impose random
mutations on the DNN. We thus first propose a measure of 'sensitivity' and show
empirically that normal samples and adversarial samples have distinguishable
sensitivity. We then integrate statistical hypothesis testing and model
mutation testing to check whether an input sample is likely to be normal or
adversarial at runtime by measuring its sensitivity. We evaluated our approach
on the MNIST and CIFAR10 datasets. The results show that our approach detects
adversarial samples generated by state-of-the-art attack methods efficiently
and accurately.
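
To make the described detection procedure concrete, the following is a minimal,
illustrative Python sketch (not the authors' implementation) of the idea stated
above: impose random mutations on the model, measure an input's sensitivity as
the rate at which mutants change its predicted label, and decide at runtime
whether the input is likely adversarial. A sequential probability ratio test
(SPRT) is used here as one concrete instance of the statistical hypothesis
testing mentioned above. The toy linear model, the weight-noise mutation
operator, and all names and parameters (mutate, detect, lcr_threshold, alpha,
beta) are assumptions for illustration only; the paper's actual mutation
operators and settings may differ.

import math
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a DNN: a linear classifier over 10 classes on 784-d inputs.
W = rng.normal(size=(10, 784))

def predict(weights, x):
    """Return the predicted label of input x under the given weights."""
    return int(np.argmax(weights @ x))

def mutate(weights, rate=0.01, sigma=0.05):
    """Illustrative model mutation operator: add Gaussian noise to a small
    random fraction of the weights."""
    m = weights.copy()
    mask = rng.random(m.shape) < rate
    m[mask] += rng.normal(scale=sigma, size=mask.sum())
    return m

def detect(x, weights, lcr_threshold=0.05, alpha=0.05, beta=0.05,
           indifference=0.01, max_mutants=500):
    """SPRT-style detector: keep sampling mutated models until there is strong
    evidence that the label change rate (the input's 'sensitivity') is above
    or below lcr_threshold. Returns True if x is flagged as adversarial."""
    p0 = lcr_threshold + indifference   # H0: adversarial (high label change rate)
    p1 = lcr_threshold - indifference   # H1: normal (low label change rate)
    accept_h0 = math.log((1 - beta) / alpha)
    accept_h1 = math.log(beta / (1 - alpha))
    original_label = predict(weights, x)
    llr, changes = 0.0, 0
    for i in range(1, max_mutants + 1):
        changed = predict(mutate(weights), x) != original_label
        changes += changed
        llr += math.log(p0 / p1) if changed else math.log((1 - p0) / (1 - p1))
        if llr >= accept_h0:
            return True      # sensitive input: likely adversarial
        if llr <= accept_h1:
            return False     # insensitive input: likely normal
    # Fall back to a direct threshold check if the test budget is exhausted.
    return changes / max_mutants > lcr_threshold

# Usage: flag an arbitrary input (random here, just to exercise the code).
x = rng.normal(size=784)
print("adversarial?", detect(x, W))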