Exploiting epistemic uncertainty of the deep learning models to generate adversarial samples

Abstract

Deep neural network (DNN) architectures are considered to be robust to random perturbations. Nevertheless, it was shown that they could be severely vulnerable to slight but carefully crafted perturbations of the input, termed as adversarial samples. In recent years, numerous studies have been conducted in this new area called ``Adversarial Machine Learning” to devise new adversarial attacks and to defend against these attacks with more robust DNN architectures. However, most of the current research has concentrated on utilising model loss function to craft adversarial examples or to create robust models. This study explores the usage of quantified epistemic uncertainty obtained from Monte-Carlo Dropout Sampling for adversarial attack purposes by which we perturb the input to the shifted-domain regions where the model has not been trained on. We proposed new attack ideas by exploiting the difficulty of the target model to discriminate between samples drawn from original and shifted versions of the training data distribution by utilizing epistemic uncertainty of the model. Our results show that our proposed hybrid attack approach increases the attack success rates from 82.59% to 85.14%, 82.96% to 90.13% and 89.44% to 91.06% on MNIST Digit, MNIST Fashion and CIFAR-10 datasets, respectively.Publisher's VersionWOS:000757777400006PMID: 3522177

    Similar works