Robustness against adversarial attacks on deep neural networks

Abstract

While deep neural networks have been applied successfully in many domains, they are vulnerable to artificially crafted perturbations of their input data. Moreover, these perturbations have been shown to be transferable: the same perturbation can mislead different models. In response to this problem, many robust learning approaches have emerged. Adversarial training is regarded as a mainstream approach to enhancing the robustness of deep neural networks with respect to norm-constrained perturbations. However, adversarial training requires the network to be trained on a large number of perturbed examples (e.g., over 100,000 examples for the MNIST dataset) before robustness is considerably enhanced. This is problematic because of the large computational cost of generating attacks. Developing computationally efficient approaches that retain robustness against norm-constrained perturbations remains an open challenge in the literature. In this research we present two novel robust training algorithms based on Monte-Carlo Tree Search (MCTS) [1] to enhance robustness under norm-constrained perturbations [2, 3]. The first algorithm searches for potential candidates with the Scale Invariant Feature Transform (SIFT) method and makes decisions with Monte-Carlo Tree Search [2]. The second algorithm adopts the Decision Tree Search (DTS) method to accelerate the search process while maintaining efficiency [3]. Our overarching objective is to provide computationally efficient approaches that can be deployed to train deep neural networks that are robust against perturbations in data. We illustrate the robustness achieved with these algorithms by studying resistance to adversarial examples on the MNIST and CIFAR10 datasets. For MNIST, the results show an average saving in training effort of 21.1% compared to Projected Gradient Descent (PGD) and 28.3% compared to the Fast Gradient Sign Method (FGSM).
For CIFAR10, we obtained an average efficiency improvement of 9.8% compared to PGD and 13.8% compared to FGSM. The results suggest that the two methods introduced here are not only robust to norm-constrained perturbations but also efficient during training. With regard to the transferability of defences, our experiments [4] reveal that across different network architectures, across a variety of attack methods from white-box to black-box, and across datasets including MNIST and CIFAR10, our algorithms outperform other state-of-the-art methods, e.g., PGD and FGSM. Furthermore, the derived attacks and robust models obtained with our framework are reusable, in the sense that the same norm-constrained perturbations can facilitate robust training across different networks. Lastly, we investigate the robustness of intra-technique and cross-technique transferability and its relation to different impact factors, from adversarial strength to network capacity. The results suggest that known attacks on the resulting models are less transferable than attacks on models trained by other state-of-the-art attack algorithms. Our results suggest that exploiting these tree search frameworks can result in significant improvements in the robustness of deep neural networks while saving computational cost on robust training. This paves the way for several future directions, both algorithmic and theoretical, as well as numerous applications to establish the robustness of deep neural networks with increasing trust and safety.
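For readers unfamiliar with the two baselines named above, the following is a minimal sketch of FGSM and PGD under an L-infinity norm constraint. It does not reproduce the tree-search algorithms of this work. The toy logistic-regression model, its weights, the budget `eps`, and the step size are all assumptions chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, y, w, b):
    """Binary cross-entropy loss of a logistic-regression model on one input."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = sigmoid(x @ w + b)
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """One signed-gradient step of size eps (the L-infinity budget)."""
    return x + eps * np.sign(loss_grad_wrt_input(x, y, w, b))

def pgd(x, y, w, b, eps, step, n_steps):
    """Repeated signed-gradient steps, each projected back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv + step * np.sign(loss_grad_wrt_input(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection onto the L-infinity ball
    return x_adv

# Hypothetical toy model and input (not taken from the thesis).
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 0.3])
y = 1.0

x_fgsm = fgsm(x, y, w, b, eps=0.1)
x_pgd = pgd(x, y, w, b, eps=0.1, step=0.025, n_steps=10)
print("clean loss:", bce(x, y, w, b))
print("FGSM loss: ", bce(x_fgsm, y, w, b))
print("PGD loss:  ", bce(x_pgd, y, w, b))
```

Both perturbations stay within `eps` of the clean input in every coordinate, which is what "norm-constrained" means here. PGD costs `n_steps` gradient evaluations per example against one for FGSM, which gives a sense of why the cost of generating attacks dominates adversarial training.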
