Search CORE

20 research outputs found

A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks

Author: Chen Jinghui
Gu Quanquan
Yi Jinfeng
Zhou Dongruo
Publication venue
Publication date: 15/09/2019
Field of study

Depending on how much information an adversary can access to, adversarial attacks can be classified as white-box attack and black-box attack. For white-box attack, optimization-based attack algorithms such as projected gradient descent (PGD) can achieve relatively high attack success rates within moderate iterates. However, they tend to generate adversarial examples near or upon the boundary of the perturbation set, resulting in large distortion. Furthermore, their corresponding black-box attack algorithms also suffer from high query complexities, thereby limiting their practical usefulness. In this paper, we focus on the problem of developing efficient and effective optimization-based adversarial attack algorithms. In particular, we propose a novel adversarial attack framework for both white-box and black-box settings based on a variant of Frank-Wolfe algorithm. We show in theory that the proposed attack algorithms are efficient with an

O(1/\sqrt{T})

convergence rate. The empirical results of attacking the ImageNet and MNIST datasets also verify the efficiency and effectiveness of the proposed algorithms. More specifically, our proposed algorithms attain the best attack performances in both white-box and black-box attacks among all baselines, and are more time and query efficient than the state-of-the-art.Comment: 25 pages, 1 figure, 7 table

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

An Empirical Study of Derivative-Free-Optimization Algorithms for Targeted Black-Box Attacks in Deep Neural Networks

Author: Abrol Vinayak
Tanner Jared
Ughi Giuseppe
Publication venue
Publication date: 03/12/2020
Field of study

We perform a comprehensive study on the performance of derivative free optimization (DFO) algorithms for the generation of targeted black-box adversarial attacks on Deep Neural Network (DNN) classifiers assuming the perturbation energy is bounded by an

\ell_\infty

constraint and the number of queries to the network is limited. This paper considers four pre-existing state-of-the-art DFO-based algorithms along with the introduction of a new algorithm built on BOBYQA, a model-based DFO method. We compare these algorithms in a variety of settings according to the fraction of images that they successfully misclassify given a maximum number of queries to the DNN. The experiments disclose how the likelihood of finding an adversarial example depends on both the algorithm used and the setting of the attack; algorithms limiting the search of adversarial example to the vertices of the

\ell^\infty

constraint work particularly well without structural defenses, while the presented BOBYQA based algorithm works better for especially small perturbation energies. This variance in performance highlights the importance of new algorithms being compared to the state-of-the-art in a variety of settings, and the effectiveness of adversarial defenses being tested using as wide a range of algorithms as possible.Comment: arXiv admin note: text overlap with arXiv:2002.1034

arXiv.org e-Print Archive

Oxford University Research Archive