A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks
Depending on how much information an adversary can access, adversarial
attacks can be classified as white-box attacks and black-box attacks. For
white-box attacks, optimization-based attack algorithms such as projected
gradient descent (PGD) can achieve relatively high attack success rates within
a moderate number of iterations. However, they tend to generate adversarial
examples near or upon the boundary of the perturbation set, resulting in large
distortion. Furthermore, their corresponding black-box attack algorithms also
suffer from high query complexity, thereby limiting their practical
usefulness. In this
paper, we focus on the problem of developing efficient and effective
optimization-based adversarial attack algorithms. In particular, we propose a
novel adversarial attack framework for both white-box and black-box settings
based on a variant of the Frank-Wolfe algorithm. We show in theory that the
proposed attack algorithms are efficient, with an O(1/√T) convergence
rate. The empirical results of attacking the ImageNet and MNIST datasets also
verify the efficiency and effectiveness of the proposed algorithms. More
specifically, our proposed algorithms attain the best attack performance in
both white-box and black-box attacks among all baselines, and are more time-
and query-efficient than the state-of-the-art.
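To make the geometry concrete, here is a minimal sketch of a Frank-Wolfe ascent step over an ℓ∞ ball, loosely in the spirit of the framework described above. The toy quadratic loss, the radius epsilon, and the classic 2/(t+2) step-size schedule are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

def fw_linf_attack(x0, loss_grad, epsilon=0.05, T=20):
    """Projection-free ascent over the l_inf ball of radius epsilon around x0.

    The linear maximization oracle over {v : ||v - x0||_inf <= epsilon} has
    the closed form x0 + epsilon * sign(grad), and the convex-combination
    update keeps every iterate feasible (often strictly inside the ball,
    unlike PGD iterates that tend to sit on the boundary).
    """
    x = x0.astype(float).copy()
    for t in range(T):
        g = loss_grad(x)
        v = x0 + epsilon * np.sign(g)      # linear maximization oracle
        gamma = 2.0 / (t + 2)              # classic Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * v    # convex combination stays feasible
    return x

# Toy usage with an analytic gradient: maximize 0.5 * ||x - t||^2,
# i.e. push x away from a reference point t within the epsilon-ball.
t = np.array([1.0, -1.0])
x_adv = fw_linf_attack(np.zeros(2), lambda x: x - t, epsilon=0.1)

The key point is that no projection step is needed: every update is a convex combination of feasible points, which is where projection-free methods save time over PGD-style attacks.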
An Empirical Study of Derivative-Free-Optimization Algorithms for Targeted Black-Box Attacks in Deep Neural Networks
We perform a comprehensive study on the performance of derivative-free
optimization (DFO) algorithms for the generation of targeted black-box
adversarial attacks on Deep Neural Network (DNN) classifiers assuming the
perturbation energy is bounded by an ℓ∞ constraint and the number of
queries to the network is limited. This paper considers four pre-existing
state-of-the-art DFO-based algorithms along with the introduction of a new
algorithm built on BOBYQA, a model-based DFO method. We compare these
algorithms in a variety of settings according to the fraction of images that
they successfully misclassify given a maximum number of queries to the DNN.
The experiments reveal how the likelihood of finding an adversarial example
depends on both the algorithm used and the setting of the attack; algorithms
limiting the search for adversarial examples to the vertices of the ℓ∞
constraint work particularly well when no structural defenses are applied, while
the presented BOBYQA-based algorithm works better for especially small perturbation
energies. This variance in performance highlights the importance of new
algorithms being compared to the state-of-the-art in a variety of settings, and
the effectiveness of adversarial defenses being tested using as wide a range of
algorithms as possible.
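As a rough illustration of the vertex-restricted search the study highlights, the sketch below keeps every candidate perturbation at a vertex of the ℓ∞ ball (all entries in {-ε, +ε}), flips a random subset of coordinate signs per query, and accepts a flip only when the loss improves. The flip fraction, query budget, and toy loss are assumptions for illustration, not any of the benchmarked attacks.

import numpy as np

def vertex_search_attack(x0, loss, epsilon=0.05, max_queries=1000,
                         flip_frac=0.1, seed=0):
    """Toy black-box search restricted to vertices of the l_inf ball.

    Each candidate perturbation has entries in {-epsilon, +epsilon}; every
    query flips a random subset of signs and keeps the move only if the
    loss increases. Queries are counted against a fixed budget.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=x0.shape)
    best = loss(x0 + epsilon * signs)
    queries = 1
    while queries < max_queries:
        flip = rng.random(x0.shape) < flip_frac    # coordinates to flip
        cand = np.where(flip, -signs, signs)
        val = loss(x0 + epsilon * cand)
        queries += 1
        if val > best:                             # greedy acceptance
            signs, best = cand, val
    return x0 + epsilon * signs, queries

# Toy usage: the "loss" to maximize is distance from a reference point t.
t = np.array([1.0, -1.0, 0.5])
x_adv, used = vertex_search_attack(np.zeros(3),
                                   lambda x: float(np.sum((x - t) ** 2)),
                                   epsilon=0.1, max_queries=200)

Because the search never leaves the vertices, each query needs only a sign pattern rather than a continuous perturbation, which matches the abstract's observation that such attacks can be very query-efficient when no structural defenses are applied.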