Going Far Boosts Attack Transferability, but Do Not Do It
Deep Neural Networks (DNNs) can be easily fooled by Adversarial Examples (AEs)
whose differences from the original samples are imperceptible to human eyes.
Moreover, AEs crafted against one surrogate DNN tend to fool other black-box
DNNs as well, a property known as attack transferability. Existing works reveal
that adopting certain optimization algorithms in attacks improves
transferability, but the
underlying reasons have not been thoroughly studied. In this paper, we
investigate the impacts of optimization on attack transferability by
comprehensive experiments covering 7 optimization algorithms, 4 surrogates,
and 9 black-box models. Through thorough empirical analysis from three
perspectives, we surprisingly find that the varied transferability of AEs
produced by different optimization algorithms is strongly related to the Root
Mean Square Error (RMSE) between the AEs and their original samples. On this
basis, one could simply approach high transferability by attacking until the
RMSE begins to decrease, which motivates us to propose a LArge RMSE Attack
(LARA). Although LARA significantly
improves transferability by 20%, it is insufficient to exploit the
vulnerability of DNNs, leading to a natural urge that the strength of all
attacks should be measured by both the widely used ℓ∞ bound and the RMSE
addressed in this paper, so that deceptive enhancement of transferability
would be avoided.
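
The RMSE-based stopping rule can be made concrete with a short sketch. The
following PGD-style loop is a minimal illustration, not the authors' exact
LARA implementation; the step size, budget, and the "stop once RMSE begins to
decrease" check are assumptions based on the abstract:

```python
import torch
import torch.nn.functional as F

def rmse(x_adv, x):
    """Root Mean Square Error between an AE and its original sample."""
    return torch.sqrt(torch.mean((x_adv - x) ** 2)).item()

def large_rmse_style_attack(model, x, y, eps=8 / 255, alpha=1 / 255, max_steps=200):
    """Illustrative sketch only: keep attacking while RMSE still grows.
    `eps` and `alpha` are assumed hyperparameters, not the paper's values."""
    x_adv = x.clone().detach()
    prev_rmse = -1.0
    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # untargeted ascent step
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)   # project into the l_inf ball
        x_adv = x_adv.clamp(0.0, 1.0)                           # keep a valid image
        cur = rmse(x_adv, x)
        if cur <= prev_rmse:   # RMSE stopped growing: the abstract's stopping rule
            break
        prev_rmse = cur
    return x_adv.detach()
```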
DifAttack: Query-Efficient Black-Box Attack via Disentangled Feature Space
This work investigates efficient score-based black-box adversarial attacks
with a high Attack Success Rate (ASR) and good generalizability. We design a
novel attack method based on a Disentangled Feature space, called DifAttack,
which differs significantly from the existing ones operating over the entire
feature space. Specifically, DifAttack first disentangles an image's latent
feature into an adversarial feature and a visual feature, where the former
dominates the adversarial capability of an image, while the latter largely
determines its visual appearance. We train an autoencoder for the
disentanglement by using pairs of clean images and their Adversarial Examples
(AEs) generated from available surrogate models via white-box attack methods.
Eventually, DifAttack iteratively optimizes the adversarial feature according
to the query feedback from the victim model until a successful AE is generated,
while keeping the visual feature unaltered. In addition, because it avoids
using surrogate models' gradient information when optimizing AEs for
black-box models, our proposed DifAttack inherently possesses better attack
capability in the open-set scenario, where the training dataset of the victim
model is unknown. Extensive experimental results demonstrate that our method
achieves significant improvements in ASR and query efficiency simultaneously,
especially in the targeted attack and open-set scenarios. The code will be
available soon at https://github.com/csjunjun/DifAttack.git.
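
The query loop described above can be sketched in a few lines. Everything here
is an assumption for illustration, not the authors' implementation: `encoder`,
`decoder`, and `victim_logits` are hypothetical callables, and a simple
Gaussian random search stands in for the paper's actual feature optimizer:

```python
import torch

@torch.no_grad()
def difattack_style_query_loop(encoder, decoder, victim_logits, x, target,
                               max_queries=1000, sigma=0.05):
    """Minimal random-search sketch of a disentangled-feature query attack.
    Assumed interfaces: encoder(x) -> (adv_feat, vis_feat);
    decoder(adv_feat, vis_feat) -> image; victim_logits(x) -> class scores."""
    adv_feat, vis_feat = encoder(x)                       # disentangle once
    best, best_score = adv_feat, float("-inf")
    for _ in range(max_queries):
        cand = best + sigma * torch.randn_like(best)      # perturb the adversarial feature only
        x_try = decoder(cand, vis_feat)                   # visual feature stays fixed
        logits = victim_logits(x_try)                     # one query to the victim model
        if logits.argmax(dim=1).item() == target:
            return x_try                                  # successful targeted AE
        score = logits[0, target].item()
        if score > best_score:                            # keep the best candidate so far
            best, best_score = cand, score
    return decoder(best, vis_feat)                        # best effort after the query budget
```

Because the visual feature is never perturbed, every candidate decoded from a
modified adversarial feature should retain the clean image's appearance, which
is the intuition the abstract attributes to the disentanglement.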