Unrestricted adversarial attacks present a serious threat to deep learning
models and adversarial defense techniques. They pose severe security problems
for deep learning applications because they can effectively bypass defense
mechanisms. However, previous attack methods often utilize Generative
Adversarial Networks (GANs), which are not theoretically provable and thus
generate unrealistic examples by incorporating adversarial objectives,
especially for large-scale datasets like ImageNet. In this paper, we propose a
new method, called AdvDiff, to generate unrestricted adversarial examples with
diffusion models. We design two novel adversarial guidance techniques to
conduct adversarial sampling in the reverse generation process of diffusion
models. These two techniques are effective and stable to generate high-quality,
realistic adversarial examples by integrating gradients of the target
classifier interpretably. Experimental results on MNIST and ImageNet datasets
demonstrate that AdvDiff is effective to generate unrestricted adversarial
examples, which outperforms GAN-based methods in terms of attack performance
and generation quality