State-of-the-art generative model-based attacks against image classifiers
overwhelmingly focus on single-object (i.e., single dominant object) images.
In contrast to such settings, we tackle the more practical problem of generating
adversarial perturbations using multi-object (i.e., multiple dominant objects)
images, since such images are representative of most real-world scenes. Our goal is to
design an attack strategy that can learn from such natural scenes by leveraging
the local patch differences that occur inherently in such images (e.g.,
the difference between a local patch on the object `person' and one on the object
`bike' in a traffic scene). Our key idea is to cause the victim classifier to
misclassify an adversarial multi-object image by confusing it on every local
patch in the image.
Based on this, we propose a novel generative attack (called Local Patch
Difference or LPD-Attack) in which a contrastive loss function exploits the
aforesaid local differences in the feature space of multi-object scenes to
optimize the perturbation generator. Through various experiments across diverse victim
convolutional neural networks, we show that our approach outperforms baseline
generative attacks, producing highly transferable perturbations when evaluated under
different white-box and black-box settings.

Comment: Accepted at WACV 2023 (Round 1), camera-ready version
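To make the mechanism concrete, below is a minimal PyTorch sketch of one way such a patch-wise contrastive objective could drive a perturbation generator. All names here (lpd_contrastive_loss, generator_step, victim_backbone, the eps bound, and the InfoNCE-style formulation) are illustrative assumptions rather than the paper's exact method: each spatial location of a victim feature map is treated as a local patch, and the loss pushes adversarial patch features away from their own clean counterparts and toward those of other patches in the scene.

```python
import torch
import torch.nn.functional as F

def lpd_contrastive_loss(feats_adv, feats_clean, temperature=0.1):
    """Patch-wise contrastive objective (a sketch, not the paper's exact loss).

    feats_adv, feats_clean: victim-CNN feature maps of shape (B, C, H, W)
    for the perturbed and clean image; each spatial location is treated
    as one local patch embedding.
    """
    B, C, H, W = feats_adv.shape
    # Flatten the spatial grid so each of the H*W locations is a patch.
    adv = F.normalize(feats_adv.flatten(2).transpose(1, 2), dim=-1)      # (B, HW, C)
    clean = F.normalize(feats_clean.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)

    # Similarity of every adversarial patch to every clean patch.
    logits = torch.bmm(adv, clean.transpose(1, 2)) / temperature  # (B, HW, HW)

    # InfoNCE with the same-location clean patch (the diagonal) as positive.
    # Returning the NEGATIVE of this loss and minimizing it pushes each
    # adversarial patch away from its own clean features and toward the
    # features of other patches (e.g. `person' patches toward `bike'
    # patches), confusing the victim on every local patch.
    targets = torch.arange(H * W, device=adv.device).expand(B, -1)  # (B, HW)
    nce = F.cross_entropy(logits.reshape(B * H * W, H * W), targets.reshape(-1))
    return -nce

def generator_step(G, victim_backbone, x, optimizer, eps=10 / 255):
    """One assumed optimization step for the perturbation generator G."""
    delta = torch.clamp(G(x), -eps, eps)       # L-infinity-bounded perturbation
    x_adv = torch.clamp(x + delta, 0.0, 1.0)   # keep pixels in valid range
    loss = lpd_contrastive_loss(victim_backbone(x_adv),
                                victim_backbone(x).detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here victim_backbone stands for any intermediate feature extractor of the victim CNN; the actual LPD-Attack loss, positive/negative pairing, and training schedule are specified in the paper.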