Regional adversarial attacks often rely on complicated methods for generating
adversarial perturbations, making it hard to compare their efficacy against
well-known attacks. In this study, we show that effective regional
perturbations can be generated without resorting to complex methods. We develop
a very simple regional adversarial perturbation attack using the sign of the
gradient of the cross-entropy loss, one of the most commonly used losses in adversarial machine
learning. Our experiments on ImageNet with multiple models reveal that, on
average, 76% of the generated adversarial examples maintain model-to-model
transferability when the perturbation is applied to local image regions.
Depending on the selected region, these localized adversarial examples require
significantly less $L_p$ norm distortion (for $p \in \{0, 2, \infty\}$)
compared to their non-local counterparts. These localized attacks therefore
have the potential to undermine defenses that claim robustness under the
aforementioned norms.

Comment: Accepted for the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning (UDL).
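
To make the idea concrete, below is a minimal sketch of a one-step attack that applies the sign of the cross-entropy gradient only inside a chosen image region. This is an illustrative PyTorch implementation under assumed details: the function name regional_fgsm, the rectangular region parameterization, and the step size eps are placeholders, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def regional_fgsm(model, x, y, region, eps=8 / 255):
    """One-step sign-of-gradient attack restricted to a rectangular region.

    A minimal sketch; the paper's exact procedure may differ.
    x: input batch in [0, 1], shape (N, C, H, W)
    region: (top, left, height, width) in pixel coordinates (assumed here)
    """
    # Binary mask selecting the patch to perturb
    mask = torch.zeros_like(x)
    t, l, h, w = region
    mask[..., t:t + h, l:l + w] = 1.0

    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # standard cross-entropy loss
    loss.backward()

    # FGSM-style step, masked so all distortion stays inside the region
    x_adv = x + eps * x.grad.sign() * mask
    return x_adv.clamp(0, 1).detach()
```

Because the mask zeroes the update outside the patch, the $L_0$ distortion is bounded by the patch area, consistent with the abstract's observation that well-chosen regions yield lower $L_p$ distortion than non-local perturbations.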