Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball
We present a simple regularization of adversarial perturbations based upon
the perceptual loss. While the resulting perturbations remain imperceptible to
the human eye, they differ from existing adversarial perturbations in that they
are semi-sparse alterations that highlight objects and regions of interest
while leaving the background unaltered. As semantically meaningful adverse
perturbations, they form a bridge between counterfactual explanations and
adversarial perturbations in the space of images. We evaluate our approach on
several standard explainability benchmarks, namely weak localization,
insertion-deletion, and the pointing game, demonstrating that perceptually
regularized counterfactuals are an effective explanation for image-based
classifiers.
Comment: CVPR 202