The Adversarial Attack and Detection under the Fisher Information Metric
Many deep learning models are vulnerable to adversarial attacks, i.e.,
imperceptible but intentionally designed perturbations to the input can cause
the networks to produce incorrect outputs. In this paper, using information
geometry, we
provide a reasonable explanation for the vulnerability of deep learning models.
By considering the data space as a non-linear space with the Fisher information
metric induced from a neural network, we first propose an adversarial attack
algorithm termed one-step spectral attack (OSSA). The method is described by a
constrained quadratic form of the Fisher information matrix, where the optimal
adversarial perturbation is given by the first (leading) eigenvector, and the
model vulnerability is reflected by the eigenvalues. The larger an eigenvalue
is, the more vulnerable the model is to perturbations along the corresponding
eigenvector.
Taking advantage of this property, we also propose an adversarial detection
method that uses the eigenvalues as features. Both our attack and
detection algorithms are numerically optimized to work efficiently on large
datasets. Our evaluations show superior performance compared with other
methods, implying that the Fisher information is a promising tool for
investigating adversarial attacks and defenses.
Comment: Accepted as an AAAI-2019 oral paper
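The attack and detection ideas above can be illustrated on a toy model. The sketch below assumes a linear softmax classifier (logits = W x + b), for which the Fisher information of p(y|x) with respect to the input has the closed form G = W^T (diag(p) - p p^T) W. The function names, the sign heuristic, the dense eigendecomposition and the `spectral_features` detector input are all our own hypothetical choices, not the paper's implementation; the abstract says the authors' algorithms are numerically optimized, which this sketch does not attempt to reproduce.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def input_fisher(W, b, x):
    """Fisher information of p(y|x) with respect to the input x.

    Toy assumption: a linear softmax classifier with logits = W @ x + b.
    """
    p = softmax(W @ x + b)
    # Fisher information of a categorical distribution in logit coordinates
    F_logits = np.diag(p) - np.outer(p, p)
    # Pull back to input space through the Jacobian d(logits)/dx = W
    return W.T @ F_logits @ W

def one_step_spectral_attack(W, b, x, eps):
    """Maximise eta^T G eta subject to ||eta|| = eps: eta = eps * top eigenvector."""
    G = input_fisher(W, b, x)
    eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]
    # The quadratic form does not fix the sign of v. Heuristic (assumption):
    # pick the sign that does not increase the predicted class's
    # log-probability to first order.
    p = softmax(W @ x + b)
    c = int(np.argmax(p))
    grad_log_pc = W.T @ (np.eye(len(p))[c] - p)
    if grad_log_pc @ v > 0:
        v = -v
    return x + eps * v, lam

def spectral_features(W, b, x, k=3):
    """Top-k eigenvalues of G, in descending order, as hypothetical detector features."""
    eigvals = np.linalg.eigvalsh(input_fisher(W, b, x))
    return eigvals[::-1][:k]

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 20))  # 5 classes, 20 input dimensions
b = rng.standard_normal(5)
x = rng.standard_normal(20)
x_adv, lam = one_step_spectral_attack(W, b, x, eps=0.1)
print("top eigenvalue:", lam)
print("perturbation norm:", np.linalg.norm(x_adv - x))
print("detector features:", spectral_features(W, b, x))
```

The dense `eigh` call is only practical for tiny inputs. For a real network, G would not be formed explicitly; one plausible route (our suggestion, not necessarily the paper's) is power iteration driven by Jacobian-vector products, which only needs matrix-vector products with G.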
Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball
We present a simple regularization of adversarial perturbations based upon
the perceptual loss. While the resulting perturbations remain imperceptible to
the human eye, they differ from existing adversarial perturbations in that they
are semi-sparse alterations that highlight objects and regions of interest
while leaving the background unaltered. As semantically meaningful adversarial
perturbations, they form a bridge between counterfactual explanations and
adversarial perturbations in the space of images. We evaluate our approach on
several standard explainability benchmarks, namely weak localization,
insertion/deletion, and the pointing game, demonstrating that perceptually
regularized counterfactuals are an effective explanation for image-based
classifiers.
Comment: CVPR 202
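A minimal sketch of a perceptually regularized counterfactual perturbation, under heavy simplifying assumptions: a linear softmax classifier stands in for the image network, and the "perceptual" features are hypothetical linear layers A_l rather than a pretrained network's activations. The objective, cross-entropy toward a chosen target class plus lam * sum_l ||A_l (x + delta) - A_l x||^2, is our reading of "regularization based upon the perceptual loss", not the paper's exact formulation, and a toy this small cannot show the semi-sparse, object-highlighting behaviour the abstract describes.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def perceptual_perturbation(W, layers, x, target, lam=0.1, lr=0.01, steps=500):
    """Gradient descent on CE(target) + lam * sum_l ||A_l delta||^2 (toy, linear features)."""
    delta = np.zeros_like(x)
    # Each hypothetical feature layer A_l is linear, so the perceptual loss
    # sum_l ||A_l (x + delta) - A_l x||^2 reduces to sum_l ||A_l delta||^2,
    # whose gradient is 2 * M @ delta with M = sum_l A_l^T A_l.
    M = sum(A.T @ A for A in layers)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(steps):
        p = softmax(W @ (x + delta))
        # Gradient of -log p[target] plus gradient of the perceptual penalty
        grad = W.T @ (p - onehot) + 2 * lam * M @ delta
        delta -= lr * grad
    return delta

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 10))  # 3 classes, 10 input dimensions
layers = [rng.standard_normal((8, 10)), rng.standard_normal((6, 10))]
x = rng.standard_normal(10)
target = int(np.argmin(softmax(W @ x)))  # least likely class as the counterfactual target
delta = perceptual_perturbation(W, layers, x, target)
print("target prob before:", softmax(W @ x)[target])
print("target prob after: ", softmax(W @ (x + delta))[target])
```

Raising `lam` trades progress toward the target class for a perturbation that stays closer to the input in feature space, which is the role the perceptual term plays in the abstract.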