Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations
Interest in complex deep neural networks for computer vision applications
is increasing. This raises the need to improve the interpretability of
these models. Recent explanation methods present visualizations
of the relevance of pixels from input images, thus enabling the direct
interpretation of properties of the input that lead to a specific output. These
methods produce maps of pixel importance, which are commonly evaluated by
visual inspection. This means that the effectiveness of an explanation method
is assessed based on human expectation instead of actual feature importance.
Thus, in this work we propose an objective measure to evaluate the reliability
of explanations of deep models. Specifically, our approach is based on changes
in the network's outcome resulting from the perturbation of input images in an
adversarial way. We present a comparison of widely known explanation
methods using our proposed approach. Finally, we also propose a straightforward
application of our approach for cleaning relevance maps, producing more
interpretable maps without any loss of essential explanation (as per our
proposed measure).

Comment: Accepted for publication at IJCNN 2020
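
To make the idea concrete, below is a minimal sketch (not the authors' exact procedure) of how an adversarial-perturbation-based reliability check for a relevance map might look in PyTorch. The helper names (saliency_map, output_drop_under_perturbation), the use of plain gradient saliency as a stand-in explanation method, and the FGSM-style perturbation restricted to the top-ranked pixels are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, assuming images normalized to [0, 1] and a classifier
# returning logits of shape (1, num_classes). The premise: if an explanation
# is reliable, adversarially perturbing only the pixels it marks as most
# relevant should cause a large drop in the network's output for that class.
import torch
import torch.nn.functional as F

def saliency_map(model, image, target):
    """Plain gradient saliency, used here as a stand-in for any explanation
    method that produces a map of pixel importance."""
    image = image.detach().clone().requires_grad_(True)
    score = model(image)[0, target]
    score.backward()
    # Reduce over the channel dimension to get a (1, H, W) relevance map.
    return image.grad.abs().max(dim=1)[0]

def output_drop_under_perturbation(model, image, relevance, target,
                                   top_frac=0.1, eps=0.05):
    """Perturb the top-`top_frac` most relevant pixels in the adversarial
    (loss-increasing) gradient direction; return the drop in the target logit."""
    image = image.detach().clone().requires_grad_(True)
    logits = model(image)
    loss = F.cross_entropy(logits, torch.tensor([target]))
    loss.backward()
    adv_dir = image.grad.sign()  # FGSM-style perturbation direction

    # Binary mask selecting the pixels the explanation ranks as most relevant.
    flat = relevance.flatten()
    k = max(1, int(top_frac * flat.numel()))
    thresh = flat.topk(k).values.min()
    mask = (relevance >= thresh).float().unsqueeze(1)  # broadcast over channels

    perturbed = (image + eps * adv_dir * mask).clamp(0, 1)
    with torch.no_grad():
        drop = logits[0, target] - model(perturbed)[0, target]
    return drop.item()  # larger drop => the marked pixels really matter
```

Averaged over a dataset, and contrasted with the drop obtained when an equal number of randomly chosen pixels is perturbed instead, such a score gives an objective (if simplified) proxy for the kind of reliability measure the abstract describes.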