On Saliency Maps and Adversarial Robustness
A very recent trend has emerged that couples the notions of interpretability and
adversarial robustness, unlike earlier efforts which focused solely on good
interpretations or on robustness against adversaries. Works have shown that
adversarially trained models exhibit more interpretable saliency maps than
their non-robust counterparts, and that this behavior can be quantified by
considering the alignment between input image and saliency map. In this work,
we provide a different perspective on this coupling and propose a method,
Saliency based Adversarial training (SAT), which uses saliency maps to improve
the adversarial robustness of a model. In particular, we show that using
annotations such as bounding boxes and segmentation masks, already provided
with a dataset, as weak saliency maps, suffices to improve adversarial
robustness with no additional effort to generate the perturbations themselves.
Our empirical results on CIFAR-10, CIFAR-100, Tiny ImageNet and Flower-17
datasets consistently corroborate our claim by showing improved adversarial
robustness using our method. We also show how using finer and stronger
saliency maps leads to more robust models, and how integrating SAT with
existing adversarial training methods further boosts the performance of these
methods.
Comment: Accepted at ECML-PKDD 2020, Acknowledgements added
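For readers who want to see what the alignment mentioned above looks like in practice, the following is a minimal, illustrative sketch (not the authors' code): it measures the normalized inner product between an input image and its gradient-based saliency map for a PyTorch classifier. The name `model`, the use of the true-class logit gradient as the saliency map, and the cosine-style normalization are assumptions made here for illustration.

import torch

def saliency_alignment(model, x, y):
    # x: (B, C, H, W) input batch, y: (B,) integer class labels.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Gradient of the true-class logit w.r.t. the input serves as a saliency map.
    score = logits.gather(1, y.unsqueeze(1)).sum()
    (g,) = torch.autograd.grad(score, x)
    x_flat, g_flat = x.flatten(1), g.flatten(1)
    # Normalized inner product between image and saliency map; higher values
    # indicate a saliency map that is better aligned with the image content.
    return (x_flat * g_flat).sum(dim=1) / (
        x_flat.norm(dim=1) * g_flat.norm(dim=1) + 1e-12)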
Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks
With the rise of deep neural networks, the challenge of explaining the
predictions of these networks has become increasingly recognized. While many
methods for explaining the decisions of deep neural networks exist, there is
currently no consensus on how to evaluate them. On the other hand, robustness
is a popular topic in deep learning research; however, until very recently it
has hardly been discussed in the context of explainability. In this tutorial
paper, we start by presenting gradient-based interpretability methods. These
techniques use gradient signals to assign the burden of the decision to the
input features.
Later, we discuss how gradient-based methods can be evaluated for their
robustness and the role that adversarial robustness plays in having meaningful
explanations. We also discuss the limitations of gradient-based methods.
Finally, we present the best practices and attributes that should be examined
before choosing an explainability method. We conclude with future directions
for research in the area at the convergence of robustness and explainability.
Comment: 23 pages, 4 figures
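To make the family of methods described above concrete, here is a minimal sketch of two gradient-based attributions commonly covered in such tutorials, vanilla gradients and gradient-times-input, for an assumed PyTorch classifier `model`; it is an illustration of the general technique, not code from the paper.

import torch

def gradient_attributions(model, x, target_class):
    # x: a single input of shape (1, C, H, W); target_class: integer class index.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Back-propagate the target-class logit down to the input pixels.
    logits[0, target_class].backward()
    saliency = x.grad.abs()           # vanilla gradient saliency
    grad_times_input = x.grad * x     # gradient * input attribution
    return saliency.detach(), grad_times_input.detach()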
On the Robustness of Explanations of Deep Neural Network Models: A Survey
Explainability has been widely stated as a cornerstone of the responsible and
trustworthy use of machine learning models. With the ubiquitous use of Deep
Neural Network (DNN) models expanding to risk-sensitive and safety-critical
domains, many methods have been proposed to explain the decisions of these
models. Recent years have also seen concerted efforts that have shown how such
explanations can be distorted (attacked) by minor input perturbations. While
there have been many surveys that review explainability methods themselves,
there has been no effort hitherto to assimilate the different methods and
metrics proposed to study the robustness of explanations of DNN models. In this
work, we present a comprehensive survey of methods that study, understand,
attack, and defend explanations of DNN models. We also present a detailed
review of different metrics used to evaluate explanation methods, as well as
describe attributional attack and defense methods. We conclude with lessons and
take-aways for the community towards ensuring robust explanations of DNN model
predictions.
Comment: Under review at ACM Computing Surveys, "Special Issue on Trustworthy AI"
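As an illustration of the kind of robustness metric such surveys discuss, the sketch below compares attributions computed before and after a small input perturbation using the top-k intersection of the most important pixels; the attribution function, the random perturbation, and the choice of k are placeholders here, not a specific protocol from the survey.

import torch

def topk_intersection(attr_clean, attr_perturbed, k=1000):
    # Fraction of the k most important pixels shared by the two attribution maps.
    idx_clean = attr_clean.flatten().abs().topk(k).indices
    idx_pert = attr_perturbed.flatten().abs().topk(k).indices
    return len(set(idx_clean.tolist()) & set(idx_pert.tolist())) / k

def explanation_robustness(attribute, model, x, y, eps=1e-2, k=1000):
    # Compare attributions of a clean input and a slightly perturbed copy;
    # values near 1 indicate an explanation that is stable under perturbation.
    x_pert = (x + eps * torch.randn_like(x)).clamp(0.0, 1.0)
    return topk_intersection(attribute(model, x, y), attribute(model, x_pert, y), k=k)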