On the Robustness of Explanations of Deep Neural Network Models: A Survey
Explainability is widely regarded as a cornerstone of the responsible and
trustworthy use of machine learning models. With the ubiquitous use of Deep
Neural Network (DNN) models expanding to risk-sensitive and safety-critical
domains, many methods have been proposed to explain the decisions of these
models. Recent years have also seen concerted efforts demonstrating that such
explanations can be distorted (attacked) by minor input perturbations. While
there have been many surveys that review explainability methods themselves,
there has been no effort hitherto to assimilate the different methods and
metrics proposed to study the robustness of explanations of DNN models. In this
work, we present a comprehensive survey of methods that study, understand,
attack, and defend explanations of DNN models. We also present a detailed
review of the different metrics used to evaluate explanation methods, and describe
attributional attack and defense methods. We conclude with lessons and takeaways
for the community towards ensuring robust explanations of DNN model predictions.

Comment: Under review at ACM Computing Surveys, "Special Issue on Trustworthy AI"