381 research outputs found
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Current learning machines have successfully solved hard application problems,
reaching high accuracy and displaying seemingly "intelligent" behavior. Here we
apply recent techniques for explaining decisions of state-of-the-art learning
machines and analyze various tasks from computer vision and arcade games. This
showcases a spectrum of problem-solving behaviors ranging from naive and
short-sighted, to well-informed and strategic. We observe that standard
performance evaluation metrics can be oblivious to distinguishing these diverse
problem solving behaviors. Furthermore, we propose our semi-automated Spectral
Relevance Analysis that provides a practically effective way of characterizing
and validating the behavior of nonlinear learning machines. This helps to
assess whether a learned model indeed delivers reliably for the problem that it
was conceived for. Furthermore, our work intends to add a voice of caution to
the ongoing excitement about machine intelligence and pledges to evaluate and
judge some of these recent successes in a more nuanced manner.Comment: Accepted for publication in Nature Communication
Towards better understanding of gradient-based attribution methods for Deep Neural Networks
Understanding the flow of information in Deep Neural Networks (DNNs) is a
challenging problem that has gain increasing attention over the last few years.
While several methods have been proposed to explain network predictions, there
have been only a few attempts to compare them from a theoretical perspective.
What is more, no exhaustive empirical comparison has been performed in the
past. In this work, we analyze four gradient-based attribution methods and
formally prove conditions of equivalence and approximation between them. By
reformulating two of these methods, we construct a unified framework which
enables a direct comparison, as well as an easier implementation. Finally, we
propose a novel evaluation metric, called Sensitivity-n and test the
gradient-based attribution methods alongside with a simple perturbation-based
attribution method on several datasets in the domains of image and text
classification, using various network architectures.Comment: ICLR 201
- …