Saliency maps have become one of the most widely used interpretability
techniques for convolutional neural networks (CNNs) due to their simplicity and
the quality of the insights they provide. However, doubts remain about whether
these insights faithfully represent what CNNs actually use
to come up with their predictions. This paper explores how rescuing the sign of
the gradients from the saliency map can lead to a deeper understanding of
multi-class classification problems. Using both pretrained CNNs and CNNs
trained from scratch, we show that considering not only the sign and effect of
the correct class but also the influence of the other classes makes it possible
to better identify the pixels of the image that the network is really focusing
on. Furthermore, it also becomes clearer how occluding or altering those pixels
is expected to affect the outcome.
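As a minimal illustration of the idea, the sketch below contrasts classic
magnitude-only saliency with a signed variant that also accounts for the other
classes. It uses a hypothetical linear classifier (scores `s = W @ x`) as a
stand-in for a CNN, so the gradients are just the rows of `W`; in a real CNN
they would come from backpropagation. All names and shapes here are
illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Hypothetical stand-in for a CNN: a linear model with scores s = W @ x.
# For this model, the gradient of the class-c score w.r.t. the input x
# is simply W[c]; a real CNN would obtain it via backpropagation.
rng = np.random.default_rng(0)
n_classes, n_pixels = 3, 8          # toy sizes, chosen for illustration
W = rng.normal(size=(n_classes, n_pixels))
x = rng.normal(size=n_pixels)       # a flattened "image"

scores = W @ x
c = int(np.argmax(scores))          # predicted class

grad_c = W[c]                       # gradient of the class-c score w.r.t. x
saliency_abs = np.abs(grad_c)       # classic saliency map: magnitude only
saliency_signed = grad_c            # signed saliency: + pushes class c up,
                                    # - pushes it down

# Account for the other classes as well: gradient of the class-c score
# minus the summed gradients of every competing class.
other = np.arange(n_classes) != c
grad_vs_others = W[c] - W[other].sum(axis=0)
```

The sign tells us whether increasing a pixel's value raises or lowers the
score of the class under study, which is what makes the expected effect of
occluding or altering that pixel interpretable.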