Explaining Deep Learning Hidden Neuron Activations using Concept Induction
One of the current key challenges in Explainable AI is correctly interpreting
the activations of hidden neurons. It seems evident that accurate
interpretations would provide insight into what a deep learning system has
internally \emph{detected} as relevant in the input, thus lifting some of the
black-box character of deep learning systems.
The state of the art on this front indicates that hidden neuron activations
can, at least in some cases, be interpreted in ways that make sense to humans.
Yet systematic automated methods that first hypothesize an interpretation of
hidden neuron activations and then verify it are mostly missing.
In this paper, we provide such a method and demonstrate that it yields
meaningful interpretations. It is based on large-scale background knowledge,
namely a class hierarchy of approximately 2 million classes curated from the
Wikipedia Concept Hierarchy, combined with a symbolic reasoning approach called
\emph{concept induction}. Concept induction is based on description logics and
was originally developed for applications in the Semantic Web field.
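The core idea of concept induction can be illustrated with a drastically
simplified sketch. Given positive examples (e.g. images that strongly activate a
neuron) and negative examples (images that do not), it looks for a class from
the background hierarchy that covers the positives and excludes the negatives.
The toy hierarchy and class names below are hypothetical. Each image is
assumed to be annotated with a set of classes, and only single named classes
are searched. Real description-logic concept induction also builds complex
class expressions, such as conjunctions and existential restrictions, which this
code does not do.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child -> parent), standing in for the much
# larger Wikipedia-derived class hierarchy.
PARENT: Dict[str, str] = {
    "Bed": "Furniture",
    "Sofa": "Furniture",
    "Furniture": "Object",
    "Tree": "Plant",
    "Plant": "Object",
}


def ancestors(cls: str) -> Set[str]:
    """Return the class itself plus all of its superclasses."""
    result = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        result.add(cls)
    return result


def types_of(annotations: Set[str]) -> Set[str]:
    """All classes an image belongs to, closed upward under the hierarchy."""
    out: Set[str] = set()
    for a in annotations:
        out |= ancestors(a)
    return out


def induce_concept(positives: List[Set[str]], negatives: List[Set[str]]) -> str:
    """Pick the single named class that best separates positives from
    negatives by accuracy (a simplification of concept induction)."""
    pos_types = [types_of(p) for p in positives]
    neg_types = [types_of(n) for n in negatives]
    candidates: Set[str] = set().union(*pos_types)
    total = len(positives) + len(negatives)

    def accuracy(c: str) -> float:
        tp = sum(c in t for t in pos_types)      # positives covered
        tn = sum(c not in t for t in neg_types)  # negatives excluded
        return (tp + tn) / total

    # Sorting makes tie-breaking deterministic (alphabetical among ties).
    return max(sorted(candidates), key=accuracy)


positives = [{"Bed"}, {"Sofa"}, {"Bed", "Tree"}]
negatives = [{"Tree"}, {"Plant"}]
print(induce_concept(positives, negatives))  # Furniture
```

In this toy case, "Bed" misses the sofa image and "Object" also covers the
negatives. Only the more general "Furniture" covers every positive and no
negative, so it is selected.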
Our results show that, through a hypothesis and verification process, we can
automatically attach meaningful labels from the background knowledge to
individual neurons in the dense layer of a convolutional neural network.
Comment: Submitted to IJCAI-2
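The verification step of such a process can also be sketched under stated
assumptions. A hypothesized label is checked by collecting new images for that
label and for other labels, recording the neuron's activation on each image,
and comparing how often the neuron fires in each group. The firing `threshold`
and the `min_target`/`max_other` cut-offs below are hypothetical illustration
values, not parameters taken from the paper.

```python
from typing import List, Tuple


def activation_rates(target_acts: List[float], other_acts: List[float],
                     threshold: float = 0.0) -> Tuple[float, float]:
    """Fraction of target-label images and of other images on which the
    neuron's activation exceeds the (hypothetical) threshold."""
    target_rate = sum(a > threshold for a in target_acts) / len(target_acts)
    other_rate = sum(a > threshold for a in other_acts) / len(other_acts)
    return target_rate, other_rate


def label_confirmed(target_acts: List[float], other_acts: List[float],
                    threshold: float = 0.0, min_target: float = 0.8,
                    max_other: float = 0.2) -> bool:
    """Accept the label if the neuron fires on most target images and
    rarely on others (hypothetical decision rule)."""
    t, o = activation_rates(target_acts, other_acts, threshold)
    return t >= min_target and o <= max_other


# Made-up activations, for illustration only.
target = [0.9, 1.2, 0.0, 2.1, 0.7]
other = [0.0, 0.0, 0.3, 0.0, 0.0]
print(activation_rates(target, other))  # (0.8, 0.2)
print(label_confirmed(target, other))   # True
```

A simple rate comparison keeps the idea visible. A more careful check would
compare the two activation distributions with a statistical test rather than
fixed cut-offs.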