Explaining Neural Networks by Decoding Layer Activations
We present a `CLAssifier-DECoder' architecture (\emph{ClaDec}) which
facilitates the comprehension of the output of an arbitrary layer in a neural
network (NN). It uses a decoder to transform the non-interpretable
representation of the given layer to a representation that is more similar to
the domain a human is familiar with. In an image recognition problem, one can
recognize what information is represented by a layer by contrasting
reconstructed images of \emph{ClaDec} with those of a conventional
auto-encoder (AE) serving as reference. We also extend \emph{ClaDec} to allow
a trade-off between human interpretability and fidelity. We evaluate our
approach for image classification using Convolutional NNs. We show that
reconstructed visualizations using encodings from a classifier capture more
relevant information for classification than conventional AEs. Relevant code is
available at \url{https://github.com/JohnTailor/ClaDec}.
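
The core idea in this abstract, decoding a frozen layer's activations back into the input domain, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the ClaDec repository: the made-up frozen weights `FROZEN_W`, the linear decoder, the toy data, and the helper names. The reference auto-encoder used for comparison in the abstract is not shown.

```python
import random

# Hypothetical frozen "classifier layer": fixed weights followed by ReLU.
# In practice this would be a layer of a trained classifier; values are made up.
FROZEN_W = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

def layer_activations(x):
    """Activations of the frozen layer for one input vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def decode(D, a):
    """Linear decoder (an assumption): maps activations back to input space."""
    return [sum(d * ai for d, ai in zip(row, a)) for row in D]

def mean_reconstruction_loss(D, inputs):
    """Mean of 0.5 * squared reconstruction error over the data set."""
    total = 0.0
    for x in inputs:
        x_hat = decode(D, layer_activations(x))
        total += 0.5 * sum((h - xi) ** 2 for h, xi in zip(x_hat, x))
    return total / len(inputs)

def train_decoder(inputs, epochs=200, lr=0.1):
    """Full-batch gradient descent on the decoder only; the layer stays frozen."""
    n_in, n_act = len(inputs[0]), len(FROZEN_W)
    D = [[0.0] * n_act for _ in range(n_in)]
    for _ in range(epochs):
        grad = [[0.0] * n_act for _ in range(n_in)]
        for x in inputs:
            a = layer_activations(x)
            err = [h - xi for h, xi in zip(decode(D, a), x)]
            for i in range(n_in):
                for j in range(n_act):
                    grad[i][j] += err[i] * a[j] / len(inputs)
        for i in range(n_in):
            for j in range(n_act):
                D[i][j] -= lr * grad[i][j]
    return D

rng = random.Random(0)
inputs = [[rng.random() for _ in range(4)] for _ in range(20)]
initial_loss = mean_reconstruction_loss([[0.0] * 3 for _ in range(4)], inputs)
decoder = train_decoder(inputs)
final_loss = mean_reconstruction_loss(decoder, inputs)
```

Because only the decoder is trained, whatever the reconstructions fail to recover is information the frozen layer did not retain, which is what makes the decoded output usable as an explanation.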
Methods for Interpreting and Understanding Deep Neural Networks
This paper provides an entry point to the problem of interpreting a deep
neural network model and explaining its predictions. It is based on a tutorial
given at ICASSP 2017. It introduces some recently proposed techniques of
interpretation, along with theory, tricks and recommendations, to make most
efficient use of these techniques on real data. It also discusses a number of
practical applications.
Comment: 14 pages, 10 figures
Discriminative Recurrent Sparse Auto-Encoders
We present the discriminative recurrent sparse auto-encoder model, comprising
a recurrent encoder of rectified linear units, unrolled for a fixed number of
iterations, and connected to two linear decoders that reconstruct the input and
predict its supervised classification. Training via
backpropagation-through-time initially minimizes an unsupervised sparse
reconstruction error; the loss function is then augmented with a discriminative
term on the supervised classification. The depth implicit in the
temporally-unrolled form allows the system to exhibit all the power of deep
networks, while substantially reducing the number of trainable parameters.
From an initially unstructured network the hidden units differentiate into
categorical-units, each of which represents an input prototype with a
well-defined class; and part-units representing deformations of these
prototypes. The learned organization of the recurrent encoder is hierarchical:
part-units are driven directly by the input, whereas the activity of
categorical-units builds up over time through interactions with the part-units.
Even using a small number of hidden units per layer, discriminative recurrent
sparse auto-encoders achieve excellent performance on MNIST.
Comment: Added clarifications suggested by reviewers. 15 pages, 10 figures
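
The architecture described in this abstract can be sketched roughly as follows. The abstract does not give equations, so the update rule z <- ReLU(E x + S z - b), the parameter names `E`, `S`, `b`, `D`, `C`, the L1 sparsity penalty, and all sizes are assumptions for illustration. Weights are random and no training is performed.

```python
import random

def matvec(M, v):
    """Matrix-vector product on plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def unrolled_encoder(x, E, S, b, steps):
    """ReLU recurrent encoder unrolled for a fixed number of iterations.

    The same E, S and b are reused at every step, so unrolling deeper
    adds no trainable parameters.
    """
    drive = matvec(E, x)  # input drive, computed once
    z = [0.0] * len(b)
    for _ in range(steps):
        rec = matvec(S, z)
        z = [max(0.0, d + r - bi) for d, r, bi in zip(drive, rec, b)]
    return z

def forward(x, params, steps=3):
    """Encode, then apply the two linear decoders."""
    z = unrolled_encoder(x, params["E"], params["S"], params["b"], steps)
    recon = matvec(params["D"], z)   # reconstructs the input
    logits = matvec(params["C"], z)  # class scores
    return z, recon, logits

def unsupervised_loss(x, z, recon, sparsity=0.1):
    """Reconstruction error plus an L1 sparsity term (assumed form)."""
    rec_err = 0.5 * sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return rec_err + sparsity * sum(abs(zi) for zi in z)

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_in, n_hidden, n_classes = 8, 5, 3
params = {
    "E": random_matrix(n_hidden, n_in, rng),
    "S": random_matrix(n_hidden, n_hidden, rng),
    "b": [0.0] * n_hidden,
    "D": random_matrix(n_in, n_hidden, rng),
    "C": random_matrix(n_classes, n_hidden, rng),
}
x = [rng.random() for _ in range(n_in)]
z, recon, logits = forward(x, params)
loss = unsupervised_loss(x, z, recon)
```

In the two-stage training the abstract describes, a loss of this kind would be minimized first, and a classification term computed from `logits` would then be added to it.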
Interpretable Convolutional Neural Networks
This paper proposes a method to modify traditional convolutional neural
networks (CNNs) into interpretable CNNs, in order to clarify knowledge
representations in high conv-layers of CNNs. In an interpretable CNN, each
filter in a high conv-layer represents a certain object part. We do not need
any annotations of object parts or textures to supervise the learning process.
Instead, the interpretable CNN automatically assigns an object part to each
filter in a high conv-layer during the learning process. Our method can be
applied to different types of CNNs with different structures. The clear
knowledge representation in an interpretable CNN can help people understand the
logic inside a CNN, i.e., the patterns on which the CNN bases its decision.
Experiments showed that filters in an interpretable CNN were more semantically
meaningful than those in traditional CNNs.
Comment: In this version, we release the website of the code. Compared to the
previous version, we have corrected all values of location instability in
Tables 3--6 by dividing the values by sqrt(2), i.e., a=a/sqrt(2). Such
revisions do NOT decrease the significance of the superior performance of our
method, because we make the same correction to location-instability values of
all baselines.