102 research outputs found
Neurons Activation Visualization and Information Theoretic Analysis
Understanding the inner working mechanism of deep neural networks (DNNs) is
essential for researchers who want to design DNNs and improve their performance.
In this work, entropy analysis is leveraged to study the neuron
activation behavior of the fully connected layers of DNNs. The entropy of the
activation patterns of each layer can provide a performance metric for the
evaluation of the network model accuracy. The study is conducted on a
well-trained network model. The activation patterns of shallow and deep
fully connected layers are analyzed by feeding in images of a single
class. It is found that, for a well-trained deep neural network model, the
entropy of the neuron activation pattern decreases monotonically with the
depth of the layers. That is, the neuron activation patterns become more and
more stable with the depth of the fully connected layers. The entropy pattern
of the fully connected layers can also provide guidelines as to how many fully
connected layers are needed to guarantee the accuracy of the model. The study
in this work provides a new perspective on the analysis of DNNs and shows
some interesting results.
Comment: the paper is not so well written and needs to be revised
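The entropy measurement described in this abstract can be illustrated with a minimal sketch. The abstract does not say how an activation pattern is encoded, so the binarization rule (a neuron is "active" when its value is above zero), the function name `pattern_entropy`, and the toy activations below are all assumptions made for illustration, not the authors' procedure.

```python
import math
from collections import Counter

def pattern_entropy(activations):
    """Shannon entropy (bits) of binarized activation patterns of one layer.

    `activations` holds one activation vector per input image.
    Assumption: a neuron counts as active when its value is > 0.
    """
    patterns = [tuple(int(a > 0) for a in vec) for vec in activations]
    counts = Counter(patterns)
    n = len(patterns)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up activations for four images of a single class.
shallow = [[0.3, 0.0, 1.2], [0.0, 0.7, 0.1], [0.5, 0.2, 0.0], [0.0, 0.0, 0.9]]
deep = [[0.9, 0.0, 0.4], [1.1, 0.0, 0.2], [0.8, 0.0, 0.6], [0.7, 0.0, 0.0]]

h_shallow = pattern_entropy(shallow)  # four distinct patterns
h_deep = pattern_entropy(deep)        # mostly one repeated pattern
```

In this toy data the deeper layer repeats the same pattern for most inputs, so its entropy is lower, which mirrors the trend the abstract reports.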
Ablation of a Robot's Brain: Neural Networks Under a Knife
It is still not fully understood exactly how neural networks are able to
solve the complex tasks that have recently pushed AI research forward. We
present a novel method for determining how information is structured inside a
neural network. Using ablation (a neuroscience technique for cutting away parts
of a brain to determine their function), we approach several neural network
architectures from a biological perspective. Through an analysis of this
method's results, we examine important similarities between biological and
artificial neural networks to search for the implicit knowledge locked away in
the network's weights
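As a rough illustration of ablation applied to an artificial network, the sketch below zeroes one hidden unit at a time and records how much the output changes. The tiny two-layer ReLU network, its weights, and the helper names `forward` and `ablation_impact` are hypothetical; the abstract does not specify the architectures or the impact measure used.

```python
def forward(x, w_hidden, w_out, ablated=()):
    """Tiny ReLU network; hidden units listed in `ablated` are forced to zero."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    for j in ablated:
        hidden[j] = 0.0
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def ablation_impact(x, w_hidden, w_out):
    """Absolute output change caused by removing each hidden unit in turn."""
    base = forward(x, w_hidden, w_out)
    impacts = []
    for j in range(len(w_hidden)):
        cut = forward(x, w_hidden, w_out, ablated=(j,))
        impacts.append(sum(abs(b - c) for b, c in zip(base, cut)))
    return impacts

# Hypothetical weights: the single output reads only hidden unit 0.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [[2.0, 0.0]]
impacts = ablation_impact([1.0, 3.0], W_HIDDEN, W_OUT)
```

Here removing unit 0 changes the output while removing unit 1 does not, which is the kind of functional attribution an ablation study looks for.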
Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values
Explaining the output of a complicated machine learning model like a deep
neural network (DNN) is a central challenge in machine learning. Several
proposed local explanation methods address this issue by identifying what
dimensions of a single input are most responsible for a DNN's output. The goal
of this work is to assess the sensitivity of local explanations to DNN
parameter values. Somewhat surprisingly, we find that DNNs with
randomly-initialized weights produce explanations that are both visually and
quantitatively similar to those produced by DNNs with learned weights. Our
conjecture is that this phenomenon occurs because these explanations are
dominated by the lower level features of a DNN, and that a DNN's architecture
provides a strong prior which significantly affects the representations learned
at these lower layers. NOTE: This work is now subsumed by our recent
manuscript, Sanity Checks for Saliency Maps (to appear NIPS 2018), where we
expand on findings and address concerns raised in Sundararajan et al. (2018).
Comment: Workshop Track International Conference on Learning Representations
(ICLR
ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy
We propose a new metric space of ReLU activation codes equipped with a
truncated Hamming distance which establishes an isometry between its elements
and polyhedral bodies in the input space which have recently been shown to be
strongly related to safety, robustness, and confidence. This isometry allows
the efficient computation of adjacency relations between the polyhedral bodies.
Experiments on MNIST and CIFAR-10 indicate that information besides accuracy
might be stored in the code space.
Comment: in ICLR 2020 Workshop on Neural Architecture Search (NAS 2020
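A small sketch may help make "ReLU activation codes" concrete. The abstract does not define the truncation, so the version below simply caps the Hamming distance at a threshold `cap`; that choice, the helper names, and the toy layer are assumptions and may differ from the paper's definition.

```python
def relu_code(weights, biases, x):
    """Binary code of one ReLU layer: 1 where the pre-activation is positive."""
    code = []
    for w_row, b in zip(weights, biases):
        pre = sum(w * xi for w, xi in zip(w_row, x)) + b
        code.append(1 if pre > 0 else 0)
    return tuple(code)

def capped_hamming(c1, c2, cap):
    """Hamming distance between two codes, capped at `cap` (assumed truncation)."""
    return min(sum(a != b for a, b in zip(c1, c2)), cap)

# Hypothetical layer with 2 inputs and 3 ReLU units.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [0.0, 0.0, -1.0]

code_a = relu_code(W, B, [0.5, 0.2])    # third unit is off
code_b = relu_code(W, B, [0.5, 0.8])    # all units on
code_c = relu_code(W, B, [-0.5, -0.8])  # all units off

d_ab = capped_hamming(code_a, code_b, cap=2)
d_bc = capped_hamming(code_b, code_c, cap=2)
```

Inputs that share a code lie in the same linear region of the layer, and codes that differ in a single bit are candidates for neighbouring regions.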
Internal representation dynamics and geometry in recurrent neural networks
The efficiency of recurrent neural networks (RNNs) in dealing with sequential
data has long been established. However, unlike deep and convolutional networks,
where we can attribute the recognition of a certain feature to every layer, it
is unclear what "sub-task" a single recurrent step or layer accomplishes. Our
work seeks to shed light onto how a vanilla RNN implements a simple
classification task by analysing the dynamics of the network and the geometric
properties of its hidden states. We find that early internal representations
are evocative of the real labels of the data but this information is not
directly accessible to the output layer. Furthermore, the network's dynamics and
the sequence length are both critical to correct classifications even when
there is no additional task-relevant information provided.
Comment: Presented as a poster at MAIS 2019: the Montreal AI Symposium,
Montreal, Quebec, Canada, 201
Clustering and Recognition of Spatiotemporal Features through Interpretable Embedding of Sequence to Sequence Recurrent Neural Networks
Encoder-decoder recurrent neural network models (RNN Seq2Seq) have achieved
great success in a wide range of computational areas and applications. They have
been shown to be successful in modeling data with both temporal and spatial dependencies
for translation or prediction tasks. In this study, we propose an embedding
approach to visualize and interpret the representation of data by these models.
Furthermore, we show that the embedding is an effective method for unsupervised
learning and can be utilized to estimate the optimality of model training. In
particular, we demonstrate that embedding space projections of the decoder
states of an RNN Seq2Seq model trained on sequence prediction are organized in
clusters capturing similarities and differences in the dynamics of these
sequences. Such performance corresponds to an unsupervised clustering of any
spatio-temporal features and can be employed for time-dependent problems such
as temporal segmentation, clustering of dynamic activity, self-supervised
classification, action recognition, failure prediction, etc. We test and
demonstrate the application of the embedding methodology to time-sequences of
3D human body poses. We show that the methodology provides a high-quality
unsupervised categorization of movements
Deep Convolutional Decision Jungle for Image Classification
We propose a novel method called deep convolutional decision jungle (CDJ) and
its learning algorithm for image classification. The CDJ maintains the
structure of standard convolutional neural networks (CNNs), i.e. multiple
layers of multiple response maps fully connected. Each response map, or node, in
both the convolutional and fully-connected layers selectively responds to class
labels such that each data sample travels via a specific soft route of those
activated nodes. The proposed method CDJ automatically learns features, whereas
decision forests and jungles require pre-defined feature sets. Compared to
CNNs, the method embeds the benefits of using data-dependent discriminative
functions, which better handle multi-modal/heterogeneous data; further, the
method offers more diverse sparse network responses, which in turn can be used
for cost-effective learning/classification. The network is learnt by combining
conventional softmax and proposed entropy losses in each layer. The entropy
loss, as used in decision tree growing, measures the purity of data activation
according to the class label distribution. The back-propagation rule for the
proposed loss function is derived from stochastic gradient descent (SGD)
optimization of CNNs. We show that our proposed method outperforms
state-of-the-art methods on three public image classification benchmarks and
one face verification dataset. We also demonstrate the use of auxiliary data
labels, when available, which helps our method to learn more discriminative
routing and representations and leads to improved classification
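The purity idea behind the entropy loss can be sketched as follows. This is not the CDJ loss itself, whose exact form the abstract does not give; it is an assumed, simplified version that weights each sample's class label by a node's activation and takes the entropy of the resulting class distribution. The function name and toy values are hypothetical, and the total activation is assumed to be positive.

```python
import math

def node_entropy(activations, labels):
    """Entropy (bits) of the activation-weighted class distribution at one node.

    Assumes non-negative activations with a positive total.
    """
    total = sum(activations)
    mass = {}
    for a, y in zip(activations, labels):
        mass[y] = mass.get(y, 0.0) + a
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)

labels = ["cat", "cat", "dog", "dog"]
pure = node_entropy([1.0, 1.0, 0.0, 0.0], labels)   # node fires for one class only
mixed = node_entropy([1.0, 1.0, 1.0, 1.0], labels)  # node fires equally for both
```

A node that responds to a single class has zero entropy, while a node that responds equally to two classes has one bit, so minimizing such a term pushes nodes toward class-selective responses.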
Fast Dynamic Routing Based on Weighted Kernel Density Estimation
Capsules, as well as dynamic routing between them, are recently proposed
structures for deep neural networks. A capsule groups data into vectors or
matrices as poses rather than conventional scalars to represent specific
properties of a target instance. Besides its pose, a capsule should be attached
with a probability (often denoted as activation) for its presence. The dynamic
routing helps capsules achieve more generalization capacity with many fewer
model parameters. However, the bottleneck that prevents widespread application
of capsules is the computational expense of routing. To address this
problem, we generalize existing routing methods within the framework of
weighted kernel density estimation, and propose two fast routing methods with
different optimization strategies. Our methods improve the time efficiency of
routing by nearly 40% with negligible performance degradation. By stacking a
hybrid of convolutional layers and capsule layers, we construct a network
architecture to handle inputs at a resolution of pixels. The
proposed models achieve performance on par with other leading methods on
multiple benchmarks.
Comment: 16 pages, 4 figures, submitted to ECCV 201
Interpreting Layered Neural Networks via Hierarchical Modular Representation
Interpreting the prediction mechanism of complex models is currently one of
the most important tasks in the machine learning field, especially with layered
neural networks, which have achieved high predictive performance with various
practical data sets. To reveal the global structure of a trained neural network
in an interpretable way, a series of clustering methods have been proposed,
which decompose the units into clusters according to the similarity of their
inference roles. The main problems in these studies were that (1) we had no
prior knowledge about the optimal resolution for the decomposition, or the
appropriate number of clusters, and (2) there was no method with which to
acquire knowledge about whether the outputs of each cluster have a positive or
negative correlation with the input and output dimension values. In this paper,
to solve these problems, we propose a method for obtaining a hierarchical
modular representation of a layered neural network. The application of a
hierarchical clustering method to a trained network reveals a tree-structured
relationship among hidden layer units, based on their feature vectors defined
by their correlation with the input and output dimension values
A synthetic dataset for deep learning
In this paper, we propose a novel method for generating a synthetic dataset
obeying a Gaussian distribution. Compared to the commonly used benchmark datasets
with unknown distribution, the synthetic dataset has an explicit distribution,
i.e., a Gaussian distribution. Meanwhile, it has the same characteristics as the
benchmark dataset MNIST. As a result, we can easily apply Deep Neural Networks
(DNNs) to the synthetic dataset. This synthetic dataset provides a novel
experimental tool to verify the proposed theories of deep learning
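One simple way to obtain a labeled dataset with a known Gaussian distribution is to sample from class-conditional Gaussians. The sketch below does this with MNIST-like dimensions (10 classes, 784 features); the isotropic covariance, the random class means, and the function name `make_gaussian_dataset` are assumptions for illustration, since the abstract does not describe the authors' generation method.

```python
import random

def make_gaussian_dataset(n_per_class, n_classes=10, dim=784, sigma=0.1, seed=0):
    """Sample labeled points from isotropic Gaussians, one per class.

    Assumption: class means are drawn uniformly from [0, 1) per dimension.
    """
    rng = random.Random(seed)
    means = [[rng.random() for _ in range(dim)] for _ in range(n_classes)]
    samples, labels = [], []
    for c, mu in enumerate(means):
        for _ in range(n_per_class):
            samples.append([rng.gauss(m, sigma) for m in mu])
            labels.append(c)
    return samples, labels, means

X, y, means = make_gaussian_dataset(n_per_class=5)
```

Because the means and variance are known exactly, quantities such as the optimal decision boundary can be derived analytically and compared with what a trained network learns.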