Convolutional Neural Networks Via Node-Varying Graph Filters
Convolutional neural networks (CNNs) are being applied to an increasing
number of problems and fields due to their superior performance in
classification and regression tasks. Since two of the key operations that CNNs
implement are convolution and pooling, these networks are implicitly
designed to act on data described by regular structures such as images.
Motivated by the recent interest in processing signals defined in irregular
domains, we advocate a CNN architecture that operates on signals supported on
graphs. The proposed design replaces the classical convolution not with a
node-invariant graph filter (GF), which is the natural generalization of
convolution to graph domains, but with a node-varying GF. This filter extracts
different local features without increasing the output dimension of each layer
and, as a result, bypasses the need for a pooling stage while involving only
local operations. A second contribution is to replace the node-varying GF with
a hybrid node-varying GF, which is a new type of GF introduced in this paper.
While the alternative architecture can still be run locally without requiring a
pooling stage, the number of trainable parameters is smaller and can be
rendered independent of the data dimension. Tests are run on a synthetic source
localization problem and on the 20NEWS dataset. Comment: Submitted to DSW 2018 (IEEE Data Science Workshop).
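The contrast between the node-invariant graph filter and the node-varying one the paper advocates can be sketched numerically. The shift operator S, the number of taps, and the graph size below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3                          # nodes, number of filter taps

# Random symmetric adjacency matrix used as the graph shift operator S
S = np.triu(rng.random((N, N)), 1)
S = S + S.T

x = rng.standard_normal(N)           # graph signal, one value per node

# Node-invariant graph filter: y = sum_k h_k S^k x (taps shared by all nodes)
h = rng.standard_normal(K)
y_inv = sum(h[k] * np.linalg.matrix_power(S, k) @ x for k in range(K))

# Node-varying graph filter: node i applies its own taps H[i, k],
# i.e. y = sum_k diag(H[:, k]) S^k x -- still built from local shifts S^k x,
# and the output dimension (N,) is unchanged, so no pooling stage is needed
H = rng.standard_normal((N, K))
y_var = sum(np.diag(H[:, k]) @ (np.linalg.matrix_power(S, k) @ x)
            for k in range(K))

print(y_inv.shape, y_var.shape)      # (8,) (8,)
```

The node-varying filter reduces to the node-invariant one when every row of H equals h, which is why it can extract richer per-node features at the same output dimension.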
Predicting Parameters in Deep Learning
We demonstrate that there is significant redundancy in the parameterization
of several deep learning models. Given only a few weight values for each
feature, it is possible to accurately predict the remaining values. Moreover, we
show that not only can the parameter values be predicted, but many of them need
not be learned at all. We train several different architectures by learning
only a small number of weights and predicting the rest. In the best case we are
able to predict more than 95% of the weights of a network without any drop in
accuracy.
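The kind of redundancy being exploited can be illustrated with a low-rank toy weight matrix: given a basis for its column space and only a few observed weights per column, all remaining weights follow. The matrix sizes, the rank, and the SVD-derived basis here are illustrative assumptions, not the paper's actual prediction scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 20, 30, 4                  # weight-matrix size and (hypothetical) rank

# A rank-r "weight matrix": its redundancy means a few numbers fix the rest
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Dictionary: a basis U for the column space of W (here obtained by SVD)
U = np.linalg.svd(W, full_matrices=False)[0][:, :r]   # shape (n, r)

# Observe only r weights per column (the same r rows, for simplicity) ...
rows = rng.choice(n, size=r, replace=False)
# ... solve U[rows] @ C = W[rows] for the coefficients, then predict the rest
C = np.linalg.solve(U[rows], W[rows])
W_hat = U @ C

# All n*m weights are recovered although only n*r + r*m values were stored
print(np.allclose(W, W_hat))
```

In this toy setting 600 weights are reconstructed from 200 stored values; the paper's point is that trained networks exhibit enough of this structure for most weights to be predicted rather than learned.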
Building high-level features using large scale unsupervised learning
We consider the problem of building high-level, class-specific feature
detectors from only unlabeled data. For example, is it possible to learn a face
detector using only unlabeled images? To answer this, we train a 9-layered
locally connected sparse autoencoder with pooling and local contrast
normalization on a large dataset of images (the model has 1 billion
connections, the dataset has 10 million 200x200 pixel images downloaded from
the Internet). We train this network using model parallelism and asynchronous
SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to
what appears to be a widely-held intuition, our experimental results reveal
that it is possible to train a face detector without having to label images as
containing a face or not. Control experiments show that this feature detector
is robust not only to translation but also to scaling and out-of-plane
rotation. We also find that the same network is sensitive to other high-level
concepts such as cat faces and human bodies. Starting with these learned
features, we trained our network to obtain 15.8% accuracy in recognizing 20,000
object categories from ImageNet, a leap of 70% relative improvement over the
previous state-of-the-art.
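Of the named building blocks, local contrast normalization can be sketched as subtractive-plus-divisive normalization over a small window; the window size and exact weighting below are common choices and assumptions, not necessarily the paper's precise formulation:

```python
import numpy as np

def local_contrast_normalize(img, size=9, eps=1e-8):
    """Subtract the local mean and divide by the local standard deviation
    over a size x size window (a common LCN formulation; the paper's exact
    weighting may differ)."""
    n = size // 2
    pad = np.pad(img, n, mode="reflect")
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + size, j:j + size]
            out[i, j] = (img[i, j] - patch.mean()) / max(patch.std(), eps)
    return out

img = np.random.default_rng(2).random((16, 16))
lcn = local_contrast_normalize(img)
print(lcn.shape)                     # (16, 16)
```

Normalizing out local brightness and contrast in this way is part of what lets the learned detectors respond to shape (a face) rather than to illumination.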
Automated Pruning for Deep Neural Network Compression
In this work we present a method to improve the pruning step of the current
state-of-the-art methodology to compress neural networks. The novelty of the
proposed pruning technique is in its differentiability, which allows pruning to
be performed during the backpropagation phase of the network training. This
enables end-to-end learning and substantially reduces training time. The
technique is based on a family of differentiable pruning functions and a new
regularizer specifically designed to enforce pruning. The experimental results
show that jointly optimizing the thresholds and the network weights reaches a
higher compression rate, reducing the number of weights of the pruned network
by a further 14% to 33% compared to the current
state-of-the-art. Furthermore, we believe that this is the first study where
the generalization capabilities in transfer learning tasks of the features
extracted by a pruned network are analyzed. To this end, we show that the
representations learned with the proposed pruning methodology maintain the same
effectiveness and generality as those learned by the corresponding
non-compressed network on a set of different recognition tasks. Comment: 8 pages, 5 figures. Published as a conference paper at ICPR 201