11,360 research outputs found
Sparse Coding on Stereo Video for Object Detection
Deep Convolutional Neural Networks (DCNN) require millions of labeled
training examples for image classification and object detection tasks, which
restrict these models to domains where such datasets are available. In this
paper, we explore the use of unsupervised sparse coding applied to stereo-video
data to help alleviate the need for large amounts of labeled data. We show that
replacing a typical supervised convolutional layer with an unsupervised
sparse-coding layer within a DCNN allows for better performance on a car
detection task when only a limited number of labeled training examples is
available. Furthermore, the network that incorporates sparse coding allows for
more consistent performance over varying initializations and ordering of
training examples when compared to a fully supervised DCNN. Finally, we compare
activations between the unsupervised sparse-coding layer and the supervised
convolutional layer, and show that the sparse representation exhibits an
encoding that is depth selective, whereas encodings from the convolutional
layer do not exhibit such selectivity. These result indicates promise for using
unsupervised sparse-coding approaches in real-world computer vision tasks in
domains with limited labeled training data
Compositional Model based Fisher Vector Coding for Image Classification
Deriving from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To handle this limitation, in this paper we break the convention which
assumes that a local feature is drawn from one of few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is composed as the
linear combination of multiple key components and the combination weight is a
latent random variable. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism.Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI
Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis
The availability of large-scale annotated image datasets and recent advances
in supervised deep learning methods enable the end-to-end derivation of
representative image features that can impact a variety of image analysis
problems. Such supervised approaches, however, are difficult to implement in
the medical domain where large volumes of labelled data are difficult to obtain
due to the complexity of manual annotation and inter- and intra-observer
variability in label assignment. We propose a new convolutional sparse kernel
network (CSKN), which is a hierarchical unsupervised feature learning framework
that addresses the challenge of learning representative visual features in
medical image analysis domains where there is a lack of annotated training
data. Our framework has three contributions: (i) We extend kernel learning to
identify and represent invariant features across image sub-patches in an
unsupervised manner. (ii) We initialise our kernel learning with a layer-wise
pre-training scheme that leverages the sparsity inherent in medical images to
extract initial discriminative features. (iii) We adapt a multi-scale spatial
pyramid pooling (SPP) framework to capture subtle geometric differences between
learned visual features. We evaluated our framework in medical image retrieval
and classification on three public datasets. Our results show that our CSKN had
better accuracy when compared to other conventional unsupervised methods and
comparable accuracy to methods that used state-of-the-art supervised
convolutional neural networks (CNNs). Our findings indicate that our
unsupervised CSKN provides an opportunity to leverage unannotated big data in
medical imaging repositories.Comment: Accepted by Medical Image Analysis (with a new title 'Convolutional
Sparse Kernel Network for Unsupervised Medical Image Analysis'). The
manuscript is available from following link
(https://doi.org/10.1016/j.media.2019.06.005
Biologically plausible deep learning -- but how far can we go with shallow networks?
Training deep neural networks with the error backpropagation algorithm is
considered implausible from a biological perspective. Numerous recent
publications suggest elaborate models for biologically plausible variants of
deep learning, typically defining success as reaching around 98% test accuracy
on the MNIST data set. Here, we investigate how far we can go on digit (MNIST)
and object (CIFAR10) classification with biologically plausible, local learning
rules in a network with one hidden layer and a single readout layer. The hidden
layer weights are either fixed (random or random Gabor filters) or trained with
unsupervised methods (PCA, ICA or Sparse Coding) that can be implemented by
local learning rules. The readout layer is trained with a supervised, local
learning rule. We first implement these models with rate neurons. This
comparison reveals, first, that unsupervised learning does not lead to better
performance than fixed random projections or Gabor filters for large hidden
layers. Second, networks with localized receptive fields perform significantly
better than networks with all-to-all connectivity and can reach backpropagation
performance on MNIST. We then implement two of the networks - fixed, localized,
random & random Gabor filters in the hidden layer - with spiking leaky
integrate-and-fire neurons and spike timing dependent plasticity to train the
readout layer. These spiking models achieve > 98.2% test accuracy on MNIST,
which is close to the performance of rate networks with one hidden layer
trained with backpropagation. The performance of our shallow network models is
comparable to most current biologically plausible models of deep learning.
Furthermore, our results with a shallow spiking network provide an important
reference and suggest the use of datasets other than MNIST for testing the
performance of future models of biologically plausible deep learning.Comment: 14 pages, 4 figure
- …