Bio-Inspired Multi-Layer Spiking Neural Network Extracts Discriminative Features from Speech Signals
Spiking neural networks (SNNs) enable power-efficient implementations due to
their sparse, spike-based coding scheme. This paper develops a bio-inspired SNN
that uses unsupervised learning to extract discriminative features from speech
signals, which can subsequently be used in a classifier. The architecture
consists of a spiking convolutional/pooling layer followed by a fully connected
spiking layer for feature discovery. The convolutional layer of leaky
integrate-and-fire (LIF) neurons represents primary acoustic features. The
fully connected layer is equipped with a probabilistic spike-timing-dependent
plasticity learning rule. This layer represents the discriminative features
through probabilistic LIF neurons. To assess the discriminative power of the
learned features, they are used in a hidden Markov model (HMM) for spoken digit
recognition. The experimental results show performance above 96%, which compares
favorably with popular statistical feature extraction methods. Our results
provide a novel demonstration of unsupervised feature acquisition in an SNN.
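To make the leaky integrate-and-fire (LIF) neuron mentioned in the abstract above concrete, here is a minimal sketch of a single discrete-time LIF unit. It is not the paper's model: the function name `simulate_lif`, the parameters `tau`, `v_thresh`, `v_reset`, and `dt`, and all numeric values are assumptions chosen only for illustration.

```python
def simulate_lif(input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Simulate one leaky integrate-and-fire neuron (illustrative sketch).

    All parameter names and default values are assumptions, not taken
    from the paper. Returns the time-step indices at which it spikes.
    """
    v = v_reset
    spike_times = []
    for t, i_in in enumerate(input_current):
        # Forward-Euler step: the potential leaks toward 0 with time
        # constant tau while integrating the input current.
        v += dt * (-v / tau + i_in)
        if v >= v_thresh:
            # Threshold crossing: emit a spike and reset the potential.
            spike_times.append(t)
            v = v_reset
    return spike_times


# A strong constant input drives regular spiking; a weak one never
# reaches threshold because the leak balances the input first.
strong = simulate_lif([0.2] * 50)
weak = simulate_lif([0.02] * 50)
print("strong input spikes at:", strong)
print("weak input spikes at:", weak)
```

The weak input produces no spikes at all, which is the sparse, event-driven behavior that the abstract links to power efficiency.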
A noise based novel strategy for faster SNN training
Spiking neural networks (SNNs) are receiving increasing attention due to
their low power consumption and strong bio-plausibility. Optimization of SNNs
is a challenging task. Two main methods, artificial neural network (ANN)-to-SNN
conversion and spike-based backpropagation (BP), both have their advantages and
limitations. ANN-to-SNN conversion requires a long inference time to
approximate the accuracy of the ANN, thus diminishing the benefits of SNNs. With
spike-based BP, training high-precision SNNs typically consumes dozens of times
more computational resources and time than their ANN counterparts. In this
paper, we propose a novel SNN training approach that combines the benefits of
the two methods. We first train a single-step SNN (T=1) by approximating the
neural potential distribution with random noise, then convert the single-step
SNN (T=1) to a multi-step SNN (T=N) losslessly. The introduction of Gaussian
distributed noise leads to a significant gain in accuracy after conversion. The
results show that our method considerably reduces the training and inference
times of SNNs while maintaining their high accuracy. Compared to the previous
two methods, ours can reduce training time by 65%-75% and achieves more than
100 times faster inference speed. We also argue that the neuron model augmented
with noise makes it more bio-plausible.
Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model
A fundamental challenge in machine learning today is to build a model that
can learn from few examples. Here, we describe a reservoir based spiking neural
model for learning to recognize actions with a limited number of labeled
videos. First, we propose a novel encoding, inspired by how microsaccades
influence visual perception, to extract spike information from raw video data
while preserving the temporal correlation across different frames. Using this
encoding, we show that the reservoir generalizes its rich dynamical activity
toward signature action/movements enabling it to learn from few training
examples. We evaluate our approach on the UCF-101 dataset. Our experiments
demonstrate that our proposed reservoir achieves 81.3%/87% Top-1/Top-5
accuracy, respectively, on the 101-class data while requiring just 8 video
examples per class for training. Our results establish a new benchmark for
action recognition from limited video examples for spiking neural models while
yielding competitive accuracy with respect to state-of-the-art non-spiking
neural models. Comment: 13 figures (includes supplementary information)
Humans and deep networks largely agree on which kinds of variation make object recognition harder
View-invariant object recognition is a challenging problem, which has
attracted much attention among the psychology, neuroscience, and computer
vision communities. Humans are notoriously good at it, even if some variations
are presumably more difficult to handle than others (e.g. 3D rotations). Humans
are thought to solve the problem through hierarchical processing along the
ventral stream, which progressively extracts more and more invariant visual
features. This feed-forward architecture has inspired a new generation of
bio-inspired computer vision systems called deep convolutional neural networks
(DCNN), which are currently the best algorithms for object recognition in
natural images. Here, for the first time, we systematically compared human
feed-forward vision and DCNNs at view-invariant object recognition using the
same images and controlling for both the kinds of transformation as well as
their magnitude. We used four object categories and images were rendered from
3D computer models. In total, 89 human subjects participated in 10 experiments
in which they had to discriminate between two or four categories after rapid
presentation with backward masking. We also tested two recent DCNNs on the same
tasks. We found that humans and DCNNs largely agreed on the relative
difficulties of each kind of variation: rotation in depth is by far the hardest
transformation to handle, followed by scale, then rotation in plane, and
finally position. This suggests that humans recognize objects mainly through 2D
template matching, rather than by constructing 3D object models, and that DCNNs
are not too unreasonable models of human feed-forward vision. Also, our results
show that the variation levels in rotation in depth and scale strongly modulate
both humans' and DCNNs' recognition performances. We thus argue that these
variations should be controlled in the image datasets used in vision research.
Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition
The retinal image of surrounding objects varies tremendously due to changes
in position, size, pose, illumination condition, background context, occlusion,
noise, and nonrigid deformations. But despite these huge variations, our visual
system is able to invariantly recognize any object in just a fraction of a
second. To date, various computational models have been proposed to mimic the
hierarchical processing of the ventral visual pathway, with limited success.
Here, we show that the association of both biologically inspired network
architecture and learning rule significantly improves the models' performance
when facing challenging invariant object recognition problems. Our model is an
asynchronous feedforward spiking neural network. When the network is presented
with natural images, the neurons in the entry layers detect edges, and the most
activated ones fire first, while neurons in higher layers are equipped with
spike timing-dependent plasticity. These neurons progressively become selective
to intermediate complexity visual features appropriate for object
categorization. The model is evaluated on the 3D-Object and ETH-80 datasets, which
are two benchmarks for invariant object recognition, and is shown to outperform
state-of-the-art models, including DeepConvNet and HMAX. This demonstrates its
ability to accurately recognize different instances of multiple object classes
even under various appearance conditions (different views, scales, tilts, and
backgrounds). Several statistical analysis techniques are used to show that our
model extracts class-specific and highly informative features.
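The spike timing-dependent plasticity mentioned in the abstract above can be illustrated with a simplified pair-based update. This is a sketch under stated assumptions, not the paper's confirmed rule: the function `stdp_update`, the learning rates `a_plus` and `a_minus`, and the soft-bound factor `w * (1 - w)` are hypothetical choices for illustration.

```python
def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.04):
    """Return the updated synaptic weight after one pre/post spike pair.

    Simplified STDP sketch (assumed form, not taken from the abstract):
    only the order of the two spikes matters, not their exact time gap.
    """
    # Soft bound: changes shrink as w approaches 0 or 1.
    bound = w * (1.0 - w)
    if t_pre <= t_post:
        # Pre-synaptic spike came first: it may have contributed to the
        # post-synaptic spike, so the synapse is strengthened.
        dw = a_plus * bound
    else:
        # Pre-synaptic spike came after: the synapse is weakened.
        dw = -a_minus * bound
    # Clamp to the valid weight range [0, 1].
    return min(1.0, max(0.0, w + dw))


potentiated = stdp_update(0.5, t_pre=1.0, t_post=2.0)
depressed = stdp_update(0.5, t_pre=3.0, t_post=2.0)
print("pre before post:", potentiated)
print("pre after post:", depressed)
```

Under such a rule, inputs that repeatedly fire just before a neuron does are reinforced, which is one way a neuron could become selective to a recurring visual feature.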