Humans and deep networks largely agree on which kinds of variation make object recognition harder
View-invariant object recognition is a challenging problem, which has
attracted much attention among the psychology, neuroscience, and computer
vision communities. Humans are remarkably good at it, even if some variations
are presumably more difficult to handle than others (e.g. 3D rotations). Humans
are thought to solve the problem through hierarchical processing along the
ventral stream, which progressively extracts more and more invariant visual
features. This feed-forward architecture has inspired a new generation of
bio-inspired computer vision systems called deep convolutional neural networks
(DCNN), which are currently the best algorithms for object recognition in
natural images. Here, for the first time, we systematically compared human
feed-forward vision and DCNNs at view-invariant object recognition using the
same images and controlling for both the kinds of transformation as well as
their magnitude. We used four object categories, with images rendered from
3D computer models. In total, 89 human subjects participated in 10 experiments
in which they had to discriminate between two or four categories after rapid
presentation with backward masking. We also tested two recent DCNNs on the same
tasks. We found that humans and DCNNs largely agreed on the relative
difficulties of each kind of variation: rotation in depth is by far the hardest
transformation to handle, followed by scale, then rotation in plane, and
finally position. This suggests that humans recognize objects mainly through 2D
template matching, rather than by constructing 3D object models, and that DCNNs
are not too unreasonable models of human feed-forward vision. Also, our results
show that the variation levels in rotation in depth and scale strongly modulate
both humans' and DCNNs' recognition performance. We thus argue that these
variations should be controlled in the image datasets used in vision research.
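For illustration, a minimal sketch of how one might probe a pretrained DCNN's sensitivity to such transformations. The specific model (ResNet-18 via torchvision) and the cosine-similarity measure are assumptions for this sketch, not the networks or protocol used in the study above.

# A minimal sketch (not the authors' pipeline): probe how a pretrained DCNN's
# features change under controlled 2D transformations of the same image.
import torch
import torch.nn.functional as F
from torchvision import models
from torchvision.transforms import functional as TF
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = models.ResNet18_Weights.DEFAULT.transforms()

def features(img: Image.Image) -> torch.Tensor:
    """Penultimate-layer features for a single PIL image."""
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        # Strip the final classification layer to obtain a feature vector.
        feat = torch.nn.Sequential(*list(model.children())[:-1])(x)
    return feat.flatten()

def invariance_to_rotation(img: Image.Image, angles=(0, 15, 30, 45, 60)):
    """Cosine similarity between features of the original and rotated images;
    lower similarity at larger angles suggests weaker invariance."""
    ref = features(img)
    return {a: F.cosine_similarity(ref, features(TF.rotate(img, a)), dim=0).item()
            for a in angles}

# Usage (hypothetical image path):
# print(invariance_to_rotation(Image.open("object_render.png").convert("RGB")))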
Character-level Convolutional Networks for Text Classification
This article offers an empirical exploration of the use of character-level
convolutional networks (ConvNets) for text classification. We constructed
several large-scale datasets to show that character-level convolutional
networks could achieve state-of-the-art or competitive results. Comparisons are
offered against traditional models such as bag of words, n-grams and their
TFIDF variants, and deep learning models such as word-based ConvNets and
recurrent neural networks.
Comment: An early version of this work entitled "Text Understanding from
Scratch" was posted in Feb 2015 as arXiv:1502.01710. The present paper has
considerably more experimental results and a rewritten introduction. Advances
in Neural Information Processing Systems 28 (NIPS 2015).
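For concreteness, a minimal sketch of a character-level ConvNet classifier in PyTorch. The alphabet, input length, and layer sizes here are illustrative assumptions and much smaller than the architectures evaluated in the paper.

# A minimal sketch of a character-level ConvNet for text classification.
import torch
import torch.nn as nn

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,;:!?'\""
MAX_LEN = 256  # characters per document (assumption)

def quantize(text: str) -> torch.Tensor:
    """One-hot encode a string as a (len(ALPHABET), MAX_LEN) tensor."""
    x = torch.zeros(len(ALPHABET), MAX_LEN)
    for i, ch in enumerate(text.lower()[:MAX_LEN]):
        j = ALPHABET.find(ch)
        if j >= 0:
            x[j, i] = 1.0
    return x

class CharConvNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(len(ALPHABET), 128, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(128, 128, kernel_size=3), nn.ReLU(), nn.MaxPool1d(3),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(256), nn.ReLU(),
                                  nn.Linear(256, n_classes))

    def forward(self, x):  # x: (batch, alphabet, length)
        return self.head(self.conv(x))

# Usage: logits = CharConvNet(n_classes=4)(quantize("some text").unsqueeze(0))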
Evolution and Analysis of Embodied Spiking Neural Networks Reveals Task-Specific Clusters of Effective Networks
Elucidating principles that underlie computation in neural networks is
currently a major research topic of interest in neuroscience. Transfer Entropy
(TE) is increasingly used as a tool to bridge the gap between network
structure, function, and behavior in fMRI studies. Computational models allow
us to bridge the gap even further by directly associating individual neuron
activity with behavior. However, most computational models that have analyzed
embodied behaviors have employed non-spiking neurons. On the other hand,
computational models that employ spiking neural networks tend to be restricted
to disembodied tasks. We show for the first time the artificial evolution and
TE-analysis of embodied spiking neural networks to perform a
cognitively interesting behavior. Specifically, we evolved an agent controlled
by an Izhikevich neural network to perform a visual categorization task. The
smallest networks capable of performing the task were found by repeating
evolutionary runs with different network sizes. Informational analysis of the
best solution revealed task-specific TE-network clusters, suggesting that
within-task homogeneity and across-task heterogeneity were key to behavioral
success. Moreover, analysis of the ensemble of solutions revealed that
task-specificity of TE-network clusters correlated with fitness. This provides
an empirically testable hypothesis that links network structure to behavior.
Comment: Camera-ready version of the paper accepted for GECCO'1
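A minimal sketch of the Izhikevich neuron update (Izhikevich, 2003) used to control the agents described above. The parameter values give standard regular-spiking behaviour; the evolved network, the embodiment, and the transfer-entropy analysis are not reproduced here.

# A minimal sketch of a single Izhikevich spiking neuron driven by an input current.
import numpy as np

def simulate_izhikevich(I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """Simulate one neuron driven by input current I (array, one value per ms).
    Returns the membrane-potential trace and the spike times."""
    v, u = -65.0, b * -65.0
    vs, spikes = [], []
    for t, i_t in enumerate(I):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + i_t)
        u += dt * a * (b * v - u)
        if v >= 30.0:            # spike threshold and reset
            spikes.append(t)
            v, u = c, u + d
        vs.append(v)
    return np.array(vs), spikes

# Usage: constant drive for 500 ms
# trace, spike_times = simulate_izhikevich(np.full(500, 10.0))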
Graded, Dynamically Routable Information Processing with Synfire-Gated Synfire Chains
Coherent neural spiking and local field potentials are believed to be
signatures of the binding and transfer of information in the brain. Coherent
activity has now been measured experimentally in many regions of mammalian
cortex. Synfire chains are one of the main theoretical constructs that have
been invoked to describe coherent spiking phenomena. However, for some
time, it has been known that synchronous activity in feedforward networks
asymptotically either approaches an attractor with fixed waveform and
amplitude, or fails to propagate. This has limited their ability to explain
graded neuronal responses. Recently, we have shown that pulse-gated synfire
chains are capable of propagating graded information coded in mean population
current or firing rate amplitudes. In particular, we showed that it is possible
to use one synfire chain to provide gating pulses and a second, pulse-gated
synfire chain to propagate graded information. We called these circuits
synfire-gated synfire chains (SGSCs). Here, we present SGSCs in which graded
information can rapidly cascade through a neural circuit, and show a
correspondence between this type of transfer and a mean-field model in which
gating pulses overlap in time. We show that SGSCs are robust in the presence of
variability in population size, pulse timing and synaptic strength. Finally, we
demonstrate the computational capabilities of SGSC-based information coding by
implementing a self-contained, spike-based, modular neural circuit that is
triggered by streaming input, reads it in, processes it, then makes a decision
based on the processed information and shuts itself down.
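A minimal, rate-based sketch of the pulse-gated transfer idea: a graded amplitude is handed from layer to layer only while that layer's gating window is open. This is a strong simplification of the spiking SGSC circuits; the time constant, window length, and the assumption that upstream layers hold their value during a window are illustrative.

# A minimal mean-field sketch of graded, pulse-gated propagation along a chain.
import numpy as np

def propagate_graded(amplitude, n_layers=5, steps_per_window=20, tau=5.0, dt=1.0):
    """During layer k's gating window, layer k relaxes toward the (held) rate of
    layer k-1; outside its window a layer is left untouched."""
    rates = np.zeros(n_layers + 1)
    rates[0] = amplitude                        # graded input clamped at layer 0
    history = []
    for layer in range(1, n_layers + 1):        # one gating window per layer
        for _ in range(steps_per_window):
            rates[layer] += dt / tau * (-rates[layer] + rates[layer - 1])
            history.append(rates.copy())
    return np.array(history)

# Usage: the last layer's rate ends close to the graded input amplitude
# traj = propagate_graded(amplitude=0.7); print(traj[-1])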
Rapid Visual Categorization is not Guided by Early Salience-Based Selection
The current dominant visual processing paradigm in both human and machine
research is the feedforward, layered hierarchy of neural-like processing
elements. Within this paradigm, visual saliency is seen by many to have a
specific role, namely that of early selection. Early selection is thought to
enable very fast visual performance by limiting processing to only the most
salient candidate portions of an image. This strategy has led to a plethora of
saliency algorithms that have indeed improved processing time efficiency in
machine algorithms, which in turn have strengthened the suggestion that human
vision also employs a similar early selection strategy. However, at least one
set of critical tests of this idea has never been performed with respect to the
role of early selection in human vision. How would the best of the current
saliency models perform on the stimuli used by experimentalists who first
provided evidence for this visual processing paradigm? Would the algorithms
really provide correct candidate sub-images to enable fast categorization on
those same images? Do humans really need this early selection for their
impressive performance? Here, we report on a new series of tests of these
questions whose results suggest that it is quite unlikely that such an early
selection process has any role in human rapid visual categorization.
Comment: 22 pages, 9 figures
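As a concrete reference point for the pipeline under scrutiny, a minimal sketch of salience-based early selection: compute a bottom-up saliency map (the spectral-residual method of Hou & Zhang, 2007, standing in here for current saliency models) and restrict further processing to the single most salient window. The window size and smoothing parameters are assumptions.

# A minimal sketch of early selection via a bottom-up saliency map.
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray: np.ndarray) -> np.ndarray:
    """Saliency map from the spectral residual of a grayscale image."""
    gray = np.asarray(gray, dtype=float)
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    residual = log_amp - uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    return gaussian_filter(sal, sigma=3)

def most_salient_window(gray: np.ndarray, win: int = 64):
    """Center of the win x win window with the highest mean saliency."""
    sal = spectral_residual_saliency(gray)
    means = uniform_filter(sal, size=win)       # mean saliency of each window
    r, c = np.unravel_index(np.argmax(means), means.shape)
    return r, c                                 # a classifier would see only this window

# Usage (hypothetical): r, c = most_salient_window(image_as_grayscale_array)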
Reverberating activity in a neural network with distributed signal transmission delays
It is known that an identical delay in all transmission lines can destabilize
macroscopic stationarity of a neural network, causing oscillation or chaos. We
analyze the collective dynamics of a network whose intra-transmission delays
are distributed in time. Here, a neuron is modeled as a discrete-time threshold
element that responds in an all-or-nothing manner to a linear sum of signals
that arrive after delays assigned to individual transmission lines. Even though
transmission delays are distributed in time, the whole network exhibits a
single collective oscillation with a period close to the average transmission
delay. The collective oscillation is not only a simple alternation of
consecutive firing and resting, but can also be a nontrivially sequenced
series of firing and resting that reverberates over a certain period of time.
Moreover, the
system dynamics can be made quasiperiodic or chaotic by changing the
distribution of delays.
Comment: 8 pages, 9 figures
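A minimal sketch of the network described above: discrete-time threshold elements, each responding in an all-or-nothing manner to a weighted sum of signals that arrive after per-connection delays. The network size, weights, threshold, and delay distribution below are illustrative assumptions.

# A minimal sketch of a discrete-time threshold network with distributed delays.
import numpy as np

rng = np.random.default_rng(0)
N, T, MAX_DELAY = 50, 200, 10
weights = rng.normal(0.0, 1.0, size=(N, N))               # connection strengths
delays = rng.integers(1, MAX_DELAY + 1, size=(N, N))      # per-line delays (steps)
theta = 0.0                                                # firing threshold

states = np.zeros((T, N), dtype=int)
states[:MAX_DELAY] = rng.integers(0, 2, size=(MAX_DELAY, N))   # random history

for t in range(MAX_DELAY, T):
    # neuron i sums the state of neuron j as it was delays[i, j] steps ago
    delayed = states[t - delays, np.arange(N)]             # delayed[i, j]
    drive = np.sum(weights * delayed, axis=1)
    states[t] = (drive > theta).astype(int)                # all-or-nothing response

# The mean activity over time can then be inspected for collective oscillations
activity = states.mean(axis=1)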
ART-EMAP: A Neural Network Architecture for Learning and Prediction by Evidence Accumulation
This paper introduces ART-EMAP, a neural architecture that uses spatial and
temporal evidence accumulation to extend the capabilities of fuzzy ARTMAP.
ART-EMAP combines supervised and unsupervised learning with a medium-term
memory process to accomplish stable pattern category recognition in a noisy
input environment. The ART-EMAP system features (i) distributed pattern
registration at a view category field; (ii) a decision criterion for mapping
between view and object categories, which can delay categorization of
ambiguous objects and trigger an evidence accumulation process when faced with
a low-confidence prediction; (iii) a process that accumulates evidence at a
medium-term memory (MTM) field; and (iv) an unsupervised learning algorithm to
fine-tune performance after a limited initial period of supervised network
training. ART-EMAP dynamics are illustrated with a benchmark simulation
example. Applications include 3-D object recognition from a series of
ambiguous 2-D views.
Funding: British Petroleum (89-A-1204); Defense Advanced Research Projects
Agency (AFOSR-90-0083, ONR-N00014-92-J-4015); National Science Foundation
(IRI-90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office
of Scientific Research (90-0083)
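A minimal sketch of the evidence-accumulation step in (ii)-(iii): distributed category predictions from successive 2-D views are accumulated in a medium-term memory, and the decision is deferred until one category's share of the accumulated evidence clears a confidence criterion. The fuzzy ARTMAP front end is not reproduced; predictions are assumed to arrive as probability vectors, and the decay and confidence values are illustrative.

# A minimal sketch of evidence accumulation with a deferred decision criterion.
import numpy as np

def accumulate_and_decide(view_predictions, confidence=0.5, decay=0.9):
    """view_predictions: iterable of per-view category probability vectors."""
    mtm, posterior = None, None                  # medium-term memory field
    for pred in view_predictions:
        pred = np.asarray(pred, dtype=float)
        mtm = pred if mtm is None else decay * mtm + pred
        posterior = mtm / mtm.sum()
        if posterior.max() >= confidence:        # decision criterion met
            return int(np.argmax(posterior)), posterior
    return None, posterior                       # still ambiguous after all views

# Usage: two ambiguous views, then a more diagnostic one that triggers a decision
# print(accumulate_and_decide([[0.4, 0.35, 0.25], [0.4, 0.4, 0.2], [0.7, 0.2, 0.1]]))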
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classification of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X).
Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
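A minimal sketch of the GBDT idea: internal nodes branch on logical expressions over an entire multivariate time series (here, hand-written temporal-logic-style predicates such as "eventually" and "always") rather than on single-attribute thresholds. The grammar, the learning algorithm, and the class labels below are illustrative assumptions, not the paper's implementation.

# A minimal sketch of a grammar-based decision tree over time series.
import numpy as np

# Predicates derived from a tiny "grammar": each maps a (T, D) series to a bool.
def eventually(series, dim, threshold):
    return bool(np.any(series[:, dim] > threshold))

def always(series, dim, threshold):
    return bool(np.all(series[:, dim] < threshold))

class GBDTNode:
    def __init__(self, predicate=None, left=None, right=None, label=None):
        self.predicate, self.left, self.right, self.label = predicate, left, right, label

    def classify(self, series):
        if self.label is not None:               # leaf node
            return self.label
        branch = self.left if self.predicate(series) else self.right
        return branch.classify(series)

# A hand-built two-level tree over 2-dimensional series (illustrative only)
tree = GBDTNode(
    predicate=lambda s: eventually(s, dim=0, threshold=1.0),
    left=GBDTNode(label="class_A"),
    right=GBDTNode(
        predicate=lambda s: always(s, dim=1, threshold=0.5),
        left=GBDTNode(label="class_B"),
        right=GBDTNode(label="class_C"),
    ),
)
# Usage: tree.classify(np.random.default_rng(0).normal(size=(100, 2)))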