The Nature of Illusory Contour Computation
Neural correlates of illusory contour perception have been found in both the early and the higher visual areas, but the locus and the mechanism of its computation remain elusive. Psychophysical evidence provided in this issue of Neuron shows that perceptual contour completion is likely done in the early visual cortex in a cascade manner using horizontal connections.
Learning Robust Object Recognition Using Composed Scenes from Generative Models
Recurrent feedback connections in the mammalian visual system have been
hypothesized to play a role in synthesizing input in the theoretical framework
of analysis by synthesis. The comparison of internally synthesized
representation with that of the input provides a validation mechanism during
perceptual inference and learning. Inspired by these ideas, we propose that
the synthesis machinery can compose new, unobserved images by imagination to
train the network itself so as to increase the robustness of the system in
novel scenarios. As a proof of concept, we investigated whether images composed
by imagination could help an object recognition system to deal with occlusion,
which is challenging for the current state-of-the-art deep convolutional neural
networks. We fine-tuned a network on images containing objects in various
occlusion scenarios that are imagined or self-generated through a deep
generator network. Trained on imagined occluded scenarios under the object
persistence constraint, our network discovered more subtle and localized image
features that were neglected by the original network for object classification,
obtaining better separability of different object classes in the feature space.
This leads to significant improvement of object recognition under occlusion for
our network relative to the original network trained only on un-occluded
images. In addition to providing practical benefits in object recognition under
occlusion, this work demonstrates that the use of self-generated composition of
visual scenes through the synthesis loop, combined with the object persistence
constraint, can provide opportunities for neural networks to discover new
relevant patterns in the data and become more flexible in dealing with novel
situations.
Comment: Accepted by the 14th Conference on Computer and Robot Vision.
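The abstract describes the training idea only at a high level. As a rough sketch of that idea (not the authors' implementation), the PyTorch-style code below pairs each image with a synthetically occluded variant and adds an object persistence term that pulls the two feature vectors together; the occlude function is a hypothetical stand-in for the deep generator network, and all names and the loss weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def occlude(images, max_frac=0.4):
    """Hypothetical stand-in for the deep-generator composition step:
    paste a random rectangle over part of each image."""
    out = images.clone()
    n, _, h, w = images.shape
    for i in range(n):
        oh = max(1, int(h * max_frac * torch.rand(1).item()))
        ow = max(1, int(w * max_frac * torch.rand(1).item()))
        y = torch.randint(0, h - oh + 1, (1,)).item()
        x = torch.randint(0, w - ow + 1, (1,)).item()
        out[i, :, y:y + oh, x:x + ow] = torch.rand(1).item()
    return out

def fine_tune_step(backbone, classifier, images, labels, opt, alpha=0.5):
    """One fine-tuning step: classify the occluded view while keeping its
    features close to those of the clean view (object persistence)."""
    occluded = occlude(images)
    f_clean = backbone(images)
    f_occl = backbone(occluded)
    loss = (F.cross_entropy(classifier(f_occl), labels)
            + alpha * F.mse_loss(f_occl, f_clean.detach()))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```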
Predictive Encoding of Contextual Relationships for Perceptual Inference, Interpolation and Prediction
We propose a new neurally-inspired model that can learn to encode the global
relationship context of visual events across time and space and to use the
contextual information to modulate the analysis by synthesis process in a
predictive coding framework. The model learns latent contextual representations
by maximizing the predictability of visual events based on local and global
contextual information through both top-down and bottom-up processes. In
contrast to standard predictive coding models, the prediction error in this
model is used to update the contextual representation but does not alter the
feedforward input for the next layer, and is thus more consistent with
neurophysiological observations. We establish the computational feasibility of
this model by demonstrating its ability in several aspects. We show that our
model can outperform state-of-the-art gated Boltzmann machines (GBMs) in
estimating contextual information. Our model can also interpolate missing
events or predict future events in image sequences while simultaneously
estimating contextual information. We show that it achieves state-of-the-art
prediction accuracy in a variety of tasks and possesses the ability to
interpolate missing frames, a function that is lacking in GBMs.
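As we read it from the abstract, the distinctive update rule is that the prediction error adjusts the latent context while the feedforward signal passes through unchanged. The minimal numpy sketch below illustrates that rule for a single layer with a linear top-down predictor; the dimensions, learning rate, and linear form are all assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_CTX = 16, 8                              # assumed dimensions
W = rng.normal(scale=0.1, size=(D_IN, D_CTX))    # top-down prediction weights

def step(x_t, context, lr=0.05):
    """One inference step: the prediction error updates only the latent
    context; the feedforward input is passed on unaltered, unlike in
    standard predictive coding, where the error replaces it."""
    prediction = W @ context                 # top-down prediction of the event
    error = x_t - prediction                 # prediction error
    context = context + lr * (W.T @ error)   # gradient step on the context
    return x_t, context                      # feedforward signal unchanged

context = np.zeros(D_CTX)
for t in range(100):
    x_t = rng.normal(size=D_IN)
    x_out, context = step(x_t, context)      # x_out == x_t by construction
```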
Learning to Associate Words and Images Using a Large-scale Graph
We develop an approach for unsupervised learning of associations between
co-occurring perceptual events using a large graph. We applied this approach to
successfully solve the image captcha of China's railroad system. The approach
is based on the principle of suspicious coincidence. In this particular
problem, a user is presented with a deformed picture of a Chinese phrase and
eight low-resolution images and must quickly select the relevant images in
order to purchase a train ticket. This problem presents several
challenges: (1) the teaching labels for both the Chinese phrases and the images
were not available for supervised learning, (2) no pre-trained deep
convolutional neural networks are available for recognizing these Chinese
phrases or the presented images, and (3) each captcha must be solved within a
few seconds. We collected 2.6 million captchas, with 2.6 million deformed
Chinese phrases and over 21 million images. From these data, we constructed an
association graph, composed of over 6 million vertices, and linked these
vertices based on co-occurrence information and feature similarity between
pairs of images. We then trained a deep convolutional neural network to learn a
projection of the Chinese phrases onto a 230-dimensional latent space. Using
label propagation, we computed the likelihood of each of the eight images
conditioned on the latent space projection of the deformed phrase for each
captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on
average. Our work, in answering this practical challenge, illustrates the power
of this class of unsupervised association learning techniques, which may be
related to the brain's general strategy for associating language stimuli with
visual objects on the principle of suspicious coincidence.
Comment: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 2017.
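A simplified reading of the scoring stage, sketched below under our own assumptions: score the eight candidate images against the phrase projection by cosine similarity, then smooth the scores over the association graph in the spirit of label propagation. The function and variable names are hypothetical, and the random inputs stand in for the CNN's 230-dimensional phrase projections.

```python
import numpy as np

def rank_candidates(phrase_vec, image_vecs, adjacency, n_prop=3):
    """Score candidates against the phrase, then propagate the scores
    over the association graph (a simplified label-propagation step)."""
    def unit(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

    scores = unit(image_vecs) @ unit(phrase_vec)           # cosine similarity
    P = adjacency / (adjacency.sum(axis=1, keepdims=True) + 1e-8)
    for _ in range(n_prop):
        scores = 0.5 * scores + 0.5 * P @ scores           # propagate
    return np.argsort(-scores)                             # best first

# Toy usage with random embeddings in the 230-dimensional latent space.
rng = np.random.default_rng(0)
phrase = rng.normal(size=230)
images = rng.normal(size=(8, 230))
graph = np.abs(rng.normal(size=(8, 8)))
print(rank_candidates(phrase, images, graph))
```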
Hierarchical Bayesian Inference in the Visual Cortex
Traditional views of visual processing suggest that early visual neurons in areas V1 and V2 are static spatiotemporal filters that extract local features from a visual scene. The extracted information is then channeled through a feedforward chain of modules in successively higher visual areas for further analysis. Recent electrophysiological recordings from early visual neurons in awake behaving monkeys reveal that there are many levels of complexity in the information processing of the early visual cortex, as seen in the long-latency responses of its neurons. These new findings suggest that activity in the early visual cortex is tightly coupled and highly interactive with the rest of the visual system. They lead us to propose a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system. In this framework, the recurrent feedforward/feedback loops in the cortex serve to integrate top-down contextual priors and bottom-up observations so as to implement concurrent probabilistic inference along the visual hierarchy. We suggest that the algorithms of particle filtering and Bayesian belief propagation might model these interactive cortical computations. We review some recent neurophysiological evidence that supports the plausibility of these ideas.
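As a toy, single-variable illustration of the proposed scheme (ours, not the authors'), one particle-filter step shows how a top-down contextual prior reweights bottom-up evidence; the particle count and noise scales are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500  # number of particles (assumed)

def infer(observation, top_down_mean, top_down_std=1.0, obs_std=0.5):
    """Sample hypotheses from the top-down contextual prior, weight them
    by the bottom-up likelihood of the observation, and resample."""
    particles = rng.normal(top_down_mean, top_down_std, size=N)
    weights = np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights /= weights.sum()
    return particles[rng.choice(N, size=N, p=weights)]

posterior = infer(observation=2.0, top_down_mean=0.0)
print(posterior.mean())   # lies between the prior mean (0) and the data (2)
```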
Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity
Current deep-learning models for object recognition are known to be heavily
biased toward texture. In contrast, human visual systems are known to be biased
toward shape and structure. What could be the design principles in human visual
systems that led to this difference? How could we introduce more shape bias
into the deep learning models? In this paper, we report that sparse coding, a
ubiquitous principle in the brain, can in itself introduce shape bias into the
network. We found that enforcing the sparse coding constraint using a
non-differentiable Top-K operation can lead to the emergence of structural
encoding in neurons in convolutional neural networks, resulting in a smooth
decomposition of objects into parts and subparts and endowing the networks with
shape bias. We demonstrated this emergence of shape bias and its functional
benefits for different network structures with various datasets. For object
recognition convolutional neural networks, the shape bias leads to greater
robustness against style and pattern-change distractions. For image-synthesis
generative adversarial networks, the emergent shape bias leads to more
coherent and decomposable structures in the synthesized images. Ablation
studies suggest that sparse codes tend to encode structures, whereas the more
distributed codes tend to favor texture. Our code is hosted at the GitHub
repository: https://github.com/Crazy-Jack/nips2023_shape_vs_texture
Comment: Published at NeurIPS 2023 (Oral).
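The repository above holds the authors' code; for orientation, a minimal sketch of a Top-K sparsity layer of the kind the abstract describes is given below, keeping the k largest activations across channels at each spatial location and zeroing the rest (k and the placement after a convolution are assumptions).

```python
import torch
import torch.nn as nn

class TopK(nn.Module):
    """Keep the k largest activations across channels at each spatial
    location and zero the rest; a minimal version of the sparse-coding
    constraint (k is an assumed hyperparameter)."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):                        # x: (N, C, H, W)
        _, idx = x.topk(self.k, dim=1)           # indices of top-k channels
        mask = torch.zeros_like(x).scatter_(1, idx, 1.0)
        return x * mask                          # gradients flow to kept units

layer = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), TopK(k=8))
y = layer(torch.randn(2, 3, 32, 32))             # sparse feature map
```

The selection itself is non-differentiable, but gradients still flow through the surviving activations, which is why such a layer can be trained end to end.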
The Role of the Primary Visual Cortex in Higher Level Vision
In the classical feedforward, modular view of visual processing, the primary visual cortex (area V1) is a module that serves to extract local features such as edges and bars. Representation and recognition of objects are thought to be functions of higher extrastriate cortical areas. This paper presents neurophysiological data showing that the later part of V1 neurons' responses reflects higher-order perceptual computations related to Ullman's (Cognition 1984;18:97-159) visual routines and Marr's (Vision, Freeman 1982) full primal sketch, 2.5D sketch, and 3D model. Based on theoretical reasoning and the experimental evidence, we propose a possible reinterpretation of the functional role of V1. In this framework, because of V1 neurons' precise encoding of orientation and spatial information, higher-level perceptual computations and representations that involve high-resolution details, fine geometry, and spatial precision would necessarily involve V1 and be reflected in the later part of its neurons' activities.