A Structured Model of Video Reproduces Primary Visual Cortical Organisation
The visual system must learn to infer the presence of objects and features in the world from the images it encounters, and as such it must, either implicitly or explicitly, model the way these elements interact to create the image. Do the response properties of cells in the mammalian visual system reflect this constraint? To address this question, we constructed a probabilistic model in which the identity and attributes of simple visual elements were represented explicitly, and learnt the parameters of this model from unparsed, natural video sequences. After learning, the behaviour and grouping of variables in the probabilistic model corresponded closely to functional and anatomical properties of simple and complex cells in the primary visual cortex (V1). In particular, feature identity variables were activated in a way that resembled the activity of complex cells, while feature attribute variables responded much like simple cells. Furthermore, the grouping of the attributes within the model closely paralleled the reported anatomical grouping of simple cells in cat V1. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into basic independent features, along with a parametrisation of their moment-by-moment appearances. We speculate that such a segmentation may form the initial stage of a hierarchical system that progressively separates the identity and appearance of more articulated visual elements, culminating in view-invariant object recognition.
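To make the identity/attribute factorisation concrete, here is a minimal toy sketch (our own illustration, not the paper's model): a binary identity variable gates whether a feature appears at all, while continuous attribute variables set its moment-by-moment appearance as weights over a quadrature pair of Gabor basis functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gabor(size, theta, phase, freq=0.2, sigma=3.0):
    """A Gabor basis function; a quadrature pair (two phases) models one feature."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr + phase)

def sample_image(size=16, n_features=4):
    """Identity gates each feature; attributes set its appearance."""
    image = np.zeros((size, size))
    for k in range(n_features):
        present = rng.random() < 0.3        # identity variable (complex-cell-like)
        if not present:
            continue
        a = rng.normal(size=2)              # attribute variables (simple-cell-like)
        theta = k * np.pi / n_features
        image += a[0] * gabor(size, theta, 0.0) + a[1] * gabor(size, theta, np.pi / 2)
    return image

print(sample_image().shape)  # (16, 16)
```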
A Cognitive Science Based Machine Learning Architecture
In an attempt to illustrate the application of cognitive science principles to hard AI problems in machine learning, we propose the LIDA technology, a cognitive science based architecture capable of more human-like learning. A LIDA based software agent or cognitive robot will be capable of three fundamental, continuously active, human-like learning mechanisms:
1) perceptual learning, the learning of new objects, categories, relations, etc.,
2) episodic learning of events: the what, where, and when,
3) procedural learning, the learning of new actions and action sequences with which to accomplish new tasks.
The paper argues for the use of modular components, each specializing in implementing individual facets of human and animal cognition, as a viable approach towards achieving general intelligence.
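A hypothetical skeleton (module and method names are ours, not LIDA's actual API) showing the modular decomposition the paper argues for, with one component per learning mechanism:

```python
from dataclasses import dataclass, field

@dataclass
class PerceptualMemory:
    """Perceptual learning: new objects, categories, relations."""
    categories: dict = field(default_factory=dict)

    def learn(self, label, features):
        self.categories.setdefault(label, []).append(features)

@dataclass
class EpisodicMemory:
    """Episodic learning: the what, where, and when of events."""
    episodes: list = field(default_factory=list)

    def learn(self, what, where, when):
        self.episodes.append({"what": what, "where": where, "when": when})

@dataclass
class ProceduralMemory:
    """Procedural learning: actions and action sequences for new tasks."""
    skills: dict = field(default_factory=dict)

    def learn(self, task, action_sequence):
        self.skills[task] = list(action_sequence)

class CognitiveAgent:
    """Composes the modules; all three learn continuously during operation."""
    def __init__(self):
        self.perceptual = PerceptualMemory()
        self.episodic = EpisodicMemory()
        self.procedural = ProceduralMemory()
```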
A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes
Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs that are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates degrade performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.
Funding: National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016).
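The paper's neural model is not reproduced here, but a minimal geometric baseline for the same task illustrates why optic flow determines heading: under pure observer translation, every flow vector points away from the focus of expansion (FOE), whose image position is the heading. Each flow vector (u, v) at pixel (x, y) constrains the FOE (x0, y0) via v·(x − x0) − u·(y − y0) = 0, and stacking the constraints gives a least-squares problem.

```python
import numpy as np

def estimate_foe(points, flow):
    """points: (N, 2) pixel coordinates; flow: (N, 2) optic flow vectors."""
    x, y = points[:, 0], points[:, 1]
    u, v = flow[:, 0], flow[:, 1]
    A = np.stack([v, -u], axis=1)      # coefficients on (x0, y0)
    b = v * x - u * y                  # right-hand side of each constraint
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Synthetic check: expanding radial flow from a known heading point.
rng = np.random.default_rng(1)
true_foe = np.array([40.0, 25.0])
pts = rng.uniform(0, 100, size=(200, 2))
flw = 0.05 * (pts - true_foe) + rng.normal(scale=0.01, size=(200, 2))
print(estimate_foe(pts, flw))          # approximately [40, 25]
```

Rotation breaks this pure-expansion structure, which is consistent with the degraded performance at faster rotation rates reported for both the model and humans.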
Modeling Bottom-Up and Top-Down Attention with a Neurodynamic Model of V1
Previous studies have suggested that lateral interactions among V1 cells are responsible for, among other visual effects, bottom-up visual attention (alternatively named visual salience or saliency). Our objective is to mimic these connections in the visual system with a neurodynamic network of firing-rate neurons. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. An implementation of the cortical magnification function is included to define the retinotopic projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms) are also proposed to predict attention in free-viewing and visual search conditions. Results show that our model outperforms other biologically-inspired models at saliency prediction as well as at predicting visual saccade sequences during free viewing. We also show how the temporal and spatial characteristics of inhibition of return can improve the prediction of saccades, and how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention in distinct image contexts.
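A simplified, generic sketch of the inhibition-of-return mechanism such models build on (not this paper's neurodynamic implementation): the saliency peak is selected as the next fixation by winner-take-all, then a neighbourhood around it is suppressed so the scanpath moves on, with the suppression decaying over time.

```python
import numpy as np

def scanpath(saliency, n_fixations=5, radius=8, strength=1.0, decay=0.7):
    """Generate fixations from a saliency map via winner-take-all + IOR."""
    h, w = saliency.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ior = np.zeros_like(saliency, dtype=float)
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(saliency - ior), (h, w))  # WTA
        fixations.append((y, x))
        disk = (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
        ior[disk] += strength * saliency[y, x]   # spatial inhibition of return
        ior *= decay                             # temporal recovery
    return fixations

sal = np.random.default_rng(2).random((64, 64))
print(scanpath(sal))
```

Varying `radius` and `decay` corresponds to the spatial and temporal characteristics of inhibition of return that the abstract reports as improving saccade prediction.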
Learning complex cell invariance from natural videos: A plausibility proof
One of the most striking features of the cortex is its ability to wire itself. Understanding how the visual cortex wires up through development, and how visual experience refines connections into adulthood, is a key question for neuroscience. While computational models of the visual cortex are becoming increasingly detailed, the question of how such architecture could self-organize through visual experience is often overlooked. Here we focus on the class of hierarchical feedforward models of the ventral stream of the visual cortex, which extend the classical simple-to-complex cells model by Hubel and Wiesel (1962) to extra-striate areas, and have been shown to account for a host of experimental data. Such models assume two functional classes of simple and complex cells, with specific predictions about their respective wiring and resulting functionalities.

In these networks, the issue of learning, especially for complex cells, is perhaps the least well understood. In fact, in most of these models, the connectivity between simple and complex cells is not learned but rather hard-wired. Several algorithms have been proposed for learning invariances at the complex cell level based on a trace rule that exploits the temporal continuity of sequences of natural images, but very few can learn from natural cluttered image sequences.

Here we propose a new variant of the trace rule that only reinforces the synapses between the most active cells, and therefore can handle cluttered environments. The algorithm has so far been developed and tested at the level of V1-like simple and complex cells: we verified that Gabor-like simple cell selectivity could emerge from competitive Hebbian learning. In addition, we show how the modified trace rule allows the subsequent complex cells to learn to selectively pool over simple cells with the same preferred orientation but slightly different positions, thus increasing their tolerance to the precise position of the stimulus within their receptive fields.
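A minimal sketch of the proposed modification, assuming a Foldiak-style trace rule as the baseline (parameter names and the top-k selection are our illustration): the temporal trace of complex-cell activity drives Hebbian updates, but only the synapses between the most active simple and complex cells are reinforced, so background clutter does not corrupt learning.

```python
import numpy as np

def modified_trace_step(w, x, trace, eta=0.05, delta=0.2, k_pre=5, k_post=1):
    """One update of complex-cell weights w (n_complex, n_simple).

    x: simple-cell activity for the current frame; trace: running
    temporal trace of complex-cell activity; k_pre/k_post: how many of
    the most active pre-/post-synaptic cells are allowed to learn.
    """
    y = w @ x                                 # complex-cell responses
    trace = (1 - delta) * trace + delta * y   # temporal (trace) average
    post = np.argsort(trace)[-k_post:]        # most active complex cells
    pre = np.argsort(x)[-k_pre:]              # most active simple cells
    for i in post:                            # reinforce only those synapses
        w[i, pre] += eta * trace[i] * x[pre]
        w[i] /= np.linalg.norm(w[i]) + 1e-12  # normalise to keep weights bounded
    return w, trace

rng = np.random.default_rng(3)
n_simple, n_complex = 32, 4
w = rng.random((n_complex, n_simple)) * 0.1
trace = np.zeros(n_complex)
for frame in rng.random((100, n_simple)):     # stand-in for simple-cell responses to video
    w, trace = modified_trace_step(w, frame, trace)
```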
Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
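A minimal sketch of the multi-view triplet objective the abstract describes (our paraphrase, not the released implementation): embeddings of simultaneous frames from two viewpoints are pulled together and pushed away from a temporally nearby frame of the same viewpoint.

```python
import numpy as np

def tcn_triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor: view-1 embedding at time t; positive: view-2 embedding at
    the same t; negative: view-1 embedding at a nearby time t'."""
    d_pos = np.sum((anchor - positive) ** 2)  # same moment, different view
    d_neg = np.sum((anchor - negative) ** 2)  # same view, different moment
    return max(0.0, d_pos - d_neg + margin)   # pos must beat neg by a margin

rng = np.random.default_rng(4)
a, p, n = rng.normal(size=(3, 32))
print(tcn_triplet_loss(a, p, n))
```

Minimising this hinge over many triplets yields a representation that is viewpoint-invariant but time-sensitive, which is what makes it usable both as a pose-imitation signal and as a reinforcement-learning reward.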
M2 receptors are required for spatiotemporal sequence learning in mouse primary visual cortex
Acetylcholine is a neuromodulator that plays a variety of roles in the central nervous system and is highly implicated in visual perception and visual cortical plasticity. Visual sequence learning, defined here as the ability to encode and predict the spatiotemporal content of visual information, has been shown to depend on muscarinic signaling in the mouse primary visual cortex (V1). Muscarinic signaling is a complex process involving the combined activities of five different G-protein coupled receptors, M1-M5, all of which are expressed in the murine brain but differ from each other functionally and in anatomical localization. While previous work has isolated the required signaling to V1, it is unknown which muscarinic receptors are required for spatiotemporal sequence learning.
We hypothesized that M1 or M2 receptors are required for sequence learning since they are known to be abundantly expressed in rodent V1.
Our aim was to identify the muscarinic receptor required for sequence learning using electrophysiology, followed by immunofluorescence to determine the anatomical distribution of the identified receptor in V1 in a layer-wise and cell-type fashion. Another aim was to better tease out the timing of muscarinic activity required for encoding the visual sequence.
Here we present electrophysiological evidence that M2, but not M1, receptors are required for spatiotemporal sequence learning in mouse V1. We show that M2 is highly expressed in the neuropil in V1, especially in thalamorecipient layer 4, and co-localizes to the soma of a subset of somatostatin-expressing neurons in deep layers. We also show that expression of M2 receptors is higher in the monocular region of V1 than in the binocular region, but that the amount of experience-dependent sequence potentiation is similar in both regions. Finally, we show that interrupting mAChR activity after visual stimulation does not prevent sequence potentiation. This work establishes a new functional role for M2-type receptors in processing temporal information and demonstrates that monocular circuits are modified by experience in a manner similar to binocular circuits.