
    A Structured Model of Video Reproduces Primary Visual Cortical Organisation

    The visual system must learn to infer the presence of objects and features in the world from the images it encounters, and as such it must, either implicitly or explicitly, model the way these elements interact to create the image. Do the response properties of cells in the mammalian visual system reflect this constraint? To address this question, we constructed a probabilistic model in which the identity and attributes of simple visual elements were represented explicitly and learnt the parameters of this model from unparsed, natural video sequences. After learning, the behaviour and grouping of variables in the probabilistic model corresponded closely to functional and anatomical properties of simple and complex cells in the primary visual cortex (V1). In particular, feature identity variables were activated in a way that resembled the activity of complex cells, while feature attribute variables responded much like simple cells. Furthermore, the grouping of the attributes within the model closely paralleled the reported anatomical grouping of simple cells in cat V1. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into basic independent features, along with a parametrisation of their moment-by-moment appearances. We speculate that such a segmentation may form the initial stage of a hierarchical system that progressively separates the identity and appearance of more articulated visual elements, culminating in view-invariant object recognition.
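
    As a purely illustrative sketch (not the authors' actual model), the generative idea can be phrased as binary feature-identity variables (complex-cell-like) gating continuous attribute variables (simple-cell-like) that set each feature's moment-by-moment appearance. All names, sizes, and distributions below are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, patch_dim = 8, 64                          # hypothetical sizes
    # Each feature has a small basis spanning its possible appearances.
    basis = rng.standard_normal((n_features, 2, patch_dim))

    def generate_patch(p_on=0.2):
        identity = rng.random(n_features) < p_on           # which features are present (complex-cell-like)
        attributes = rng.standard_normal((n_features, 2))  # their momentary appearance (simple-cell-like)
        patch = np.zeros(patch_dim)
        for k in np.flatnonzero(identity):
            patch += attributes[k] @ basis[k]              # attribute-weighted appearance of feature k
        return patch, identity, attributes

    patch, identity, attributes = generate_patch()
    print(patch.shape, int(identity.sum()))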

    A Cognitive Science Based Machine Learning Architecture

    In an attempt to illustrate the application of cognitive science principles to hard AI problems in machine learning, we propose the LIDA technology, a cognitive-science-based architecture capable of more human-like learning. A LIDA-based software agent or cognitive robot will be capable of three fundamental, continuously active, human-like learning mechanisms: 1) perceptual learning, the learning of new objects, categories, relations, etc.; 2) episodic learning of events, the what, where, and when; and 3) procedural learning, the learning of new actions and action sequences with which to accomplish new tasks. The paper argues for the use of modular components, each specializing in implementing individual facets of human and animal cognition, as a viable approach towards achieving general intelligence.

    A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes

    Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1° per second do not affect model performance, but faster simulated rotation rates degrade performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments. National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016).
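
    A hedged sketch of the underlying computation (not the authors' MSTd model): heading can be read out as the focus of expansion of a radial optic-flow field by matching the observed flow against a bank of radial templates, loosely analogous to heading-tuned MSTd cells. Grid size, noise level, and template form below are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    ys, xs = np.mgrid[-1:1:21j, -1:1:21j]                  # image coordinates

    def radial_flow(hx, hy):
        """Flow field expanding away from a focus of expansion at (hx, hy)."""
        return np.stack([xs - hx, ys - hy])

    # Bank of candidate headings, one template per heading-tuned "cell".
    candidates = [(hx, hy) for hx in np.linspace(-0.5, 0.5, 11)
                           for hy in np.linspace(-0.5, 0.5, 11)]

    # Observed flow: true heading (0.2, -0.1) plus noise.
    observed = radial_flow(0.2, -0.1) + 0.2 * rng.standard_normal((2,) + xs.shape)

    def match(c):
        t = radial_flow(*c)
        return np.sum(observed * t) / (np.linalg.norm(t) * np.linalg.norm(observed))

    print("estimated heading:", max(candidates, key=match))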

    Modeling Bottom-Up and Top-Down Attention with a Neurodynamic Model of V1

    Previous studies have suggested that lateral interactions of V1 cells are responsible for, among other visual effects, bottom-up visual attention (also known as visual salience or saliency). Our objective is to mimic these connections in the visual system with a neurodynamic network of firing-rate neurons. Early subcortical processes (i.e. retinal and thalamic) are functionally simulated. An implementation of the cortical magnification function is included to define the retinotopic projections towards V1, processing neuronal activity for each distinct view during scene observation. Novel computational definitions of top-down inhibition (in terms of inhibition of return and selection mechanisms) are also proposed to predict attention in free-viewing and visual search conditions. Results show that our model outperforms other biologically-inspired models both at predicting visual salience and at predicting visual saccade sequences during free viewing. We also show how the temporal and spatial characteristics of inhibition of return can improve the prediction of saccades, and how distinct search strategies (in terms of feature-selective or category-specific inhibition) predict attention in distinct image contexts. Comment: 32 pages, 19 figures.
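
    A hedged, minimal sketch of the inhibition-of-return idea (not the paper's neurodynamic model): a saccade sequence is generated by repeatedly fixating the most salient location and then transiently suppressing a neighbourhood around it, with the suppression decaying over time. Map size, suppression radius, and decay rate are assumptions, and the random map stands in for a model's saliency output.

    import numpy as np

    rng = np.random.default_rng(2)
    saliency = rng.random((32, 32))              # stand-in for a model saliency map
    ior = np.ones_like(saliency)                 # multiplicative inhibition-of-return field
    ys, xs = np.mgrid[0:32, 0:32]

    scanpath = []
    for _ in range(5):
        ior = 1.0 - 0.8 * (1.0 - ior)            # previous suppression partially recovers
        y, x = np.unravel_index(np.argmax(saliency * ior), saliency.shape)
        scanpath.append((int(y), int(x)))
        # Suppress a Gaussian neighbourhood around the fixated location (inhibition of return).
        ior *= 1.0 - np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * 3.0 ** 2))

    print(scanpath)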

    Learning complex cell invariance from natural videos: A plausibility proof

    One of the most striking features of the cortex is its ability to wire itself. Understanding how the visual cortex wires up through development and how visual experience refines connections into adulthood is a key question for neuroscience. While computational models of the visual cortex are becoming increasingly detailed, the question of how such an architecture could self-organize through visual experience is often overlooked. Here we focus on the class of hierarchical feedforward models of the ventral stream of the visual cortex, which extend the classical simple-to-complex cell model by Hubel and Wiesel (1962) to extra-striate areas, and have been shown to account for a host of experimental data. Such models assume two functional classes of simple and complex cells with specific predictions about their respective wiring and resulting functionalities. In these networks, the issue of learning, especially for complex cells, is perhaps the least well understood. In fact, in most of these models, the connectivity between simple and complex cells is not learned but rather hard-wired. Several algorithms have been proposed for learning invariances at the complex cell level based on a trace rule to exploit the temporal continuity of sequences of natural images, but very few can learn from natural cluttered image sequences. Here we propose a new variant of the trace rule that only reinforces the synapses between the most active cells, and therefore can handle cluttered environments. The algorithm has so far been developed and tested at the level of V1-like simple and complex cells: we verified that Gabor-like simple cell selectivity could emerge from competitive Hebbian learning. In addition, we show how the modified trace rule allows the subsequent complex cells to learn to selectively pool over simple cells with the same preferred orientation but slightly different positions, thus increasing their tolerance to the precise position of the stimulus within their receptive fields.
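
    A hedged sketch of a trace-rule update in the spirit described above (not the authors' exact algorithm): the complex cell keeps a temporal trace of its own activity and strengthens synapses from simple cells, but the update is restricted to the most active presynaptic cells, which is what is meant to keep clutter from being pooled in. The winner count, learning rate, decay constant, and random stand-in inputs are assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    n_simple, n_steps = 100, 500
    w = np.abs(rng.standard_normal(n_simple)) * 0.01       # simple -> complex weights
    trace, eta, lam, k = 0.0, 0.05, 0.8, 5                 # trace, learning rate, decay, winners

    for t in range(n_steps):
        x = np.clip(rng.standard_normal(n_simple), 0, None)   # stand-in rectified simple-cell responses
        y = w @ x                                              # complex-cell response
        trace = lam * trace + (1 - lam) * y                    # temporal trace of the complex cell's activity
        winners = np.argsort(x)[-k:]                           # only the most active presynaptic cells
        w[winners] += eta * trace * x[winners]                 # trace-modulated Hebbian update, winners only
        w /= np.linalg.norm(w)                                 # keep weights bounded

    print(np.argsort(w)[-5:])                              # inputs that ended up most strongly pooled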

    Time-Contrastive Networks: Self-Supervised Learning from Video

    We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitat
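
    A hedged sketch of the multi-view, time-contrastive objective described above (not the released TCN code): an embedding of a frame from one viewpoint is pulled toward the simultaneous frame from another viewpoint and pushed away from a temporally nearby frame of the same view. The embedding size and margin are assumptions, and the random vectors stand in for a network's outputs.

    import numpy as np

    def time_contrastive_loss(anchor, positive, negative, margin=0.2):
        """anchor: view-1 embeddings; positive: simultaneous view-2 embeddings;
        negative: temporally nearby view-1 embeddings. All shapes: (batch, dim)."""
        d_pos = np.sum((anchor - positive) ** 2, axis=1)   # same moment, different view
        d_neg = np.sum((anchor - negative) ** 2, axis=1)   # same view, nearby moment
        return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

    rng = np.random.default_rng(4)
    batch, dim = 8, 32
    anchor = rng.standard_normal((batch, dim))
    positive = anchor + 0.1 * rng.standard_normal((batch, dim))   # close across viewpoints
    negative = rng.standard_normal((batch, dim))                  # unrelated nearby-in-time frame
    print(time_contrastive_loss(anchor, positive, negative))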

    M2 receptors are required for spatiotemporal sequence learning in mouse primary visual cortex

    Acetylcholine is a neuromodulator that plays a variety of roles in the central nervous system and is highly implicated in visual perception and visual cortical plasticity. Visual sequence learning, defined here as the ability to encode and predict the spatiotemporal content of visual information, has been shown to depend on muscarinic signaling in the mouse primary visual cortex (V1). Muscarinic signaling is a complex process involving the combined activities of five different G-protein coupled receptors, M1-M5, all of which are expressed in the murine brain but differ from each other functionally and in anatomical localization. While previous work has isolated the required signaling to V1, it is unknown which muscarinic receptors are required for spatiotemporal sequence learning. We hypothesized that M1 or M2 receptors are required for sequence learning since they are known to be abundantly expressed in rodent V1. Our aim was to identify the muscarinic receptor required for sequence learning using electrophysiology, followed by immunofluorescence to determine the anatomical distribution of the identified receptor in V1 in a layer-wise and cell-type fashion. Another aim was to better tease out the timing of muscarinic activity required for encoding the visual sequence. Here we present electrophysiological evidence that M2, but not M1, receptors are required for spatiotemporal sequence learning in mouse V1. We show that M2 is highly expressed in the neuropil in V1, especially in thalamorecipient layer 4, and co-localizes to the soma of a subset of somatostatin-expressing neurons in deep layers. We also show that expression of M2 receptors is higher in the monocular region of V1 than it is in the binocular region, but that the amount of experience-dependent sequence potentiation is similar in both regions. Finally, we show that interrupting mAChR activity after visual stimulation does not prevent sequence potentiation. This work establishes a new functional role for M2-type receptors in processing temporal information and demonstrates that monocular circuits are modified by experience in a manner similar to binocular circuits.