
    Cortical Surround Interactions and Perceptual Salience via Natural Scene Statistics

    Spatial context in images induces perceptual phenomena associated with salience and modulates the responses of neurons in primary visual cortex (V1). However, the computational and ecological principles underlying contextual effects are incompletely understood. We introduce a model of natural images that includes grouping and segmentation of neighboring features based on their joint statistics, and we interpret the firing rates of V1 neurons as performing optimal recognition in this model. We show that this leads to a substantial generalization of divisive normalization, a computation that is ubiquitous in many neural areas and systems. A main novelty in our model is that the influence of the context on a target stimulus is determined by their degree of statistical dependence. We optimized the parameters of the model on natural image patches, and then simulated neural and perceptual responses on stimuli used in classical experiments. The model reproduces some rich and complex response patterns observed in V1, such as the contrast dependence, orientation tuning and spatial asymmetry of surround suppression, while also allowing for surround facilitation under conditions of weak stimulation. It also mimics the perceptual salience produced by simple displays, and leads to readily testable predictions. Our results provide a principled account of orientation-based contextual modulation in early vision and its sensitivity to the homogeneity and spatial arrangement of inputs, and lend statistical support to the theory that V1 computes visual salience.
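
    A minimal sketch of the idea (not the authors' actual formulation): in standard divisive normalization every surround input contributes equally to the suppressive pool, whereas here each contribution is scaled by an inferred center-surround dependence. The weights `w_dep` standing in for that learned statistical dependence are hypothetical.

```python
import numpy as np

def normalized_response(center_drive, surround_drives, w_dep, sigma=1.0):
    """Divisive normalization in which each surround input is weighted by its
    inferred statistical dependence on the center (w_dep in [0, 1]).
    w_dep = 1 recovers standard divisive normalization over the surround pool;
    w_dep = 0 removes the surround's suppressive influence."""
    pool = sigma**2 + np.sum(w_dep * surround_drives**2)
    return center_drive**2 / pool

# Illustration: a statistically dependent (e.g. co-oriented) surround suppresses
# the center more than an independent (cross-oriented) surround of equal contrast.
surround = np.array([2.0, 2.0])
print(normalized_response(3.0, surround, w_dep=np.array([1.0, 1.0])))  # stronger suppression
print(normalized_response(3.0, surround, w_dep=np.array([0.1, 0.1])))  # weaker suppression
```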

    Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning

    A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundary output of the model is compared to computer vision algorithms using a set of human-segmented photographic images. The model classifies textures and suppresses noise using a multiple-scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Top-down modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. The importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark classification rates vary from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention. Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
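
    As a rough illustration of the front-end filtering stage only, the sketch below builds a small multiple-scale oriented (Gabor) filterbank and computes rectified responses that could feed a texture classifier; the specific parameters, and the use of NumPy/SciPy, are assumptions rather than the model's actual implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, wavelength, theta, sigma):
    """Oriented Gabor filter, used here as a stand-in for one channel of a
    multiple-scale oriented filterbank."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def filterbank_responses(image, scales=(4, 8), n_orientations=4):
    """Rectified responses at several scales and orientations; in the full model
    such features would feed the dART texture classifier."""
    responses = []
    for wavelength in scales:
        for k in range(n_orientations):
            kernel = gabor_kernel(4 * wavelength + 1, wavelength,
                                  k * np.pi / n_orientations, wavelength / 2)
            responses.append(np.abs(fftconvolve(image, kernel, mode='same')))
    return np.stack(responses)

texture = np.random.rand(64, 64)           # placeholder for a Brodatz patch
features = filterbank_responses(texture)   # shape: (n_scales * n_orientations, 64, 64)
```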

    Binocular fusion and invariant category learning due to predictive remapping during scanning of a depthful scene with eye movements

    How does the brain maintain stable fusion of 3D scenes when the eyes move? Every eye movement causes each retinal position to process a different set of scenic features, and thus the brain needs to binocularly fuse new combinations of features at each position after an eye movement. Despite these breaks in retinotopic fusion due to each movement, previously fused representations of a scene in depth often appear stable. The 3D ARTSCAN neural model proposes how the brain does this by unifying concepts about how multiple cortical areas in the What and Where cortical streams interact to coordinate processes of 3D boundary and surface perception, spatial attention, invariant object category learning, predictive remapping, eye movement control, and learned coordinate transformations. The model explains data from single neuron and psychophysical studies of covert visual attention shifts prior to eye movements. The model further clarifies how perceptual, attentional, and cognitive interactions among multiple brain regions (LGN, V1, V2, V3A, V4, MT, MST, PPC, LIP, ITp, ITa, SC) may accomplish predictive remapping as part of the process whereby view-invariant object categories are learned. These results build upon earlier neural models of 3D vision and figure-ground separation and the learning of invariant object categories as the eyes freely scan a scene. A key process concerns how an object's surface representation generates a form-fitting distribution of spatial attention, or attentional shroud, in parietal cortex that helps maintain the stability of multiple perceptual and cognitive processes. Predictive eye movement signals maintain the stability of the shroud, as well as of binocularly fused perceptual boundaries and surface representations.
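
    A toy sketch of the predictive-remapping idea only (the model's gain fields are far richer): a copy of the planned eye-movement command shifts each feature's expected retinal position before the saccade lands, so binocular fusion can be re-established at the new positions. The function and values below are illustrative assumptions, not the model's implementation.

```python
import numpy as np

def predict_remapped_positions(retinal_positions, planned_saccade):
    """Predictive remapping sketch: a corollary discharge of the eye movement
    command shifts each feature's expected retinal position before the saccade
    lands.  A rightward saccade shifts the image leftward on the retina."""
    return retinal_positions - planned_saccade

features = np.array([[2.0, 1.0], [-3.0, 0.5]])   # current retinal coordinates (deg)
saccade = np.array([4.0, 0.0])                   # planned eye movement (deg)
print(predict_remapped_positions(features, saccade))
```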

    Multiple components of surround modulation in primary visual cortex: Multiple neural circuits with multiple functions?

    The responses of neurons in primary visual cortex (V1) to stimulation of their receptive field (RF) are modulated by stimuli in the RF surround. This modulation is suppressive when the stimuli in the RF and surround are of similar orientation, but less suppressive or facilitatory when they are cross-oriented. Similarly, in human vision surround stimuli selectively suppress the perceived contrast of a central stimulus. Although the properties of surround modulation have been thoroughly characterized in many species, cortical areas and sensory modalities, its role in perception remains unknown. Here we argue that surround modulation in V1 consists of multiple components having different spatio-temporal and tuning properties, generated by different neural circuits and serving different visual functions. One component arises from LGN afferents, is fast, untuned for orientation, and spatially restricted to the surround region nearest to the RF (the near-surround); its function is to normalize V1 cell responses to local contrast. Intra-V1 horizontal connections contribute a slower, narrowly orientation-tuned component to near-surround modulation, whose function is to increase the coding efficiency of natural images in a manner that leads to the extraction of object boundaries. The third component is generated by top-down feedback connections to V1, is fast, broadly orientation-tuned, and extends into the far-surround; its function is to enhance the salience of behaviorally relevant visual features. Far- and near-surround modulation thus act as parallel mechanisms: the former quickly detects and guides saccades/attention to salient visual scene locations, the latter segments object boundaries in the scene.
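
    Purely as a schematic (not taken from the paper), the three proposed components could be pictured as separate terms in a divisive suppression pool, differing in orientation tuning, spatial extent, and latency; the function and weights below are illustrative assumptions, not fitted values.

```python
import numpy as np

def surround_modulated_response(center_drive, near_untuned, near_tuned, far_tuned,
                                w_lgn=0.5, w_horizontal=1.0, w_feedback=0.3, sigma=1.0):
    """Schematic combination of three suppressive components: a fast untuned
    near-surround term (LGN afferents), a slower orientation-tuned near-surround
    term (V1 horizontal connections), and a broadly tuned far-surround term
    (cortical feedback)."""
    suppression = w_lgn * near_untuned + w_horizontal * near_tuned + w_feedback * far_tuned
    return center_drive / (sigma + suppression)

# An iso-oriented surround engages the tuned terms and suppresses more
# than a surround that drives only the untuned component.
print(surround_modulated_response(10.0, near_untuned=2.0, near_tuned=3.0, far_tuned=2.0))
print(surround_modulated_response(10.0, near_untuned=2.0, near_tuned=0.0, far_tuned=0.0))
```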

    Change blindness: eradication of gestalt strategies

    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task.

    Dynamic and Integrative Properties of the Primary Visual Cortex

    The ability to derive meaning from complex, ambiguous sensory input requires the integration of information over both space and time, as well as cognitive mechanisms to dynamically shape that integration. We have studied these processes in the primary visual cortex (V1), where neurons have been proposed to integrate visual inputs along a geometric pattern known as the association field (AF). We first used cortical reorganization as a model to investigate the role that a specific network of V1 connections, the long-range horizontal connections, might play in temporal and spatial integration across the AF. When retinal lesions ablate sensory information from portions of the visual field, V1 undergoes a process of reorganization mediated by compensatory changes in the network of horizontal collaterals. The reorganization accompanies the brain’s amazing ability to perceptually “fill in”, or “see”, the lost visual input. We developed a computational model to simulate cortical reorganization and perceptual fill-in mediated by a plexus of horizontal connections that encode the AF. The model reproduces the major features of the perceptual fill-in reported by human subjects with retinal lesions, and it suggests that V1 neurons, empowered by their horizontal connections, underlie both perceptual fill-in and normal integrative mechanisms that are crucial to our visual perception. These results motivated the second prong of our work, which was to experimentally study the normal integration of information in V1. Since psychophysical and physiological studies suggest that spatial interactions in V1 may be under cognitive control, we investigated the integrative properties of V1 neurons under different cognitive states. We performed extracellular recordings from single V1 neurons in macaques that were trained to perform a delayed-match-to-sample contour detection task. We found that the ability of V1 neurons to summate visual inputs from beyond the classical receptive field (cRF) imbues them with selectivity for complex contour shapes, and that neuronal shape selectivity in V1 changed dynamically according to the shapes monkeys were cued to detect. Over the population, V1 encoded subsets of the AF, predicted by the computational model, that shifted as a function of the monkeys’ expectations. These results support the major conclusions of the theoretical work; even more, they reveal a sophisticated mode of form processing, whereby the selectivity of the whole network in V1 is reshaped by cognitive state.
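
    The abstract does not spell out the connection rule, but association-field models typically weight horizontal connections by collinearity/co-circularity between oriented elements; the sketch below is one such illustrative rule, with all parameters hypothetical.

```python
import numpy as np

def association_field_weight(dx, dy, theta_i, theta_j, sigma_d=5.0, sigma_t=0.5):
    """Illustrative association-field connection strength between two oriented
    elements: strongest when element j lies along the orientation axis of
    element i and their orientations match; falls off with distance,
    misalignment, and orientation difference."""
    distance = np.hypot(dx, dy)
    axis_angle = np.arctan2(dy, dx)
    alignment = np.cos(2 * (axis_angle - theta_i))        # 1 if j lies on i's orientation axis
    orientation_match = np.cos(2 * (theta_i - theta_j))   # 1 if same orientation (mod pi)
    return np.exp(-distance**2 / (2 * sigma_d**2)) * \
           np.exp(-((1 - alignment) + (1 - orientation_match)) / sigma_t)

# A collinear neighbor couples strongly; an orthogonal flanker barely at all.
print(association_field_weight(3.0, 0.0, 0.0, 0.0))
print(association_field_weight(3.0, 0.0, 0.0, np.pi / 2))
```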

    Predictive coding as a model of the V1 saliency map hypothesis

    The predictive coding/biased competition (PC/BC) model is a specific implementation of predictive coding theory that has previously been shown to provide a detailed account of the response properties of orientation tuned cells in primary visual cortex (V1). Here it is shown that the same model can successfully simulate psychophysical data relating to the saliency of unique items in search arrays, of contours embedded in random texture, and of borders between textured regions. This model thus provides a possible implementation of the hypothesis that V1 generates a bottom-up saliency map. However, PC/BC is very different from previous models of visual salience, in that it proposes that saliency results from the failure of an internal model of simple elementary image components to accurately predict the visual input. Saliency can therefore be interpreted as a mechanism by which prediction errors attract attention in an attempt to improve the accuracy of the brain’s internal representation of the world.
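
    PC/BC has its own specific update equations (divisively computed error units); the sketch below is only a generic reconstruction-error illustration of the core claim that saliency is what an internal model of elementary image components fails to predict. The dictionary, step size, and iteration count are assumptions.

```python
import numpy as np

def saliency_from_prediction_error(x, W, n_iter=100, step=0.1):
    """Generic predictive-coding sketch (not the exact PC/BC equations):
    hidden causes y are adjusted so that the dictionary W reconstructs the
    input x; whatever residual remains is treated as the saliency signal,
    i.e. image structure the internal model cannot explain."""
    y = np.zeros(W.shape[0])
    for _ in range(n_iter):
        e = x - W.T @ y                          # prediction error
        y = np.maximum(0.0, y + step * (W @ e))  # non-negative gradient step
    return np.abs(x - W.T @ y)                   # large residual = salient

# Dictionary of two 'elementary components'; input structure they cannot span yields high error.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(saliency_from_prediction_error(np.array([1.0, 1.0, 0.0]), W))  # well predicted -> low error
print(saliency_from_prediction_error(np.array([0.0, 0.0, 1.0]), W))  # unexplained -> high error (salient)
```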

    Neural dynamics of invariant object recognition: relative disparity, binocular fusion, and predictive eye movements

    How does the visual cortex learn invariant object categories as an observer scans a depthful scene? Two neural processes that contribute to this ability are modeled in this thesis. The first model clarifies how an object is represented in depth. Cortical area V1 computes absolute disparity, which is the horizontal difference in retinal location of an image in the left and right foveas. Many cells in cortical area V2 compute relative disparity, which is the difference in absolute disparity of two visible features. Relative, but not absolute, disparity is unaffected by the distance of visual stimuli from an observer, and by vergence eye movements. A laminar cortical model of V2 that includes shunting lateral inhibition of disparity-sensitive layer 4 cells causes a peak shift in cell responses that transforms absolute disparity from V1 into relative disparity in V2. The second model simulates how the brain maintains stable percepts of a 3D scene during binocular eye movements. The visual cortex initiates the formation of a 3D boundary and surface representation by binocularly fusing corresponding features from the left and right retinotopic images. However, after each saccadic eye movement, every scenic feature projects to a different combination of retinal positions than before the saccade. Yet the 3D representation, resulting from the prior fusion, is stable through the post-saccadic re-fusion. One key to stability is predictive remapping: the system anticipates the new retinal positions of features entailed by eye movements by using gain fields that are updated by eye movement commands. The 3D ARTSCAN model developed here simulates how perceptual, attentional, and cognitive interactions across different brain regions within the What and Where visual processing streams interact to coordinate predictive remapping, stable 3D boundary and surface perception, spatial attention, and the learning of object categories that are invariant to changes in an object's retinal projections. Such invariant learning helps the system to avoid treating each new view of the same object as a distinct object to be learned. The thesis hereby shows how a process that enables invariant object category learning can be extended to also enable stable 3D scene perception.
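
    A minimal illustration (hypothetical values, not from the thesis) of the absolute/relative disparity distinction the V2 model exploits: a vergence change offsets every absolute disparity by the same amount, so relative disparity is unchanged.

```python
def absolute_disparity(x_left, x_right):
    """Horizontal difference in retinal position of the same feature
    in the left and right eyes (as computed in V1)."""
    return x_left - x_right

def relative_disparity(feature_a, feature_b):
    """Difference between the absolute disparities of two features
    (as computed by many V2 cells); unaffected by vergence."""
    return absolute_disparity(*feature_a) - absolute_disparity(*feature_b)

a, b = (1.2, 0.8), (0.5, 0.9)      # (left, right) retinal positions in degrees
print(relative_disparity(a, b))    # 0.8
vergence_shift = 0.3               # a vergence change adds the same offset to every absolute disparity
a2 = (a[0] + vergence_shift, a[1])
b2 = (b[0] + vergence_shift, b[1])
print(relative_disparity(a2, b2))  # still 0.8: invariant to vergence
```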