3,380 research outputs found
Learning long-range spatial dependencies with horizontal gated-recurrent units
Progress in deep learning has spawned great successes in many engineering
applications. As a prime example, convolutional neural networks, a type of
feedforward neural networks, are now approaching -- and sometimes even
surpassing -- human accuracy on a variety of visual recognition tasks. Here,
however, we show that these neural networks and their recent extensions
struggle in recognition tasks where co-dependent visual features must be
detected over long spatial ranges. We introduce the horizontal gated-recurrent
unit (hGRU) to learn intrinsic horizontal connections -- both within and across
feature columns. We demonstrate that a single hGRU layer matches or outperforms
all tested feedforward hierarchical baselines including state-of-the-art
architectures which have orders of magnitude more free parameters. We further
discuss the biological plausibility of the hGRU in comparison to anatomical
data from the visual cortex as well as human behavioral data on a classic
contour detection task. Comment: Published at NeurIPS 2018
https://papers.nips.cc/paper/7300-learning-long-range-spatial-dependencies-with-horizontal-gated-recurrent-unit
A geometric model of multi-scale orientation preference maps via Gabor functions
In this paper we present a new model for the generation of orientation
preference maps in the primary visual cortex (V1), considering both orientation
and scale features. First we undertake to model the functional architecture of
V1 by interpreting it as a principal fiber bundle over the 2-dimensional
retinal plane by introducing intrinsic variables orientation and scale. The
intrinsic variables constitute a fiber on each point of the retinal plane and
the set of receptive profiles of simple cells is located on the fiber. Each
receptive profile on the fiber is mathematically interpreted as a rotated Gabor
function derived from an uncertainty principle. The visual stimulus is lifted
in a 4-dimensional space, characterized by coordinate variables, position,
orientation and scale, through a linear filtering of the stimulus with Gabor
functions. Orientation preference maps are then obtained by mapping the
orientation value found from the lifting of a noise stimulus onto the
2-dimensional retinal plane. This corresponds to a Bargmann transform in the
reducible representation of the group. A
comparison will be provided with a previous model based on the Bargmann
transform in the irreducible representation of the group,
outlining that the new model is more physiologically motivated. Then we present
simulation results related to the construction of the orientation preference
map by using Gabor filters with different scales and compare those results to
the relevant neurophysiological findings in the literature.
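The lifting step described in this abstract (filter a stimulus with rotated Gabor functions, then map the best-responding orientation back onto the plane) can be sketched as follows. This is only an illustrative sketch, not the authors' model: it uses a single scale, a complex Gabor with an isotropic Gaussian envelope, and a winner-take-all readout. The names `gabor`, `make_kernel`, and `orientation_map`, and all parameter values, are assumptions made for this example.

```python
import cmath
import math
import random


def gabor(x, y, theta, sigma, wavelength):
    """Complex Gabor value at (x, y) for a carrier rotated by theta (radians)."""
    # Coordinate along the carrier direction after rotation by theta.
    xr = x * math.cos(theta) + y * math.sin(theta)
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return envelope * cmath.exp(2j * math.pi * xr / wavelength)


def make_kernel(theta, sigma, wavelength, radius):
    """Sample the Gabor on a (2*radius+1)^2 grid of integer offsets."""
    return {(dx, dy): gabor(dx, dy, theta, sigma, wavelength)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def orientation_map(image, thetas, sigma=2.0, wavelength=4.0, radius=4):
    """Return, per pixel, the theta whose Gabor response has the largest modulus."""
    h, w = len(image), len(image[0])
    kernels = [make_kernel(t, sigma, wavelength, radius) for t in thetas]
    result = []
    for y in range(h):
        row = []
        for x in range(w):
            best_index, best_energy = 0, -1.0
            for i, kernel in enumerate(kernels):
                response = 0j
                for (dx, dy), k in kernel.items():
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding at borders
                        response += image[yy][xx] * k
                energy = abs(response)
                if energy > best_energy:
                    best_index, best_energy = i, energy
            row.append(thetas[best_index])
        result.append(row)
    return result


# Lift a white-noise stimulus, as in the abstract, over 8 sampled orientations.
random.seed(0)
noise = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(16)]
thetas = [k * math.pi / 8 for k in range(8)]
preference = orientation_map(noise, thetas)
```

A quick sanity check is to replace the noise with a grating that varies only along x: the interior pixels should then prefer `theta = 0`.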
Director Field Model of the Primary Visual Cortex for Contour Detection
We aim to build the simplest possible model capable of detecting long, noisy
contours in a cluttered visual scene. For this, we model the neural dynamics in
the primate primary visual cortex in terms of a continuous director field that
describes the average rate and the average orientational preference of active
neurons at a particular point in the cortex. We then use a linear-nonlinear
dynamical model with long range connectivity patterns to enforce long-range
statistical context present in the analyzed images. The resulting model has
substantially fewer degrees of freedom than traditional models, and yet it can
distinguish large contiguous objects from the background clutter by suppressing
the clutter and by filling-in occluded elements of object contours. This
results in high-precision, high-recall detection of large objects in cluttered
scenes. Parenthetically, our model has a direct correspondence with the
Landau-de Gennes theory of nematic liquid crystals in two dimensions. Comment: 9 pages, 7 figures
Motion clouds: model-based stimulus synthesis of natural-like random textures for the study of motion perception
Choosing an appropriate set of stimuli is essential to characterize the
response of a sensory system to a particular functional dimension, such as the
eye movement following the motion of a visual scene. Here, we describe a
framework to generate random texture movies with controlled information
content, i.e., Motion Clouds. These stimuli are defined using a generative
model that is based on controlled experimental parametrization. We show that
Motion Clouds correspond to dense mixing of localized moving gratings with
random positions. Their global envelope is similar to natural-like stimulation
with an approximate full-field translation corresponding to a retinal slip. We
describe the construction of these stimuli mathematically and propose an
open-source Python-based implementation. Examples of the use of this framework
are shown. We also propose extensions to other modalities such as color vision,
touch, and audition.
Contour integration: Psychophysical, neurophysiological, and computational perspectives
One of the important roles of our visual system is to detect and segregate objects. Neurons in the early visual system extract local image features from the visual scene. To combine these features into separate, global objects, the visual system must perform some kind of grouping operation. One such operation is contour integration. Contours form the outlines of objects, and are the first step in shape perception. We discuss the mechanism of contour integration from psychophysical, neurophysiological, and computational perspectives.
Neural models of inter-cortical networks in the primate visual system for navigation, attention, path perception, and static and kinetic figure-ground perception
Vision provides the primary means by which many animals distinguish foreground objects from their background and coordinate locomotion through complex environments. The present thesis focuses on mechanisms within the visual system that afford figure-ground segregation and self-motion perception. These processes are modeled as emergent outcomes of dynamical interactions among neural populations in several brain areas. This dissertation specifies and simulates how border-ownership signals emerge in cortex, and how the medial superior temporal area (MSTd) represents path of travel and heading, in the presence of independently moving objects (IMOs).
Neurons in visual cortex that signal border-ownership, the perception that a border belongs to a figure and not its background, have been identified but the underlying mechanisms have been unclear. A model is presented that demonstrates that inter-areal interactions across model visual areas V1-V2-V4 afford border-ownership signals similar to those reported in electrophysiology for visual displays containing figures defined by luminance contrast. Competition between model neurons with different receptive field sizes is crucial for reconciling the occlusion of one object by another. The model is extended to determine border-ownership when object borders are kinetically-defined, and to detect the location and size of shapes, despite the curvature of their boundary contours.
Navigation in the real world requires humans to travel along curved paths. Many perceptual models have been proposed that focus on heading, which specifies the direction of travel along straight paths, but not on path curvature. In primates, MSTd has been implicated in heading perception. A model of V1, medial temporal area (MT), and MSTd is developed herein that demonstrates how MSTd neurons can simultaneously encode path curvature and heading. Human judgments of heading are accurate in rigid environments, but are biased in the presence of IMOs. The model presented here explains the bias through recurrent connectivity in MSTd and avoids the use of differential motion detectors, which, although used in existing models to discount the motion of an IMO relative to its background, are not biologically plausible. Reported modulation of the MSTd population due to attention is explained through competitive dynamics between subpopulations responding to bottom-up and top-down signals.
Bio-Inspired Computer Vision: Towards a Synergistic Approach of Artificial and Biological Vision
To appear in CVIU. Studies in biological vision have always been a great source of inspiration for the design of computer vision algorithms. In the past, several successful methods were designed with varying degrees of correspondence with biological vision studies, ranging from purely functional inspiration to methods that utilise models that were primarily developed for explaining biological observations. Even though it seems well recognised that computational models of biological vision can help in the design of computer vision algorithms, it is a non-trivial exercise for a computer vision researcher to mine relevant information from the biological vision literature, as very few studies in biology are organised at a task level. In this paper we aim to bridge this gap by providing a computer vision task-centric presentation of models primarily originating in biological vision studies. Not only do we revisit some of the main features of biological vision and discuss the foundations of existing computational studies modelling biological vision, but we also consider three classical computer vision tasks from a biological perspective: image sensing, segmentation and optical flow. Using this task-centric approach, we discuss well-known biological functional principles and compare them with approaches taken by computer vision. Based on this comparative analysis of computer and biological vision, we present some recent models in biological vision and highlight a few models that we think are promising for future investigations in computer vision. To this extent, this paper provides new insights and a starting point for investigators interested in the design of biology-based computer vision algorithms and paves the way for much-needed interaction between the two communities, leading to the development of synergistic models of artificial and biological vision.
Feedback and surround modulated boundary detection
Other funding: CERCA Programme/Generalitat de Catalunya. Edges are key components of any visual scene to the extent that we can recognise objects merely by their silhouettes. The human visual system captures edge information through neurons in the visual cortex that are sensitive to both intensity discontinuities and particular orientations. The "classical approach" assumes that these cells are only responsive to the stimulus present within their receptive fields; however, recent studies demonstrate that surrounding regions and inter-areal feedback connections influence their responses significantly. In this work we propose a biologically-inspired edge detection model in which orientation selective neurons are represented through the first derivative of a Gaussian function resembling double-opponent cells in the primary visual cortex (V1). In our model we account for four kinds of receptive field surround, i.e. full, far, iso- and orthogonal-orientation, whose contributions are contrast-dependent. The output signal from V1 is pooled in its perpendicular direction by larger V2 neurons employing a contrast-variant centre-surround kernel. We further introduce a feedback connection from higher-level visual areas to the lower ones. The results of our model on three benchmark datasets show a big improvement compared to the current non-learning and biologically-inspired state-of-the-art algorithms while being competitive with the learning-based methods.
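As an illustration of the orientation-selective stage only, the sketch below evaluates an oriented first derivative of a Gaussian and convolves it with a step edge. It is a minimal sketch under stated assumptions, not the authors' model: surround modulation, V2 pooling, and feedback are omitted, the kernel is unnormalised, and the names `gaussian_derivative` and `oriented_response` and the parameter values are hypothetical.

```python
import math


def gaussian_derivative(x, y, theta, sigma):
    """First derivative of an isotropic Gaussian along direction theta (radians)."""
    # Coordinate along the differentiation direction.
    xr = x * math.cos(theta) + y * math.sin(theta)
    g = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return -(xr / (sigma * sigma)) * g


def oriented_response(image, x, y, theta, sigma=1.5, radius=4):
    """Convolve the image with the oriented kernel at pixel (x, y)."""
    h, w = len(image), len(image[0])
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y - dy, x - dx  # true convolution: flip the offsets
            if 0 <= yy < h and 0 <= xx < w:
                total += gaussian_derivative(dx, dy, theta, sigma) * image[yy][xx]
    return total


# Step edge: intensity rises from 0 to 1 along x at column 8.
step = [[0.0 if x < 8 else 1.0 for x in range(16)] for _ in range(16)]
across = oriented_response(step, 8, 8, 0.0)          # differentiate across the edge
along = oriented_response(step, 8, 8, math.pi / 2)   # differentiate along the edge
```

Here `across` is positive because the intensity increases along x, while `along` is numerically zero because the image does not change along y.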
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This result may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
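The reported statistic can be checked by hand. An F test with 1 numerator degree of freedom is equivalent to a two-sided t test with t = sqrt(F), and the Student t distribution with 4 degrees of freedom has a closed-form CDF. The sketch below uses that identity; the function names are hypothetical and the code is only a consistency check, not part of the original study.

```python
import math


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = t / math.sqrt(1.0 + t * t / 4.0)
    return 0.5 + 0.375 * u * (1.0 - u * u / 12.0)


def p_value_f_1_4(f_stat):
    """Upper-tail p-value for F(1, 4), using F(1, n) = t(n)^2."""
    t = math.sqrt(f_stat)
    return 2.0 * (1.0 - t_cdf_df4(t))


p = p_value_f_1_4(2.565)
print(round(p, 3))
```

The result agrees with the reported p = 0.185 to within rounding.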