A vision frontend with a new disparity model
We are developing a frontend that is based on the image representation in the
visual cortex and plausible processing schemes. This frontend consists of multiscale
line/edge and keypoint (vertex) detection, using models of simple, complex
and end-stopped cells. This frontend is being extended by a new disparity model.
Assuming that there is no neural inverse tangent operator, we do not exploit Gabor
phase information. Instead, we directly use simple cell (Gabor) responses at
positions where lines and edges are detected.
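As a rough illustration of this idea, the sketch below estimates disparity at a detected edge position by comparing raw even/odd Gabor (simple-cell) responses between the left and right images, rather than computing a phase difference. The kernel parameters and the matching-by-response-difference scheme are illustrative assumptions, not the authors' model.

```python
import numpy as np

def gabor_kernel(size, wavelength, sigma, phase):
    """1-D Gabor kernel; phase=0 gives the even (cosine) filter,
    phase=pi/2 the odd (sine) filter of a simple-cell pair."""
    x = np.arange(size) - size // 2
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * x / wavelength + phase)

def disparity_from_responses(left, right, pos, max_d,
                             size=21, wavelength=8.0, sigma=4.0):
    """Estimate disparity at an edge position `pos` (hypothetical scheme):
    pick the shift whose raw even/odd Gabor responses in the right image
    best match the left image's responses -- no inverse tangent, no phase."""
    even = gabor_kernel(size, wavelength, sigma, 0.0)
    odd = gabor_kernel(size, wavelength, sigma, np.pi / 2)
    half = size // 2
    patch_l = left[pos - half: pos + half + 1]
    rl = np.array([patch_l @ even, patch_l @ odd])   # left simple-cell responses
    best_d, best_err = 0, np.inf
    for d in range(-max_d, max_d + 1):
        patch_r = right[pos + d - half: pos + d + half + 1]
        rr = np.array([patch_r @ even, patch_r @ odd])
        err = np.sum((rl - rr) ** 2)                 # response difference, not phase
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```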
Sparse visual models for biologically inspired sensorimotor control
Given the importance of using resources efficiently in the competition for survival, it is reasonable to think that natural evolution has discovered efficient cortical coding strategies for representing natural visual information. Sparse representations have intrinsic advantages in terms of fault tolerance and low power consumption, and can therefore be attractive for robot sensorimotor control with powerful dispositions for decision-making. Inspired by the mammalian brain and its visual ventral pathway, we present in this paper a hierarchical sparse coding network architecture that extracts visual features for use in sensorimotor control. Testing with natural images demonstrates that this sparse coding facilitates processing and learning in subsequent layers. Previous studies have shown how the responses of complex cells could be sparsely represented by a higher-order neural layer. Here we extend sparse coding in each network layer, showing that detailed modeling of earlier stages in the visual pathway enhances the characteristics of the receptive fields developed in subsequent stages. The resulting network is more dynamic, with richer and more biologically plausible input and output representations.
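To make the sparse-coding idea concrete, here is a minimal single-layer sketch that infers a sparse code by iterative soft thresholding (ISTA); the dictionary, the l1 weight `lam`, and the iteration count are illustrative assumptions, not the paper's hierarchical architecture.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise shrinkage operator; exact zeros give the sparsity."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_code(D, x, lam=0.05, n_iter=200):
    """Infer a sparse code a minimising ||x - D a||^2 + lam*||a||_1 by ISTA.
    D's columns play the role of (hypothetical) receptive-field features."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + D.T @ (x - D @ a) / L, lam / L)
    return a
```

A few atoms explain the input while most coefficients stay exactly zero, which is the fault-tolerance/low-power property the abstract appeals to.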
Provably scale-covariant networks from oriented quasi quadrature measures in cascade
This article presents a continuous model for hierarchical networks based on a
combination of mathematically derived models of receptive fields and
biologically inspired computations. Based on a functional model of complex
cells in terms of an oriented quasi quadrature combination of first- and
second-order directional Gaussian derivatives, we couple such primitive
computations in cascade over combinatorial expansions over image orientations.
Scale-space properties of the computational primitives are analysed and it is
shown that the resulting representation allows for provable scale and rotation
covariance. A prototype application to texture analysis is developed and it is
demonstrated that a simplified mean-reduced representation of the resulting
QuasiQuadNet leads to promising experimental results on three texture datasets. Comment: 12 pages, 3 figures, 1 table
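A minimal sketch of the quasi quadrature primitive, assuming SciPy's Gaussian derivative filters: for each orientation it combines the squared first and second directional derivatives. The relative weight C is an illustrative constant here; the article derives principled values and scale normalisation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def quasi_quadrature(image, sigma=2.0, n_orient=4, C=0.5):
    """Oriented quasi-quadrature measure sqrt(Lphi^2 + C*sigma^2*Lphiphi^2)
    from first- and second-order directional Gaussian derivatives.
    C and the normalisation are illustrative assumptions."""
    Lx = gaussian_filter(image, sigma, order=(0, 1))   # d/dx
    Ly = gaussian_filter(image, sigma, order=(1, 0))   # d/dy
    Lxx = gaussian_filter(image, sigma, order=(0, 2))
    Lxy = gaussian_filter(image, sigma, order=(1, 1))
    Lyy = gaussian_filter(image, sigma, order=(2, 0))
    out = []
    for k in range(n_orient):
        phi = k * np.pi / n_orient
        c, s = np.cos(phi), np.sin(phi)
        Lphi = c * Lx + s * Ly                          # first directional derivative
        Lphiphi = c*c*Lxx + 2*c*s*Lxy + s*s*Lyy         # second directional derivative
        out.append(np.sqrt(Lphi**2 + C * sigma**2 * Lphiphi**2))
    return np.stack(out)  # one response map per orientation
```

Cascading this primitive over expansions of image orientations, as the article describes, yields the hierarchical network.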
Unsupervised learning of clutter-resistant visual representations from natural videos
Populations of neurons in inferotemporal cortex (IT) maintain an explicit
code for object identity that also tolerates transformations of object
appearance, e.g., position, scale, viewing angle [1, 2, 3]. Though the learning
rules are not known, recent results [4, 5, 6] suggest the operation of an
unsupervised temporal-association-based method, e.g., Foldiak's trace rule [7].
Such methods exploit the temporal continuity of the visual world by assuming
that visual experience over short timescales will tend to have invariant
identity content. Thus, by associating representations of frames from nearby
times, a representation that tolerates whatever transformations occurred in the
video may be achieved. Many previous studies verified that such rules can work
in simple situations without background clutter, but the presence of visual
clutter has remained problematic for this approach. Here we show that temporal
association based on large class-specific filters (templates) avoids the
problem of clutter. Our system learns in an unsupervised way from natural
videos gathered from the internet, and is able to perform a difficult
unconstrained face recognition task on natural images: Labeled Faces in the
Wild [8].
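The temporal-association idea can be sketched with a toy trace rule in the spirit of Foldiak [7]: each unit's weights move toward the current input in proportion to a low-pass-filtered trace of its own activity, so frames that occur close in time are pulled onto the same unit. The winner-take-all activity, learning rates, and network size below are illustrative assumptions, not the paper's template-based system.

```python
import numpy as np

def trace_rule_learning(frames, n_units=4, eta=0.05, delta=0.2, seed=0):
    """Toy Foldiak-style trace rule over a sequence of frame vectors.
    `trace` is a temporally low-passed copy of each unit's activity, so
    weight updates associate representations of temporally nearby frames."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_units, frames.shape[1]))
    trace = np.zeros(n_units)
    for x in frames:
        y = np.zeros(n_units)
        y[np.argmax(W @ x)] = 1.0                 # winner-take-all activity
        trace = (1 - delta) * trace + delta * y   # activity trace
        W += eta * trace[:, None] * (x[None, :] - W)
    return W
```

Because the trace decays over a few frames, transformed views of the same object within a short clip drive the same unit's weights, which is the invariance mechanism the abstract describes.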
Boundary Contour System and Feature Contour System
When humans gaze upon a scene, our brains rapidly combine several different types of locally ambiguous visual information to generate a globally consistent and unambiguous representation of Form-And-Color-And-DEpth, or FACADE. This state of affairs raises the question: What new computational principles and mechanisms are needed to understand how multiple sources of visual information cooperate automatically to generate a percept of 3-dimensional form?
This chapter reviews some modeling work aimed at developing such a general-purpose vision architecture. This architecture clarifies how scenic data about boundaries, textures, shading, depth, multiple spatial scales, and motion can be cooperatively synthesized in real-time into a coherent representation of 3-dimensional form. It embodies a new vision theory that attempts to clarify the functional organization of the visual brain from the lateral geniculate nucleus (LGN) to the extrastriate cortical regions V4 and MT. Moreover, the same processes which are useful towards explaining how the visual cortex processes retinal signals are equally valuable for processing noisy multidimensional data from artificial sensors, such as synthetic aperture radar, laser radar, multispectral infrared, magnetic resonance, and high-altitude photographs. These processes generate 3-D boundary and surface representations of a scene. Office of Naval Research (N00011-95-I-0409, N00014-95-I-0657)
Texture Segregation, Surface Representation, and Figure-ground Separation
A widespread view is that most of texture segregation can be accounted for by differences in the spatial frequency content of texture regions. Evidence from both psychophysical and physiological studies indicates, however, that beyond these early filtering stages, there are stages of 3-D boundary segmentation and surface representation that are used to segregate textures. Chromatic segregation of element-arrangement patterns, as studied by Beck and colleagues, cannot be completely explained by the filtering mechanisms previously employed to account for achromatic segregation. An element-arrangement pattern is composed of two types of elements that are arranged differently in different image regions (e.g., vertically on top and diagonally on bottom). FACADE theory mechanisms that have previously been used to explain data about 3-D vision and figure-ground separation are here used to simulate chromatic texture segregation data, including data with equiluminant elements on dark or light homogeneous backgrounds, or backgrounds composed of vertical and horizontal dark or light stripes, or horizontal notched stripes. These data include the fact that segregation of patterns composed of red and blue squares decreases with increasing luminance of the interspaces. Asymmetric segregation properties under 3-D viewing conditions, with the equiluminant elements close or far, are also simulated. Two key model properties are a spatial impenetrability property that inhibits boundary grouping across regions with noncollinear texture elements, and a boundary-surface consistency property that uses feedback between boundary and surface representations to eliminate spurious boundary groupings and separate figures from their backgrounds. Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657, ONR N00014-91-J-4100); CNPq/Brazil (520419/96-0); Air Force Office of Scientific Research (F49620-92-J-0334)
Cortical Dynamics of 3-D Surface Perception: Binocular and Half-Occluded Scenic Images
Previous models of stereopsis have concentrated on the task of binocularly matching left and right eye primitives uniquely. A disparity smoothness constraint is often invoked to limit the number of possible matches. These approaches neglect the fact that surface discontinuities are both abundant in natural everyday scenes, and provide a useful cue for scene segmentation. da Vinci stereopsis refers to the more general problem of dealing with surface discontinuities and their associated unmatched monocular regions within binocular scenes. This study develops a mathematical realization of a neural network theory of biological vision, called FACADE Theory, that shows how early cortical stereopsis processes are related to later cortical processes of 3-D surface representation. The mathematical model demonstrates through computer simulation how the visual cortex may generate 3-D boundary segmentations and use them to control filling-in of 3-D surface properties in response to visual scenes. Model mechanisms correctly match disparate binocular regions while filling-in monocular regions with the correct depth within a binocularly viewed scene. This achievement required introduction of a new multiscale binocular filter for stereo matching which clarifies how cortical complex cells match image contours of like contrast polarity, while pooling signals from opposite contrast polarities. Competitive interactions among filter cells suggest how false binocular matches and unmatched monocular cues, which contain eye-of-origin information, are automatically handled across multiple spatial scales. This network also helps to explain data concerning context-sensitive binocular matching. Pooling of signals from even-symmetric and odd-symmetric simple cells at complex cells helps to eliminate spurious activity peaks in matchable signals.
Later stages of cortical processing by the blob and interblob streams, including refined concepts of cooperative boundary grouping and reciprocal stream interactions between boundary and surface representations, are modeled to provide a complete simulation of the da Vinci stereopsis percept. Office of Naval Research (N00014-95-I-0409, N00014-85-1-0657, N00014-92-J-4015, N00014-91-J-4100); Air Force Office of Scientific Research (90-0175); National Science Foundation (IRI-90-00530); The James S. McDonnell Foundation (94-40)
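The like-polarity matching constraint can be illustrated with a toy 1-D sketch (not the FACADE model itself): signed odd-filter responses from the two eyes are matched only where their contrast polarity agrees, while the pooled magnitude combines both polarities, as at a complex-cell stage. The filter choice and scoring scheme are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def like_polarity_match(left, right, d, sigma=2.0):
    """Binocular matching score at candidate shift d (toy sketch):
    signed simple-cell (odd, derivative-of-Gaussian) responses are
    matched only where their contrast polarity (sign) agrees, while
    the pooled magnitude combines both polarities."""
    sl = gaussian_filter1d(left, sigma, order=1)            # signed edges, left
    sr = np.roll(gaussian_filter1d(right, sigma, order=1), -d)
    same_pol = np.sign(sl) == np.sign(sr)                   # like contrast polarity
    energy = np.abs(sl) + np.abs(sr)                        # pooled over polarity
    return np.sum(energy[same_pol])
```

Scanning d over a disparity range and taking the best score recovers the shift between the two views while rejecting opposite-polarity false matches.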