A neural model of border-ownership from kinetic occlusion
Camouflaged animals whose textures closely match their surroundings are difficult to detect when stationary. However, when an animal moves, humans readily see a figure at a different depth than the background. How do humans perceive a figure breaking camouflage, even though the texture of the figure and its background may be statistically identical in luminance? We present a model that demonstrates how the primate visual system performs figure–ground segregation in extreme cases of breaking camouflage based on motion alone. Border-ownership signals develop as an emergent property in model V2 units whose receptive fields lie near kinetically defined borders that separate figure and background. Model simulations support border-ownership as a general mechanism by which the visual system performs figure–ground segregation, regardless of whether figure–ground boundaries are defined by luminance or motion contrast. The gradient of motion- and luminance-related border-ownership signals explains the perceived depth ordering of the foreground and background surfaces. Our model predicts that V2 neurons sensitive to kinetic edges are selective for border-ownership (magnocellular B cells). A distinct population of model V2 neurons is selective for border-ownership in figures defined by luminance contrast (parvocellular B cells). B cells in model V2 receive feedback from neurons in V4 and MT with larger receptive fields, which biases border-ownership signals toward the figure. We predict that neurons in V4 and MT sensitive to kinetically defined figures play a crucial role in determining whether the foreground surface accretes, deletes, or produces a shearing motion with respect to the background. This work was supported in part by CELEST (NSF SBE-0354378 and OMA-0835976), the Office of Naval Research (ONR N00014-11-1-0535), and the Air Force Office of Scientific Research (AFOSR FA9550-12-1-0436).
What the ‘Moonwalk’ Illusion Reveals about the Perception of Relative Depth from Motion
When one visual object moves behind another, the object farther from the viewer is progressively occluded and/or disoccluded by the nearer object. For nearly half a century, this dynamic occlusion cue has been thought to be sufficient by itself for determining the relative depth of the two objects. This view is consistent with the self-evident geometric fact that the surface undergoing dynamic occlusion is always farther from the viewer than the occluding surface. Here we use a contextual manipulation of a previously known motion illusion, which we refer to as the 'Moonwalk' illusion, to demonstrate that the visual system cannot determine relative depth from dynamic occlusion alone. Indeed, in the Moonwalk illusion, human observers perceive a relative depth contrary to the dynamic occlusion cue. However, the perception of the expected relative depth is restored by contextual manipulations unrelated to dynamic occlusion. On the other hand, we show that an Ideal Observer can determine relative depth using dynamic occlusion alone in the same Moonwalk stimuli, indicating that the dynamic occlusion cue is, in principle, sufficient for determining relative depth. Our results indicate that in order to correctly perceive relative depth from dynamic occlusion, the human brain, unlike the Ideal Observer, needs additional segmentation information that delineates the occluder from the occluded object. Thus, neural mechanisms of object segmentation must, in addition to motion mechanisms that extract information about relative depth, play a crucial role in the perception of relative depth from motion.
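The Ideal Observer's sufficiency claim can be illustrated with a toy computation (a minimal sketch under assumed stimuli; the dot layout and decision rule here are hypothetical illustrations, not the paper's actual stimuli or analysis): whichever surface loses texture elements at the shared border is being occluded, and is therefore farther.

```python
import numpy as np

def farther_surface(frames_a, frames_b):
    """Toy ideal-observer rule for dynamic occlusion: the surface whose
    texture elements are progressively deleted is the occluded one,
    hence farther from the viewer. frames_* hold the ids of the dots
    visible on each surface at each frame."""
    lost_a = len(set(frames_a[0]) - set(frames_a[-1]))
    lost_b = len(set(frames_b[0]) - set(frames_b[-1]))
    return "A" if lost_a > lost_b else "B"

# Hypothetical stimulus: an occluding edge sweeps rightward over
# surface A (dot i sits at x = i/100); surface B is never covered.
dots = np.arange(100)
frames_a, frames_b = [], []
for t in range(5):
    edge = 0.1 * t                              # edge position at frame t
    frames_a.append(dots[dots / 100 >= edge])   # A dots behind the edge are deleted
    frames_b.append(dots)                       # B stays fully visible
print(farther_surface(frames_a, frames_b))      # prints "A"
```

The rule uses only deletion at the border; the paper's point is that human observers, unlike such an observer, additionally need segmentation information to exploit this cue.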
A topological solution to object segmentation and tracking
The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: segmenting visual input into discrete units, and tracking identities of these units despite appearance changes due to object deformation, changing perspective, and dynamic occlusion. Current computer vision approaches to segmentation and tracking that approach human performance all require learning, raising the question: can objects be segmented and tracked without learning? Here, we show that the mathematical structure of light rays reflected from environment surfaces yields a natural representation of persistent surfaces, and this surface representation provides a solution to both the segmentation and tracking problems. We describe how to generate this surface representation from continuous visual input, and demonstrate that our approach can segment and invariantly track objects in cluttered synthetic video despite severe appearance changes, without requiring learning.
Comment: 21 pages, 6 main figures, 3 supplemental figures, and supplementary material containing mathematical proof
Neural models of inter-cortical networks in the primate visual system for navigation, attention, path perception, and static and kinetic figure-ground perception
Vision provides the primary means by which many animals distinguish foreground objects from their background and coordinate locomotion through complex environments. The present thesis focuses on mechanisms within the visual system that afford figure-ground segregation and self-motion perception. These processes are modeled as emergent outcomes of dynamical interactions among neural populations in several brain areas. This dissertation specifies and simulates how border-ownership signals emerge in cortex, and how the medial superior temporal area (MSTd) represents path of travel and heading, in the presence of independently moving objects (IMOs).
Neurons in visual cortex that signal border-ownership, the perception that a border belongs to a figure and not its background, have been identified, but the underlying mechanisms have been unclear. A model is presented that demonstrates that inter-areal interactions across model visual areas V1-V2-V4 afford border-ownership signals similar to those reported in electrophysiology for visual displays containing figures defined by luminance contrast. Competition between model neurons with different receptive field sizes is crucial for reconciling the occlusion of one object by another. The model is extended to determine border-ownership when object borders are kinetically defined, and to detect the location and size of shapes, regardless of the curvature of their boundary contours.
Navigation in the real world requires humans to travel along curved paths. Many perceptual models have been proposed that focus on heading, which specifies the direction of travel along straight paths, but not on path curvature. In primates, MSTd has been implicated in heading perception. A model of V1, the medial temporal area (MT), and MSTd is developed herein that demonstrates how MSTd neurons can simultaneously encode path curvature and heading. Human judgments of heading are accurate in rigid environments, but are biased in the presence of IMOs. The model presented here explains the bias through recurrent connectivity in MSTd and avoids the use of differential motion detectors, which, although used in existing models to discount the motion of an IMO relative to its background, are not biologically plausible. Reported modulation of the MSTd population due to attention is explained through competitive dynamics between subpopulations responding to bottom-up and top-down signals.
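For the heading component, the geometric baseline that such models build on can be sketched numerically (a generic least-squares focus-of-expansion estimator on assumed synthetic flow, not the MSTd model developed in the thesis): for straight-path travel, heading coincides with the focus of expansion of the optic flow field.

```python
import numpy as np

def estimate_heading(points, flows):
    """Least-squares focus-of-expansion (FOE) estimate for a purely
    translating observer: each flow vector must point away from the
    FOE, so the cross product (p - foe) x v = 0 gives one linear
    equation v_y*foe_x - v_x*foe_y = p_x*v_y - p_y*v_x per vector."""
    A = np.stack([flows[:, 1], -flows[:, 0]], axis=1)
    b = points[:, 0] * flows[:, 1] - points[:, 1] * flows[:, 0]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Synthetic translational flow: expansion away from the true FOE,
# scaled by inverse depth (depths vary across the scene).
rng = np.random.default_rng(1)
true_foe = np.array([0.2, -0.1])
pts = rng.uniform(-1, 1, (200, 2))
depths = rng.uniform(1, 5, 200)
flows = (pts - true_foe) / depths[:, None]
print(estimate_heading(pts, flows))   # recovers approximately [0.2, -0.1]
```

Note that this estimator assumes a rigid scene; an IMO injects flow vectors that violate the constraint, which is one way to see why heading judgments become biased in its presence.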
Dynamics of Attention in Depth: Evidence from Multi-Element Tracking
The allocation of attention in depth is examined using a multi-element tracking paradigm. Observers are required to track a predefined subset of two to eight elements in displays containing up to sixteen identical moving elements. We first show that depth cues, such as binocular disparity and occlusion through T-junctions, improve performance in a multi-element tracking task in the case where element boundaries are allowed to intersect in the depiction of motion in a single fronto-parallel plane. We also show that allocating attention across two perceptually distinguishable planar surfaces, either fronto-parallel or receding at a slanting angle and defined by coplanar elements, is easier than allocating attention within a single surface. The same result was not found when attention had to be deployed across items of two color populations rather than of a single color. Our results suggest that, when surface information does not suffice to distinguish between targets and distractors embedded in these surfaces, division of attention across two surfaces aids in tracking moving targets. National Science Foundation (IRI-94-01659); Office of Naval Research (N00014-95-1-0409, N00014-95-1-0657)
Kinetic occlusion
Thesis (Elec. E.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (leaves 58-66). By Sourabh Arun Niyogi.
A computer stereo vision system: using horizontal intensity line segments bounded by edges.
by Chor-Tung Yau. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 106-110).
Contents:
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Objectives --- p.1
Chapter 1.2 --- Factors of Depth Perception in Human Visual System --- p.2
Chapter 1.2.1 --- Oculomotor Cues --- p.2
Chapter 1.2.2 --- Pictorial Cues --- p.3
Chapter 1.2.3 --- Movement-Produced Cues --- p.4
Chapter 1.2.4 --- Binocular Disparity --- p.5
Chapter 1.3 --- What Cues to Use in Computer Vision? --- p.6
Chapter 1.4 --- The Process of Stereo Vision --- p.8
Chapter 1.4.1 --- Depth and Disparity --- p.8
Chapter 1.4.2 --- The Stereo Correspondence Problem --- p.10
Chapter 1.4.3 --- Parallel and Nonparallel Axis Stereo Geometry --- p.11
Chapter 1.4.4 --- Feature-based and Area-based Stereo Matching --- p.12
Chapter 1.4.5 --- Constraints --- p.13
Chapter 1.5 --- Organization of this thesis --- p.16
Chapter 2 --- Related Work --- p.18
Chapter 2.1 --- Marr and Poggio's Computational Theory --- p.18
Chapter 2.2 --- Cooperative Methods --- p.19
Chapter 2.3 --- Dynamic Programming --- p.21
Chapter 2.4 --- Feature-based Methods --- p.24
Chapter 2.5 --- Area-based Methods --- p.26
Chapter 3 --- Overview of the Method --- p.30
Chapter 3.1 --- Considerations --- p.31
Chapter 3.2 --- Brief Description of the Method --- p.33
Chapter 4 --- Preprocessing of Images --- p.35
Chapter 4.1 --- Edge Detection --- p.35
Chapter 4.1.1 --- The Laplacian of Gaussian (∇²G) operator --- p.37
Chapter 4.1.2 --- The Canny edge detector --- p.40
Chapter 4.2 --- Extraction of Horizontal Line Segments for Matching --- p.42
Chapter 5 --- The Matching Process --- p.45
Chapter 5.1 --- Reducing the Search Space --- p.45
Chapter 5.2 --- Similarity Measure --- p.47
Chapter 5.3 --- Treating Inclined Surfaces --- p.49
Chapter 5.4 --- Ambiguity Caused By Occlusion --- p.51
Chapter 5.5 --- Matching Segments of Different Length --- p.53
Chapter 5.5.1 --- Cases Without Partial Occlusion --- p.53
Chapter 5.5.2 --- Cases With Partial Occlusion --- p.55
Chapter 5.5.3 --- Matching Scheme To Handle All the Cases --- p.56
Chapter 5.5.4 --- Matching Scheme for Segments of Same Length --- p.57
Chapter 5.6 --- Assigning Disparity Values --- p.58
Chapter 5.7 --- Another Case of Partial Occlusion Not Handled --- p.60
Chapter 5.8 --- Matching in Two Passes --- p.61
Chapter 5.8.1 --- Problems Encountered in the First Pass --- p.61
Chapter 5.8.2 --- Second Pass of Matching --- p.63
Chapter 5.9 --- Refinement of Disparity Map --- p.64
Chapter 6 --- Coarse-to-fine Matching --- p.67
Chapter 6.1 --- The Wavelet Representation --- p.67
Chapter 6.2 --- Coarse-to-fine Matching --- p.71
Chapter 7 --- Experimental Results and Analysis --- p.74
Chapter 7.1 --- Experimental Results --- p.74
Chapter 7.1.1 --- Image Pair 1 - The Pentagon Images --- p.74
Chapter 7.1.2 --- Image Pair 2 - Random Dot Stereograms --- p.79
Chapter 7.1.3 --- Image Pair 3 - The Rubik Block Images --- p.81
Chapter 7.1.4 --- Image Pair 4 - The Stack of Books Images --- p.85
Chapter 7.1.5 --- Image Pair 5 - The Staple Box Images --- p.87
Chapter 7.1.6 --- Image Pair 6 - Circuit Board Image --- p.91
Chapter 8 --- Conclusion --- p.94
Chapter A --- The Wavelet Transform --- p.96
Chapter A.1 --- Fourier Transform and Wavelet Transform --- p.96
Chapter A.2 --- Continuous Wavelet Transform --- p.97
Chapter A.3 --- Discrete Time Wavelet Transform --- p.99
Chapter B --- Acknowledgements to Testing Images --- p.100
Chapter B.1 --- The Circuit Board Image --- p.100
Chapter B.2 --- The Stack of Books Image --- p.101
Chapter B.3 --- The Rubik Block Images --- p.104
Bibliography --- p.10
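The pipeline outlined in Chapters 4 and 5 (edge detection, extraction of edge-bounded horizontal segments, then similarity-based matching under search-space and uniqueness constraints) can be caricatured on a single scanline. This is a hypothetical minimal sketch: the edge threshold, the mean-intensity similarity measure, and the greedy uniqueness rule are assumptions for illustration, not the thesis's algorithm.

```python
import numpy as np

def segments(scanline, edge_thresh=10):
    """Split a scanline into horizontal runs bounded by intensity
    edges; return (start, end, mean intensity) for each run."""
    jumps = np.flatnonzero(np.abs(np.diff(scanline.astype(int))) > edge_thresh) + 1
    bounds = np.concatenate(([0], jumps, [len(scanline)]))
    return [(a, b, scanline[a:b].mean()) for a, b in zip(bounds[:-1], bounds[1:])]

def match(left, right, max_disp=20):
    """Greedy segment matching: pair each left segment with the
    unmatched right segment of most similar mean intensity inside the
    disparity search range (a crude uniqueness constraint)."""
    out, used = [], set()
    segs_r = segments(right)
    for la, lb, lm in segments(left):
        best, best_cost = None, np.inf
        for j, (ra, rb, rm) in enumerate(segs_r):
            d = la - ra                      # disparity of the left boundary
            if j in used or not 0 <= d <= max_disp:
                continue
            cost = abs(lm - rm)
            if cost < best_cost:
                best, best_cost = (j, d), cost
        if best is not None:
            used.add(best[0])
            out.append((la, lb, best[1]))    # (start, end, disparity)
    return out

# One scanline of a hypothetical stereo pair: a bright patch shifted
# 5 px between the views (nearer surfaces shift more).
base = np.full(100, 50, dtype=np.uint8)
right = base.copy(); right[30:60] = 200
left = base.copy();  left[35:65] = 200
matches = match(left, right)
print(matches)   # the bright segment is matched at disparity 5
```

Matching whole edge-bounded segments rather than single pixels is what lets the method tolerate low-texture regions; the ambiguities the thesis addresses in Sections 5.4-5.7 arise exactly when partial occlusion makes left and right segments differ in length.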
Visual perception of solid shape from occluding contours
The relative motion of object and observer induces a motion field in the observer's visual image that is smooth everywhere except along the object's occluding contours. Thus, occluding contours and smooth motion fields can be viewed as complementary and separate sources of information about an object's shape. I studied how the human visual system perceives solid shape from the occluding contours of rotating objects and from the smooth motion field induced by moving planar surface patches. I propose a three-stage model for the perception of solid shape from the occluding contours of a rotating object. First, the object's motion is determined. I argue that this is only possible using points of correspondence and only when the object's axis of rotation is frontoparallel. In the second stage, the motion field along the contour is used to compute relative depth and surface curvature along the rim, the contour's pre-image. Third, local shape descriptors are propagated inside the figure to yield a global percept of solid shape. To determine which shape descriptors are computed by human subjects, I used a novel task in which subjects have to discriminate between flat ellipses and solid ellipsoids of varying thickness. I found that discriminability is proportional to the inverse of radial curvature but not to Gaussian or mean curvature. Certain slants of the axis of rotation decrease discriminability. Subjects who could discriminate ellipsoids from ellipses perceived the ellipsoids' angular velocity more veridically than did subjects who could not discriminate the two. Any smooth motion field can locally be described by divergence, curl, and deformation. If the motion field is induced by a rotating plane, the amount of deformation is proportional to the plane's slant and its angular velocity. Similarly, for translating planes, deformation is proportional to slant and image motion. Slant judgments of human observers were, to a first-order approximation, proportional to deformation per se; that is, observers do not take object motion into account. Recent psychophysical evidence suggests that human subjects need motion discontinuities to do so. Thus, contours might be necessary to correctly perceive slant from smooth motion fields.
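The divergence/curl/deformation decomposition invoked above can be made concrete (a minimal sketch; the affine fit and the example field are illustrative assumptions, not the thesis's stimuli): locally, any smooth flow is approximately v = A p + t, and the three invariants are read off the gradient A.

```python
import numpy as np

def flow_invariants(points, flows):
    """Fit an affine model v = A p + t to a local motion field and
    return the differential invariants of its gradient A:
    divergence (trace), curl, and deformation magnitude."""
    X = np.column_stack([points, np.ones(len(points))])
    coef, *_ = np.linalg.lstsq(X, flows, rcond=None)
    A = coef[:2].T                 # A[i, j] = d v_i / d p_j
    div = A[0, 0] + A[1, 1]
    curl = A[1, 0] - A[0, 1]
    defm = np.hypot(A[0, 0] - A[1, 1], A[0, 1] + A[1, 0])
    return div, curl, defm

# Illustrative pure-deformation field: a traceless, symmetric gradient,
# so divergence and curl vanish and only deformation remains.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (100, 2))
A_true = np.array([[0.1, 0.3],
                   [0.3, -0.1]])
flows = pts @ A_true.T
div, curl, defm = flow_invariants(pts, flows)
print(div, curl, defm)   # approximately 0, 0, hypot(0.2, 0.6)
```

Because deformation confounds slant with object motion, an observer who reads slant directly from `defm`, as the abstract's first-order result describes, will misjudge slant whenever angular velocity varies.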