
    The perceptual consequences and neural basis of monocular occlusions

    Occluded areas are abundant in natural scenes and play an important role in stereopsis. However, because early researchers of stereopsis treated occlusions as noise, this field of study saw little development until the last two decades. Consequently, many aspects of depth perception from occlusions are not well understood. The goal of this thesis was to study several such aspects in order to advance the current understanding of monocular occlusions and their neural underpinnings. The psychophysical and computational studies described in this thesis have demonstrated that: 1) occlusions play an important role in defining the shape and depth of occluding surfaces, 2) depth signals from monocular occlusions and disparity interact in complex ways, 3) a single mechanism underlies depth perception from monocular occlusions, and 4) this mechanism is likely to rely on monocular occlusion geometry. A unified theory of depth computation from monocular occlusions and disparity was proposed based on these findings. A biologically plausible computational model based on this theory produced results close to observer percepts for a variety of monocular occlusion phenomena.

    Cortical Dynamics of 3-D Surface Perception: Binocular and Half-Occluded Scenic Images

    Previous models of stereopsis have concentrated on the task of binocularly matching left and right eye primitives uniquely. A disparity smoothness constraint is often invoked to limit the number of possible matches. These approaches neglect the fact that surface discontinuities are both abundant in natural everyday scenes and a useful cue for scene segmentation. da Vinci stereopsis refers to the more general problem of dealing with surface discontinuities and their associated unmatched monocular regions within binocular scenes. This study develops a mathematical realization of a neural network theory of biological vision, called FACADE Theory, that shows how early cortical stereopsis processes are related to later cortical processes of 3-D surface representation. The mathematical model demonstrates through computer simulation how the visual cortex may generate 3-D boundary segmentations and use them to control filling-in of 3-D surface properties in response to visual scenes. Model mechanisms correctly match disparate binocular regions while filling in monocular regions with the correct depth within a binocularly viewed scene. This achievement required the introduction of a new multiscale binocular filter for stereo matching, which clarifies how cortical complex cells match image contours of like contrast polarity while pooling signals from opposite contrast polarities. Competitive interactions among filter cells suggest how false binocular matches and unmatched monocular cues, which contain eye-of-origin information, are automatically handled across multiple spatial scales. This network also helps to explain data concerning context-sensitive binocular matching. Pooling of signals from even-symmetric and odd-symmetric simple cells at complex cells helps to eliminate spurious activity peaks in matchable signals.
Later stages of cortical processing by the blob and interblob streams, including refined concepts of cooperative boundary grouping and reciprocal stream interactions between boundary and surface representations, are modeled to provide a complete simulation of the da Vinci stereopsis percept. Office of Naval Research (N00014-95-I-0409, N00014-85-1-0657, N00014-92-J-4015, N00014-91-J-4100); Air Force Office of Scientific Research (90-0175); National Science Foundation (IRI-90-00530); The James S. McDonnell Foundation (94-40
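
    As a rough illustration of matching contours of like contrast polarity and pooling across polarities, the sketch below splits 1D left and right luminance profiles into ON (increasing luminance) and OFF (decreasing luminance) channels, correlates ON with ON and OFF with OFF at each candidate disparity, sums the two like-polarity matches, and lets the strongest disparity win at each position. It is a single-scale toy under assumed conventions (a right-image feature at x + d matches a left-image feature at x; the function names are hypothetical), not the multiscale FACADE binocular filter.

```python
import numpy as np


def polarity_channels(signal):
    """Split a 1D luminance profile into ON (rising) and OFF (falling) contrast channels."""
    grad = np.diff(signal, prepend=signal[0])
    return np.maximum(grad, 0.0), np.maximum(-grad, 0.0)


def match_disparity(left, right, max_disp):
    """Return (winning disparity, match strength) for each left-image position.

    Only like polarities are matched (ON with ON, OFF with OFF); the two matches
    are then pooled, and a winner-take-all over disparities stands in for the
    competition that suppresses false matches.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    l_on, l_off = polarity_channels(left)
    r_on, r_off = polarity_channels(right)
    n = len(left)
    energy = np.zeros((max_disp + 1, n))
    for d in range(max_disp + 1):
        # Align right[x + d] with left[x]; positions shifted past the edge get zero.
        s_on = np.zeros(n)
        s_off = np.zeros(n)
        s_on[: n - d] = r_on[d:]
        s_off[: n - d] = r_off[d:]
        energy[d] = l_on * s_on + l_off * s_off
    return energy.argmax(axis=0), energy.max(axis=0)
```

    For a bar whose right-eye image is shifted two samples to the right (for example `left = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and `right = np.roll(left, 2)`), both edges of the bar (positions 3 and 6) win at disparity 2; an ON edge can never be matched to an OFF edge.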

    Neural Models of Seeing and Thinking

    Air Force Office of Scientific Research (F49620-01-1-0397); Office of Naval Research (N00014-01-1-0624

    Extraction of Surface-Related Features in a Recurrent Model of V1-V2 Interactions

    Humans can effortlessly segment surfaces and objects from two-dimensional (2D) images that are projections of the 3D world. Because of this projection from 3D to 2D, surfaces are partially occluded depending on their position in depth and on the viewpoint. One way for the human visual system to infer monocular depth cues could be to extract and interpret occlusions. It has been suggested that the perception of contour junctions, in particular T-junctions, may be used as a cue for occlusion of opaque surfaces. Furthermore, X-junctions could be used to signal occlusion of transparent surfaces. In this contribution, we propose a neural model that suggests how surface-related cues for occlusion can be extracted from a 2D luminance image. The approach is based on feedforward and feedback mechanisms found in visual cortical areas V1 and V2. In a first step, contours are completed over time by generating groupings of like-oriented contrasts. A few iterations of feedforward and feedback processing lead to a stable representation of completed contours and, at the same time, to a suppression of image noise. In a second step, contour junctions are localized and read out from the distributed representation of boundary groupings. Moreover, surface-related junctions are made explicit so that they can be evaluated and can interact to generate surface segmentations in static images. In addition, we compare our extracted junction signals with a standard computer vision approach for junction detection to demonstrate that our approach outperforms simple feedforward computation-based approaches. A model is proposed that uses feedforward and feedback mechanisms to combine contextually relevant features in order to generate consistent boundary groupings of surfaces. Perceptually important junction configurations are robustly extracted from neural representations to signal cues for occlusion and transparency.
Unlike previous proposals, which treat localized junction configurations as 2D image features, we link them to mechanisms of apparent surface segregation. As a consequence, we demonstrate how junctions can change their perceptual representation depending on the scene context and the spatial configuration of boundary fragments.
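
    The junction types named in this abstract can be made concrete with a purely geometric rule: a T-junction has three contour arms, two of them collinear, and an X-junction has four arms forming two collinear pairs. The sketch below classifies a junction from the directions of its arms, assuming those directions are already available (for example from oriented-filter responses). It is a conventional rule-based heuristic for illustration only, not the recurrent V1-V2 model described above; the function names and the 15-degree tolerance are assumptions.

```python
def _collinear(a_deg, b_deg, tol_deg):
    """True if two arms point in (nearly) opposite directions, i.e. form one straight contour."""
    deviation = abs((a_deg - b_deg) % 360.0 - 180.0)
    return deviation <= tol_deg


def classify_junction(arm_angles_deg, tol_deg=15.0):
    """Label a junction from the directions (in degrees) of the contour arms leaving it."""
    n = len(arm_angles_deg)
    collinear_pairs = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if _collinear(arm_angles_deg[i], arm_angles_deg[j], tol_deg)
    )
    if n == 2:
        return "continuation" if collinear_pairs else "L"
    if n == 3:
        # One straight contour (candidate occluding edge) plus a stem ending on it.
        return "T" if collinear_pairs == 1 else "Y/arrow"
    if n == 4:
        # Two contours crossing, as at the border of a transparent overlay.
        return "X" if collinear_pairs == 2 else "other"
    return "other"
```

    For example, arms at 0, 180 and 90 degrees give "T", and arms at 0, 90, 180 and 270 degrees give "X". A purely local rule like this cannot use scene context, which is exactly the limitation the abstract's context-dependent junction representation is meant to address.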

    Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting

    This paper proposes a single-shot approach for recognising clothing categories from 2.5D features. We propose two visual features, BSP (B-Spline Patch) and TSD (Topology Spatial Distances), for this task. The local BSP features are encoded by LLC (Locality-constrained Linear Coding) and fused with three different global features. Our visual features are robust to deformable shapes, and our approach is able to recognise the category of unknown clothing in unconstrained and random configurations. We integrated the category recognition pipeline with a stereo vision system, clothing instance detection, and dual-arm manipulators to achieve an autonomous sorting system. To verify the performance of our proposed method, we built a high-resolution RGBD clothing dataset of 50 clothing items of 5 categories sampled in random configurations (a total of 2,100 clothing samples). Experimental results show that our approach reaches 83.2% accuracy when classifying clothing items that were unseen during training. This advances beyond the previous state of the art by 36.2%. Finally, we evaluate the proposed approach in an autonomous robot sorting system, in which the robot recognises a clothing item from an unconstrained pile, grasps it, and sorts it into a box according to its category. Our proposed sorting system achieves reasonable sorting success rates with single-shot perception. Comment: 9 pages, accepted by IROS201
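
    LLC, which the paper uses to encode the local BSP features, represents each descriptor as a weighted combination of a few nearby codewords. The sketch below follows the approximate LLC solution commonly used in practice (keep the k nearest codewords, solve a regularised local least-squares system, normalise the weights to sum to one). The neighbourhood size `k`, the regulariser `beta` and the function name are assumptions; codebook learning, pooling, and fusion with the global features are omitted.

```python
import numpy as np


def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximate LLC code of one descriptor x (shape (D,)) against codebook (shape (M, D)).

    Only the k nearest codewords get non-zero weights, and the weights sum to 1.
    """
    x = np.asarray(x, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # 1. Locality: keep the k codewords closest to x.
    dists = np.linalg.norm(codebook - x, axis=1)
    nn = np.argsort(dists)[:k]
    # 2. Local covariance of the neighbours, shifted so that x is the origin.
    z = codebook[nn] - x
    cov = z @ z.T
    # 3. Regularise (cov is singular when k > D) and solve cov @ w = 1.
    cov += np.eye(len(nn)) * beta * np.trace(cov)
    w = np.linalg.solve(cov, np.ones(len(nn)))
    # 4. Enforce the sum-to-one constraint.
    w /= w.sum()
    code = np.zeros(len(codebook))
    code[nn] = w
    return code
```

    A descriptor lying midway between two codewords, e.g. `llc_encode([1, 0], [[0, 0], [2, 0], [10, 10]], k=2)`, receives weight 0.5 on each of them and zero on the distant third codeword; the resulting sparse codes are what would then be pooled over a garment's patches.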

    Real-Time Occlusion Handling in Augmented Reality Based on an Object Tracking Approach

    To produce a realistic augmentation in Augmented Reality, the correct relative positions of real objects and virtual objects are very important. In this paper, we propose a novel real-time occlusion handling method based on an object tracking approach. Our method is divided into three steps: selection of the occluding object, object tracking, and occlusion handling. The user selects the occluding object using an interactive segmentation method. The contour of the selected object is then tracked in the subsequent frames in real time. In the occlusion handling step, all the pixels on the tracked object are redrawn on the unprocessed augmented image to produce a new synthesized image in which the relative position between the real and virtual objects is correct. The proposed method has several advantages. First, it is robust and stable, since it remains effective when the camera is moved through large changes of viewing angles and volumes or when the object and the background have similar colors. Second, it is fast, since the real object can be tracked in real time. Finally, a smoothing technique provides seamless merging between the augmented and virtual object. Several experiments are provided to validate the performance of the proposed method.
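
    The occlusion-handling step amounts to per-pixel compositing: wherever the tracked real object lies, camera pixels replace the augmented rendering. The sketch below assumes the tracker already yields a binary mask of the occluding object and approximates the smoothing step with a box-blurred (feathered) mask used as an alpha matte; the paper's actual smoothing technique may differ, and the function names are hypothetical.

```python
import numpy as np


def box_blur(mask, radius):
    """Average a 2D mask over a (2r+1) x (2r+1) window, replicating edge pixels."""
    mask = np.asarray(mask, dtype=float)
    if radius == 0:
        return mask
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="edge")
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy : dy + h, dx : dx + w]
    return out / (2 * radius + 1) ** 2


def composite_occluder(camera_frame, augmented, occluder_mask, feather_radius=1):
    """Redraw the tracked real object over the augmented image.

    camera_frame, augmented: float arrays of shape (H, W, 3).
    occluder_mask: boolean array of shape (H, W), True on the tracked object.
    feather_radius=0 gives a hard cut; larger values soften the object border.
    """
    alpha = box_blur(occluder_mask, feather_radius)[..., None]  # (H, W, 1) broadcasts over RGB
    return alpha * camera_frame + (1.0 - alpha) * augmented
```

    With `feather_radius=0` every masked pixel is taken from the camera frame and every other pixel from the augmented image; a small positive radius blends the two across a band a few pixels wide around the object contour, trading a slightly soft edge for fewer visible seams.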

    Organisation of audio-visual three-dimensional space

    Stereopsis refers to the perception of depth that arises when a scene is viewed binocularly. The visual system relies on the horizontal disparities between the images from the left and right eyes to compute a map of the different depth values present in the scene. It is usually thought that the stereoscopic system is encapsulated and highly constrained by the wiring of neurons from the primary visual areas (V1/V2) to higher integrative areas in the ventral and dorsal streams (V3, inferior temporal cortex, MT). Across four distinct experimental projects, we investigated how the visual system makes use of binocular disparity to compute the depth of objects. In summary, we show that the processing of binocular disparity can be substantially influenced by other types of information, such as binocular occlusion or sound. In more detail, our experimental results suggest that: (1) da Vinci stereopsis is solved by a mechanism that integrates classic stereoscopic processes (double fusion), geometrical constraints (monocular objects are necessarily hidden from one eye, and are therefore located behind the plane of the occluder) and prior information (a preference for small disparities). (2) The processing of motion-in-depth can be influenced by auditory information: a sound that is temporally correlated with a stereomotion-defined target can substantially improve visual search. Stereomotion detectors are optimally suited to track 3D motion but poorly suited to process 2D motion. (3) Grouping binocular disparity with an orthogonal auditory signal (pitch) can increase stereoacuity by approximately 30%. PARIS5-Bibliotheque electronique (751069902) / Sudoc, France
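
    The geometric constraint in finding (1) can be quantified with a simple top-down model, stated here as an assumption rather than the authors' formulation: eyes separated by I on a horizontal baseline, an opaque occluder at distance D extending rightwards from its left edge, and a point seen only by the left eye along a line of sight that passes a distance s to the left of that edge (measured in the occluder plane). By similar triangles, the point is hidden from the right eye only if its distance satisfies Z >= D * I / (I - s), which for s > 0 always places it behind the occluder plane. The function name is hypothetical.

```python
def min_depth_left_only(occluder_distance, interocular, offset):
    """Smallest viewing distance Z at which a left-eye-only point is hidden from the right eye.

    Assumes a top-down 2D layout: eyes at x = -interocular/2 (left) and
    +interocular/2 (right), an opaque occluder in the plane z = occluder_distance
    extending rightwards from its left edge, and a left-eye line of sight that
    passes `offset` to the left of that edge in the occluder plane.
    """
    if not 0.0 <= offset < interocular:
        raise ValueError("no occlusion-consistent depth exists unless 0 <= offset < interocular")
    return occluder_distance * interocular / (interocular - offset)
```

    With I = 0.065 m, D = 1 m and s = 0.0325 m, the monocular point must lie at least 2 m away, i.e. at least 1 m behind the occluder. The minimum depth grows with the offset, and no occlusion-consistent depth remains once the offset reaches the interocular distance.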

    Cortical Dynamics of 3-D Figure-Ground Perception of 2-D Pictures

    This article develops the FACADE theory of 3-D vision and figure-ground separation to explain data concerning how 2-D pictures give rise to 3-D percepts of occluding and occluded objects. These percepts include pop-out of occluding figures and amodal completion of occluded figures in response to line drawings, to Bregman-Kanizsa displays in which the relative contrasts of occluding and occluded surfaces are reversed, to White displays from which either transparent or opaque occlusion percepts can obtain, to Egusa and Kanizsa square displays in which brighter regions look closer, and to Kanizsa stratification displays in which bistable reversals of occluding and occluded surfaces occur, and in which real contours and illusory contours compete to alter the reversal percept. The model describes how changes in contrast can alter a percept without a change in geometry, and conversely. More generally, it shows how geometrical and contrastive properties of a picture can either cooperate or compete when forming the boundaries and surface representations that subserve conscious percepts. Spatially long-range cooperation and spatially short-range competition work together to separate the boundaries of occluding figures from their occluded neighbors. This boundary ownership process is sensitive to image T-junctions at which occluded figures contact occluding figures, but there are no explicit T-junction detectors in the network. Rather, the contextual balance of boundary cooperation and competition strengthens some boundaries while breaking others. These boundaries control the filling-in of color within multiple, depth-sensitive surface representations. Feedback between surface and boundary representations strengthens consistent boundaries while inhibiting inconsistent ones.
It is suggested how both the boundary and the surface representations of occluded objects may be amodally completed, even while the surface representations of unoccluded objects become visible through modal completion. Distinct functional roles for conscious modal and amodal representations in object recognition, spatial attention, and reaching behaviors are discussed. Model interactions are interpreted in terms of visual, temporal, and parietal cortex. Model concepts provide a mechanistic neural explanation and revision of such Gestalt principles as good continuation, stratification, and non-accidental solution. Office of Naval Research (N00014-91-J-4100, N00014-95-I-0409, N00014-95-I-0657, N00014-92-J-11015
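
    The idea that boundaries control the filling-in of surface properties can be illustrated with a deliberately simplified 1D diffusion in which feature activity spreads between neighbouring cells except across positions marked as boundaries. This sketch is not the FACADE filling-in equations themselves, only the qualitative behaviour they are meant to capture; the diffusion rate, iteration count and function name are assumptions.

```python
import numpy as np


def fill_in(features, barriers, rate=0.25, n_iter=2000):
    """Spread feature activity along a 1D row of cells, blocked at boundaries.

    features: initial activity per cell, shape (n,).
    barriers: shape (n - 1,); barriers[i] is True when a boundary separates
    cell i from cell i + 1.
    """
    if not 0.0 < rate < 0.5:
        raise ValueError("rate must lie in (0, 0.5) for this explicit scheme to converge")
    v = np.asarray(features, dtype=float).copy()
    open_links = ~np.asarray(barriers, dtype=bool)
    for _ in range(n_iter):
        # Flow between neighbours, proportional to their difference, only across open links.
        flux = rate * np.diff(v) * open_links
        v[:-1] += flux
        v[1:] -= flux
    return v
```

    With a boundary between the fourth and fifth cells, activity injected at the left end (`fill_in([1, 0, 0, 0, 0, 0, 0, 0], [False, False, False, True, False, False, False])`) settles to a uniform 0.25 across the four left cells and never reaches the right-hand cells. Removing the boundary lets it spread over all eight, which is the sense in which a broken boundary changes the filled-in surface.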

    A Person Following Algorithm for Use with a Single Forward Facing RGB-D Camera on a Mobile Robot

    This thesis examines the problem of person following. A person-following algorithm can be separated into two distinct parts: the detection and tracking of a target, and the actual following of that target. This thesis focuses mainly on the detection and tracking of a target person. For the purposes of this thesis, a simple robot control architecture is used: the robot moves toward the target in a straight line, and no path planning is considered when executing robot movement. This thesis aims to accomplish three tasks. First, the system should be able to track and follow a target when no occlusions occur. The non-occlusion scenarios should consider the target in environments with no other people, environments with other people present at different distances, and environments with other people present at similar distances. The second goal is to track the target person through brief occlusions. The system should be able to detect when the target has been occluded, register the occlusion, and reacquire the target once the occlusion ends. The third and final goal is to reacquire the target after a long-term occlusion. The system must recognize that the target person has disappeared from the scene, wait for the target to reappear, and reacquire the target upon reappearance. These goals are accomplished using a generic person detector (a HOG person detector), a specific appearance model based on color histograms, a particle filter that serves as the integrating structure for the tracker, and a simple robot control architecture. In the following chapters I discuss the motivation behind this work, previous research in this area, and the methods used in this thesis and the theory behind them. Experimental results are then analyzed, and the results and possible improvements to the system are discussed.
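
    The tracking core described above, a color-histogram appearance model inside a particle filter, can be sketched compactly. The version below is a simplified 1D stand-in: the "frame" is a row of grey-level intensities, particles are horizontal positions, the motion model is a random walk, and the HOG detector and robot control are omitted. The Bhattacharyya-coefficient likelihood, systematic resampling, window size and all parameter values are assumptions rather than details taken from the thesis, and the function names are hypothetical.

```python
import numpy as np


def color_histogram(pixels, bins=8):
    """Normalised intensity histogram over [0, 256)."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    hist = hist.astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def bhattacharyya(p, q):
    """Similarity of two normalised histograms (1.0 means identical)."""
    return float(np.sum(np.sqrt(p * q)))


def systematic_resample(weights, rng):
    """Indices of particles to keep, drawn with one random offset and evenly spaced positions."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions)


def track_step(particles, frame, ref_hist, rng, motion_std=2.0, half_window=5, sharpness=20.0):
    """One predict / weight / resample cycle; returns (new particles, position estimate)."""
    # 1. Predict: random-walk motion, kept inside the frame.
    particles = np.clip(particles + rng.normal(0.0, motion_std, particles.shape), 0, len(frame) - 1)
    # 2. Update: weight each particle by how well its window matches the reference histogram.
    weights = np.empty(len(particles))
    for i, x in enumerate(particles):
        c = int(round(x))
        window = frame[max(0, c - half_window): c + half_window + 1]
        similarity = bhattacharyya(color_histogram(window), ref_hist)
        weights[i] = np.exp(-sharpness * (1.0 - similarity))
    weights /= weights.sum()
    estimate = float(np.dot(weights, particles))
    # 3. Resample so particles concentrate on likely target positions.
    return particles[systematic_resample(weights, rng)], estimate
```

    In a frame whose only bright region spans positions 40 to 50, particles spread over positions 30 to 60 collapse onto the bright region within a few steps, and the weighted estimate settles near its centre. A brief occlusion would show up as uniformly low similarities, which is one way the "register the occlusion" step could be detected.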