Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions
Depth estimation is a fundamental problem for light field photography
applications. Numerous methods have been proposed in recent years, which either
focus on crafting cost terms for more robust matching, or on analyzing the
geometry of scene structures embedded in the epipolar-plane images. Significant
improvements have been made in terms of overall depth estimation error;
however, current state-of-the-art methods still show limitations in handling
intricate occluding structures and complex scenes with multiple occlusions. To
address these challenging issues, we propose a very effective depth estimation
framework which focuses on regularizing the initial label confidence map and
edge strength weights. Specifically, we first detect partially occluded
boundary regions (POBR) via superpixel-based regularization. A series of
shrinkage/reinforcement operations is then applied to the label confidence map
and edge strength weights over the POBR. We show that after weight
manipulations, even a low-complexity weighted least squares model can produce
much better depth estimation than state-of-the-art methods in terms of average
disparity error rate, occlusion boundary precision-recall rate, and the
preservation of intricate visual features.
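As a rough illustration of the regularization idea (not the authors' exact model), a low-complexity weighted least squares step balances fidelity to the initial labels, weighted by confidence, against smoothness, weighted by edge strength. The function name and the toy 1-D weight values below are hypothetical:

```python
import numpy as np

def wls_refine_1d(d0, conf, edge_w, lam=1.0):
    """Refine an initial disparity signal d0 (1-D for illustration) by
    weighted least squares: minimize
        sum_i conf[i] * (d[i] - d0[i])**2
      + lam * sum_i edge_w[i] * (d[i+1] - d[i])**2
    conf   -- label-confidence weights (high = trust the initial label)
    edge_w -- smoothness weights between neighbours (low at depth edges)
    """
    n = len(d0)
    A = np.diag(conf.astype(float))
    for i in range(n - 1):
        w = lam * edge_w[i]
        A[i, i] += w
        A[i + 1, i + 1] += w
        A[i, i + 1] -= w
        A[i + 1, i] -= w
    return np.linalg.solve(A, conf * d0)

# noisy step edge: confidence is shrunk in the occluded band around the
# jump, and the edge weight at the jump is reinforced to stay low
d0 = np.array([1., 1., 1., 5., 5., 5.])
conf = np.array([1., 1., .1, .1, 1., 1.])
edge_w = np.array([1., 1., 0.01, 1., 1.])
print(wls_refine_1d(d0, conf, edge_w))
```

Because the edge weight between the third and fourth samples is small, the solver smooths within each surface but preserves the depth discontinuity, which is the behaviour the POBR weight manipulations aim for.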
Temporally coherent 4D reconstruction of complex dynamic scenes
This paper presents an approach for reconstruction of 4D temporally coherent
models of complex dynamic scenes. No prior knowledge is required of scene
structure or camera calibration allowing reconstruction from multiple moving
cameras. Sparse-to-dense temporal correspondence is integrated with joint
multi-view segmentation and reconstruction to obtain a complete 4D
representation of static and dynamic objects. Temporal coherence is exploited
to overcome visual ambiguities resulting in improved reconstruction of complex
scenes. Robust joint segmentation and reconstruction of dynamic objects is
achieved by introducing a geodesic star convexity constraint. Comparative
evaluation is performed on a variety of unstructured indoor and outdoor dynamic
scenes with hand-held cameras and multiple people. This demonstrates
reconstruction of complete temporally coherent 4D scene models with improved
nonrigid object segmentation and shape reconstruction.
Comment: To appear in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2016. Video available at:
https://www.youtube.com/watch?v=bm_P13_-Ds
Real-Time Occlusion Handling in Augmented Reality Based on an Object Tracking Approach
To produce a realistic augmentation in Augmented Reality, the correct relative positions of real objects and virtual objects are very important. In this paper, we propose a novel real-time occlusion handling method based on an object tracking approach. Our method is divided into three steps: selection of the occluding object, object tracking, and occlusion handling. The user selects the occluding object using an interactive segmentation method. The contour of the selected object is then tracked in the subsequent frames in real time. In the occlusion handling step, all the pixels on the tracked object are redrawn on the unprocessed augmented image to produce a new synthesized image in which the relative position between the real and virtual objects is correct. The proposed method has several advantages. First, it is robust and stable, since it remains effective when the camera is moved through large changes of viewing angle and viewing volume, or when the object and the background have similar colors. Second, it is fast, since the real object can be tracked in real time. Last, a smoothing technique provides seamless merging between the real object and the virtual content. Several experiments are provided to validate the performance of the proposed method.
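The redraw step described above amounts to masked compositing: pixels of the tracked real object are copied from the raw camera frame over the augmented frame. A minimal sketch with a hypothetical `composite_occlusion` helper (the mask would come from the contour tracker, which is not shown):

```python
import numpy as np

def composite_occlusion(camera_frame, augmented_frame, occluder_mask):
    """Redraw the tracked real object's pixels on top of the augmented
    image so the real object correctly occludes the virtual content.
    occluder_mask -- boolean HxW mask of the tracked object's pixels
    """
    out = augmented_frame.copy()
    out[occluder_mask] = camera_frame[occluder_mask]
    return out

# toy 2x2 RGB frames: virtual content (255) fills the augmented frame,
# the tracked occluder (camera pixels, 0) covers the left column
cam = np.zeros((2, 2, 3), dtype=np.uint8)
aug = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[True, False], [True, False]])
out = composite_occlusion(cam, aug, mask)
print(out[:, :, 0])  # left column keeps camera pixels, right keeps virtual
```

A real implementation would additionally feather the mask boundary, which is the role of the smoothing technique mentioned in the abstract.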
Perception of depth and motion from ambiguous binocular information
The visual system can determine motion and depth from ambiguous information contained in images projected onto both retinas over space and time. The key to the way the system overcomes such ambiguity lies in dependency among multiple cues—such as spatial displacement over time, binocular disparity, and interocular time delay—which might be established based on prior knowledge or experience, and stored in spatiotemporal response characteristics of neurons at an early cortical stage. We conducted a psychophysical investigation of whether a single ambiguous cue (specifically, interocular time delay) permits depth discrimination and motion perception. Data from this investigation are consistent with the predictions derived from the response profiles of V1 neurons, which show interdependency in their responses to each cue, indicating that spatial and temporal information is jointly encoded in early vision.
The perceptual consequences and neural basis of monocular occlusions
Occluded areas are abundant in natural scenes and play an important role in stereopsis. However, due to the treatment of occlusions as noise by early researchers of stereopsis, this field of study has not seen much development until the last two decades. Consequently, many aspects of depth perception from occlusions are not well understood. The goal of this thesis was to study several such aspects in order to advance the current understanding of monocular occlusions and their neural underpinnings. The psychophysical and computational studies described in this thesis have demonstrated that: 1) occlusions play an important role in defining the shape and depth of occluding surfaces, 2) depth signals from monocular occlusions and disparity interact in complex ways, 3) there is a single mechanism underlying depth perception from monocular occlusions and 4) this mechanism is likely to rely on monocular occlusion geometry. A unified theory of depth computation from monocular occlusions and disparity was proposed based on these findings. A biologically-plausible computational model based on this theory produced results close to observer percepts for a variety of monocular occlusion phenomena
Dense Wide-Baseline Stereo with Varying Illumination and its Application to Face Recognition
We study the problem of dense wide baseline stereo with varying illumination. We
are motivated by the problem of face recognition across pose. Stereo matching
allows us to compare face images based on physically valid, dense
correspondences. We show that the stereo matching cost provides a very robust
measure of the similarity of faces that is insensitive to pose variations. We
build on the observation that most illumination insensitive local comparisons
require the use of relatively large windows. The size of these windows is
affected by foreshortening. If we do not account for this effect, we incur
misalignments that are systematic and significant and are exacerbated by wide
baseline conditions.
We present a general formulation of dense wide baseline stereo with varying
illumination and provide two methods to solve it. The first method is based on
dynamic programming (DP) and fully accounts for the effect of slant. The second
method is based on graph cuts (GC) and fully accounts for the effect of both slant
and tilt. The GC method finds a global solution using the unary function from
the general formulation and a novel smoothness term that encodes surface
orientation.
Our experiments show that DP dense wide baseline stereo achieves superior
performance compared to existing methods in face recognition across pose. The
experiments with the GC method show that accounting for both slant and tilt can
improve performance in situations with wide baselines and lighting variation.
Our formulation can be applied to other more sophisticated window based image
comparison methods for stereo.
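A minimal sketch of the scanline dynamic-programming idea (illustrative only; it omits the slant correction and the windowed, illumination-insensitive costs that are central to the paper). The total alignment cost, summed over rows, can then serve as a dissimilarity score between two views:

```python
import numpy as np

def dp_scanline_match(left, right, occ_cost=2.0):
    """Align two 1-D intensity rows by dynamic programming with
    squared-difference match costs and a fixed occlusion penalty.
    Returns the minimal total alignment cost (lower = more similar)."""
    n, m = len(left), len(right)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occ_cost * np.arange(m + 1)   # occlude a prefix of right
    D[:, 0] = occ_cost * np.arange(n + 1)   # occlude a prefix of left
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2
            D[i, j] = min(match,
                          D[i - 1, j] + occ_cost,   # left pixel occluded
                          D[i, j - 1] + occ_cost)   # right pixel occluded
    return D[n, m]

a = np.array([0., 1., 2., 3.])
print(dp_scanline_match(a, a))        # identical rows -> zero cost
print(dp_scanline_match(a, a + 1.0))  # intensity offset -> higher cost
```

This also illustrates why raw intensity differences are a poor cost under lighting variation: a uniform offset inflates the score, motivating the illumination-insensitive window comparisons the abstract discusses.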
Filling-in the Forms: Surface and Boundary Interactions in Visual Cortex
Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-95-1-0409); Office of Naval Research (N00014-95-1-0657)
Fast and Accurate Depth Estimation from Sparse Light Fields
We present a fast and accurate method for dense depth reconstruction from
sparsely sampled light fields obtained using a synchronized camera array. In
our method, the source images are over-segmented into non-overlapping compact
superpixels that are used as basic data units for depth estimation and
refinement. Superpixel representation provides a desirable reduction in the
computational cost while preserving the image geometry with respect to the
object contours. Each superpixel is modeled as a plane in the image space,
allowing depth values to vary smoothly within the superpixel area. Initial
depth maps, which are obtained by plane sweeping, are iteratively refined by
propagating good correspondences within an image. To ensure the fast
convergence of the iterative optimization process, we employ a highly parallel
propagation scheme that operates on all the superpixels of all the images at
once, making full use of the parallel graphics hardware. A few optimization
iterations of the energy function, incorporating superpixel-wise smoothness and
geometric consistency constraints, allow us to recover depth with high accuracy in
textured and textureless regions as well as areas with occlusions, producing
dense globally consistent depth maps. We demonstrate that while the depth
reconstruction takes about a second per full high-definition view, the accuracy
of the obtained depth maps is comparable with state-of-the-art results.
Comment: 15 pages, 15 figures
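A toy illustration of the plane-sweep initialization (hypothetical code; the paper sweeps planes per superpixel over a camera array, while this sketch sweeps integer disparities for a single horizontally translated pair with winner-take-all selection):

```python
import numpy as np

def plane_sweep_disparity(ref, other, max_disp):
    """For each disparity hypothesis d, shift `other` by d pixels,
    record the absolute intensity difference against `ref`, and pick
    the cheapest hypothesis per pixel (winner-take-all)."""
    h, w = ref.shape
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        if d < w:
            # ref pixel x is compared against other pixel x - d
            costs[d, :, d:] = np.abs(ref[:, d:] - other[:, :w - d])
    return costs.argmin(axis=0)

# synthetic pair: content of `ref` appears 2 columns earlier in `other`,
# so the true disparity is 2 everywhere it is observable
ref = np.tile(np.arange(8.0), (4, 1))
other = np.roll(ref, -2, axis=1)
print(plane_sweep_disparity(ref, other, 3))
```

The leftmost columns have no valid match and fall back to an arbitrary label; in the paper, such pixels would be corrected by the iterative propagation of good correspondences between superpixels.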