Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context
We present an algorithm for finding temporally consistent occlusion
boundaries in videos to support segmentation of dynamic scenes. We learn
occlusion boundaries in a pairwise Markov random field (MRF) framework. We
first estimate the probability of a spatio-temporal edge being an occlusion
boundary by using appearance, flow, and geometric features. Next, we enforce
occlusion boundary continuity in an MRF model by learning pairwise occlusion
probabilities using a random forest. Then, we temporally smooth boundaries to
remove temporal inconsistencies in occlusion boundary estimation. Our proposed
framework provides an efficient approach for finding temporally consistent
occlusion boundaries in video by utilizing causality, redundancy in videos, and
semantic layout of the scene. We have developed a dataset with fully annotated
ground-truth occlusion boundaries of over 30 videos (~5000 frames). This
dataset is used to evaluate temporal occlusion boundaries and provides a much
needed baseline for future studies. We perform experiments to demonstrate the
role of scene layout and temporal information for occlusion reasoning in
dynamic scenes.
Comment: Applications of Computer Vision (WACV), 2015 IEEE Winter Conference
o
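The temporal smoothing step described in the abstract above can be illustrated with a small sketch. The abstract does not say which smoothing scheme is used, so this example swaps in a plain causal exponential moving average as a stand-in. The function name `smooth_boundary_probs`, the parameter `alpha`, and the assumption that edge `i` in one frame corresponds to edge `i` in the next are all hypothetical.

```python
def smooth_boundary_probs(per_frame_probs, alpha=0.6):
    """Causally smooth per-edge occlusion probabilities over time.

    per_frame_probs: list of frames, each a list with one probability per
        edge (assumes edges keep the same index across frames).
    alpha: weight given to the current frame (hypothetical parameter).
    """
    smoothed = []
    previous = None
    for probs in per_frame_probs:
        if previous is None:
            # The first frame has no history, so it is kept unchanged.
            current = list(probs)
        else:
            # Blend the current estimate with the smoothed previous frame.
            current = [alpha * p + (1.0 - alpha) * q
                       for p, q in zip(probs, previous)]
        smoothed.append(current)
        previous = current
    return smoothed


if __name__ == "__main__":
    frames = [[0.9, 0.1], [0.1, 0.1], [0.9, 0.1]]
    for row in smooth_boundary_probs(frames, alpha=0.5):
        print(row)
```

Because each output frame depends only on the current and earlier frames, the filter is causal, which matches the abstract's mention of causality. A one-frame dropout in the first edge is damped instead of being passed through.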
Unsupervised Monocular Depth Estimation with Left-Right Consistency
Learning-based methods have shown very promising results for the task of
depth estimation in single images. However, most existing approaches treat
depth prediction as a supervised regression problem and as a result, require
vast quantities of corresponding ground truth depth data for training. Just
recording quality depth data in a range of environments is a challenging
problem. In this paper, we innovate beyond existing approaches, replacing the
use of explicit depth data during training with easier-to-obtain binocular
stereo footage.
We propose a novel training objective that enables our convolutional neural
network to learn to perform single image depth estimation, despite the absence
of ground truth depth data. Exploiting epipolar geometry constraints, we
generate disparity images by training our network with an image reconstruction
loss. We show that solving for image reconstruction alone results in poor
quality depth images. To overcome this problem, we propose a novel training
loss that enforces consistency between the disparities produced relative to
both the left and right images, leading to improved performance and robustness
compared to existing approaches. Our method produces state-of-the-art results
for monocular depth estimation on the KITTI driving dataset, even outperforming
supervised methods that have been trained with ground truth depth.
Comment: CVPR 2017 ora
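The idea of left-right disparity consistency from the abstract above can be sketched on a single scanline. This is an illustration only, not the paper's training loss. It uses nearest-pixel lookup where a network would need differentiable sampling, and it assumes a sign convention in which a left-image pixel at column `x` with disparity `d` matches column `x - d` in the right image. The function name `left_right_consistency` is hypothetical.

```python
def left_right_consistency(disp_left, disp_right):
    """Mean absolute disagreement between left and right disparities.

    disp_left, disp_right: disparities (in pixels) along one scanline.
    Assumed convention: left pixel x matches right pixel x - disp_left[x].
    """
    width = len(disp_left)
    total = 0.0
    count = 0
    for x, d in enumerate(disp_left):
        x_right = x - int(round(d))  # nearest matching column in the right view
        if 0 <= x_right < width:
            # A consistent pair of maps reports the same disparity at both ends.
            total += abs(d - disp_right[x_right])
            count += 1
    # Pixels whose match falls outside the image are ignored.
    return total / count if count else 0.0


if __name__ == "__main__":
    print(left_right_consistency([2, 2, 2, 2], [2, 2, 2, 2]))
    print(left_right_consistency([2, 2, 2, 2], [3, 2, 2, 2]))
```

The penalty is zero when the two maps agree and grows when one view's disparity contradicts the other's. That is the property the consistency term is meant to encourage.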
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
We present a new learning-based method for multi-frame depth estimation from
a color video, which is a fundamental problem in scene understanding, robot
navigation or handheld 3D reconstruction. While recent learning-based methods
estimate depth at high accuracy, 3D point clouds exported from their depth maps
often fail to preserve important geometric features (e.g., corners, edges,
planes) of man-made scenes. Widely-used pixel-wise depth errors do not
specifically penalize inconsistency on these features. These inaccuracies are
particularly severe when subsequent depth reconstructions are accumulated in an
attempt to scan a full environment containing man-made objects with these kinds of
features. Our depth estimation algorithm therefore introduces a Combined Normal
Map (CNM) constraint, which is designed to better preserve high-curvature
features and global planar regions. In order to further improve the depth
estimation accuracy, we introduce a new occlusion-aware strategy that
aggregates initial depth predictions from multiple adjacent views into one
final depth map and one occlusion probability map for the current reference
view. Our method outperforms the state-of-the-art in terms of depth estimation
accuracy, and preserves essential geometric features of man-made indoor scenes
much better than other algorithms.
Comment: ECCV 202
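To make the link between depth maps and surface normals concrete, here is a minimal sketch that estimates a normal from a depth grid by central differences. It is not the Combined Normal Map constraint from the abstract above. It ignores camera intrinsics and treats pixel coordinates as metric x and y, both of which are simplifying assumptions. The function name `normal_at` is hypothetical.

```python
import math


def normal_at(depth, row, col):
    """Unit surface normal at an interior pixel of a depth grid.

    depth: list of rows of depth values; (row, col) must not lie on the border.
    Simplification: pixel spacing is treated as one metric unit.
    """
    # Central differences approximate the depth gradient.
    dz_dx = (depth[row][col + 1] - depth[row][col - 1]) / 2.0
    dz_dy = (depth[row + 1][col] - depth[row - 1][col]) / 2.0
    # The surface z = f(x, y) has a normal proportional to (-f_x, -f_y, 1).
    length = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
    return (-dz_dx / length, -dz_dy / length, 1.0 / length)


if __name__ == "__main__":
    # A plane whose depth grows by 0.5 per column.
    plane = [[1.0 + 0.5 * c for c in range(3)] for _ in range(3)]
    print(normal_at(plane, 1, 1))
```

On a planar region every interior pixel yields the same normal. A per-pixel depth error cannot check this directly, whereas a normal-based term can penalize deviations from it.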
Virtual Occlusions Through Implicit Depth
For augmented reality (AR), it is important that virtual assets appear to 'sit among' real world objects. The virtual element should variously occlude and be occluded by real matter, based on a plausible depth ordering. This occlusion should be consistent over time as the viewer's camera moves. Unfortunately, small mistakes in the estimated scene depth can ruin the downstream occlusion mask, and thereby the AR illusion. Especially in real-time settings, depths inferred near boundaries or across time can be inconsistent. In this paper, we challenge the need for depth-regression as an intermediate step. We instead propose an implicit model for depth and use that to predict the occlusion mask directly. The inputs to our network are one or more color images, plus the known depths of any virtual geometry. We show how our occlusion predictions are more accurate and more temporally stable than predictions derived from traditional depth-estimation models. We obtain state-of-the-art occlusion results on the challenging ScanNetv2 dataset and superior qualitative results on real scenes.
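For context, the depth-regression baseline that the abstract above argues against can be sketched as a per-pixel depth comparison. This is the traditional route, not the implicit model the paper proposes. The function name `occlusion_mask` and the use of `None` to mark pixels without virtual geometry are assumptions made for illustration.

```python
def occlusion_mask(real_depth, virtual_depth):
    """Mark pixels where the real scene hides the virtual asset.

    real_depth: rows of estimated scene depths.
    virtual_depth: rows of known virtual depths, with None where no virtual
        geometry covers the pixel (an assumed encoding).
    """
    mask = []
    for real_row, virt_row in zip(real_depth, virtual_depth):
        # A virtual pixel is occluded when real matter is closer to the camera.
        mask.append([v is not None and r < v
                     for r, v in zip(real_row, virt_row)])
    return mask


if __name__ == "__main__":
    virtual = [[2.05]]
    print(occlusion_mask([[2.00]], virtual))  # real surface slightly in front
    print(occlusion_mask([[2.10]], virtual))  # same surface, 10 cm depth error
```

The two calls differ only by a small error in the estimated real depth, yet the mask flips. This is the fragility that motivates predicting the occlusion mask directly.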
Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries
© 2014. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.
We present an algorithm to estimate depth in dynamic video scenes. We propose to learn and infer depth in videos from appearance, motion, occlusion boundaries, and geometric context of the scene. Using our method, depth can be estimated from unconstrained videos with no requirement of camera pose estimation, and with significant background/foreground motions. We start by decomposing a video into spatio-temporal regions. For each spatio-temporal region, we learn the relationship of depth to visual appearance, motion, and geometric classes. Then we infer the depth information of new scenes using piecewise planar parametrization estimated within a Markov random field (MRF) framework by combining learned appearance-to-depth mappings and occlusion-boundary-guided smoothness constraints. Subsequently, we perform temporal smoothing to obtain temporally consistent depth maps. To evaluate our depth estimation algorithm, we provide a novel dataset with ground truth depth for outdoor video scenes. We present a thorough evaluation of our algorithm on our new dataset and the publicly available Make3D static image dataset.