Unsupervised Monocular Depth Estimation with Left-Right Consistency
Learning based methods have shown very promising results for the task of
depth estimation in single images. However, most existing approaches treat
depth prediction as a supervised regression problem and as a result, require
vast quantities of corresponding ground truth depth data for training. Even
recording high-quality depth data across a range of environments is itself a
challenging problem. In this paper, we innovate beyond existing approaches, replacing the
use of explicit depth data during training with easier-to-obtain binocular
stereo footage.
We propose a novel training objective that enables our convolutional neural
network to learn to perform single image depth estimation, despite the absence
of ground truth depth data. Exploiting epipolar geometry constraints, we
generate disparity images by training our network with an image reconstruction
loss. We show that solving for image reconstruction alone results in poor
quality depth images. To overcome this problem, we propose a novel training
loss that enforces consistency between the disparities produced relative to
both the left and right images, leading to improved performance and robustness
compared to existing approaches. Our method produces state of the art results
for monocular depth estimation on the KITTI driving dataset, even outperforming
supervised methods that have been trained with ground truth depth.
Comment: CVPR 2017 oral
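The left-right consistency idea can be illustrated with a minimal numpy sketch: predict a disparity map for each view, warp the right disparity map by the left one, and penalize disagreement, i.e. |d_L(x) - d_R(x - d_L(x))|. This is an illustrative reimplementation (nearest-neighbour sampling instead of the differentiable bilinear sampler used in practice), not the paper's exact code.

```python
import numpy as np

def warp_horizontal(image, disparity):
    """Sample each pixel of `image` at x - disparity (nearest neighbour),
    a crude stand-in for the bilinear sampler used during training."""
    h, w = image.shape
    xs = np.tile(np.arange(w), (h, 1))
    src = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    rows = np.repeat(np.arange(h)[:, None], w, axis=1)
    return image[rows, src]

def left_right_consistency(disp_left, disp_right):
    """Mean |d_L(x) - d_R(x - d_L(x))|: the left-view disparity should agree
    with the right-view disparity sampled at the matching location."""
    projected = warp_horizontal(disp_right, disp_left)
    return float(np.mean(np.abs(disp_left - projected)))
```

For two mutually consistent constant disparity maps the loss is zero; any disagreement between the two predictions raises it, which is what regularizes the network beyond the pure image reconstruction term.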
Cascaded Scene Flow Prediction using Semantic Segmentation
Given two consecutive frames from a pair of stereo cameras, 3D scene flow
methods simultaneously estimate the 3D geometry and motion of the observed
scene. Many existing approaches use superpixels for regularization, but may
predict inconsistent shapes and motions inside rigidly moving objects. We
instead assume that scenes consist of foreground objects rigidly moving in
front of a static background, and use semantic cues to produce pixel-accurate
scene flow estimates. Our cascaded classification framework accurately models
3D scenes by iteratively refining semantic segmentation masks, stereo
correspondences, 3D rigid motion estimates, and optical flow fields. We
evaluate our method on the challenging KITTI autonomous driving benchmark, and
show that accounting for the motion of segmented vehicles leads to
state-of-the-art performance.
Comment: International Conference on 3D Vision (3DV), 2017 (oral presentation)
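The core modeling assumption, rigidly moving foreground objects over a static background, fixes the scene flow inside each segment once a rotation and translation are estimated: every masked 3D point moves as Rp + t. A minimal sketch of that step (hypothetical helper, not the authors' cascaded pipeline):

```python
import numpy as np

def rigid_scene_flow(points, mask, rotation, translation):
    """3D scene flow under a rigid-object assumption: points inside a
    segmented object's boolean mask share one motion (R, t); points
    outside the mask belong to the static background and get zero flow."""
    flow = np.zeros_like(points)
    moved = points[mask] @ rotation.T + translation  # R p + t, row-vector form
    flow[mask] = moved - points[mask]
    return flow
```

In the full method this estimate is interleaved with refining the segmentation masks, stereo correspondences, and optical flow, so that segment boundaries and motions improve jointly.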
Learning monocular depth estimation with unsupervised trinocular assumptions
Obtaining accurate depth measurements out of a single image represents a
fascinating solution to 3D sensing. CNNs led to considerable improvements in
this field, and recent trends replaced the need for ground-truth labels with
geometry-guided image reconstruction signals enabling unsupervised training.
Currently, for this purpose, state-of-the-art techniques rely on images
acquired with a binocular stereo rig to predict inverse depth (i.e., disparity)
according to the aforementioned supervision principle. However, these methods
suffer from well-known problems near occlusions, at the left image border, and
so on, inherited from the stereo setup. Therefore, in this paper, we tackle these
issues by moving to a trinocular domain for training. Assuming the central
image as the reference, we train a CNN to infer disparity representations
pairing such image with frames on its left and right side. This strategy allows
obtaining depth maps not affected by typical stereo artifacts. Moreover, since
trinocular datasets are seldom available, we introduce a novel interleaved
training procedure that enforces the trinocular assumption using standard
binocular datasets. Exhaustive experimental results on the KITTI dataset
confirm that our proposal outperforms state-of-the-art methods for unsupervised
monocular depth estimation trained on binocular stereo pairs as well as any
known methods relying on other cues.
Comment: 14 pages, 7 figures, 4 tables. Accepted to 3DV 2018
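The benefit of the central reference view can be sketched as follows: with a center-left and a center-right pair, each pixel gets two per-pixel reconstruction errors, and regions occluded or out of frame on one side are usually still covered by the other pair. Taking the per-pixel minimum, an illustrative combination rule and not necessarily the paper's exact formulation, makes that intuition concrete:

```python
import numpy as np

def trinocular_photometric_loss(err_center_left, err_center_right):
    """Combine per-pixel reconstruction errors from the two stereo pairs
    sharing the central reference image. The per-pixel minimum discounts
    regions visible in only one of the two side views, which is where
    purely binocular training produces border and occlusion artifacts."""
    return float(np.mean(np.minimum(err_center_left, err_center_right)))
```

With a single binocular pair, the occluded pixels contribute their full error; here each pixel is scored by whichever pair explains it best.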