Unsupervised Monocular Depth Estimation with Left-Right Consistency
Learning based methods have shown very promising results for the task of
depth estimation in single images. However, most existing approaches treat
depth prediction as a supervised regression problem and as a result, require
vast quantities of corresponding ground truth depth data for training. Just
recording quality depth data in a range of environments is a challenging
problem. In this paper, we innovate beyond existing approaches, replacing the
use of explicit depth data during training with easier-to-obtain binocular
stereo footage.
We propose a novel training objective that enables our convolutional neural
network to learn to perform single image depth estimation, despite the absence
of ground truth depth data. Exploiting epipolar geometry constraints, we
generate disparity images by training our network with an image reconstruction
loss. We show that solving for image reconstruction alone results in poor
quality depth images. To overcome this problem, we propose a novel training
loss that enforces consistency between the disparities produced relative to
both the left and right images, leading to improved performance and robustness
compared to existing approaches. Our method produces state of the art results
for monocular depth estimation on the KITTI driving dataset, even outperforming
supervised methods that have been trained with ground truth depth.
Comment: CVPR 2017 oral
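The left-right consistency idea above can be sketched in a few lines: predict a disparity for each view, project the right-view disparity into the left view using the left disparity, and penalise the disagreement. The function name `lr_consistency_loss` and the 1-D nearest-neighbour sampling are illustrative simplifications of this abstract (the actual method uses differentiable bilinear sampling on full images inside a CNN training loop):

```python
import numpy as np

def lr_consistency_loss(disp_left, disp_right):
    """Left-right disparity consistency on 1-D rows (toy sketch).

    Samples the right-image disparity at x - d_l(x), i.e. projects it
    into the left view, then penalises the absolute difference from the
    left-image disparity. Nearest-neighbour sampling for simplicity.
    """
    w = disp_left.shape[-1]
    xs = np.arange(w)
    # indices x - d_l(x), rounded and clamped to the image width
    sample = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    projected = np.take_along_axis(disp_right, sample, axis=-1)
    return float(np.mean(np.abs(disp_left - projected)))
```

When the two disparity maps describe the same scene consistently, the projected right disparity matches the left one and the loss vanishes; inconsistent predictions (e.g. at occlusions) are penalised.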
Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction
Estimating precise metric depth and scene reconstruction from monocular
endoscopy is a fundamental task for surgical navigation in robotic surgery.
However, traditional stereo matching relies on binocular images to perceive
depth, which is difficult to transfer to soft-robotics-based surgical systems
that use a monocular endoscope. In this paper, we
present a novel framework that combines robot kinematics and monocular
endoscope images with deep unsupervised learning into a single network for
metric depth estimation and then achieve 3D reconstruction of complex anatomy.
Specifically, we first obtain the relative depth maps of surgical scenes by
leveraging a brightness-aware monocular depth estimation method. Then, the
corresponding endoscope poses are computed based on non-linear optimization of
geometric and photometric reprojection residuals. Afterwards, we develop a
Depth-driven Sliding Optimization (DDSO) algorithm to extract the scaling
coefficient from kinematics and calculated poses offline. By coupling the
metric scale and relative depth data, we form a robust ensemble that represents
the metric and consistent depth. Next, we treat the ensemble as supervisory
labels to train a metric depth estimation network for surgeries (i.e.,
MetricDepthS-Net) that distills the embeddings from the robot kinematics,
endoscopic videos, and poses. With accurate metric depth estimation, we utilize
a dense visual reconstruction method to recover the 3D structure of the whole
surgical site. We have extensively evaluated the proposed framework on the
public SCARED dataset and achieved performance comparable to stereo-based
depth estimation methods. Our results demonstrate the feasibility of the
proposed approach to recover metric depth and 3D structure from monocular
inputs.
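The core of the scale-recovery step can be illustrated with a minimal sketch. The function `recover_metric_scale` is a hypothetical simplification of the paper's DDSO algorithm: it compares per-frame camera translation magnitudes from robot kinematics (which are metric) against the corresponding translations from up-to-scale visual odometry, and takes the median ratio for robustness to outliers:

```python
import numpy as np

def recover_metric_scale(kin_translations, vo_translations):
    """Estimate a global metric scale factor (illustrative sketch).

    kin_translations: (N, 3) per-frame translations from robot kinematics,
                      in metric units.
    vo_translations:  (N, 3) per-frame translations from up-to-scale
                      visual odometry of the endoscope.
    Returns the median ratio of translation magnitudes, which rescales
    relative depth maps into metric depth.
    """
    kin = np.linalg.norm(kin_translations, axis=1)
    vo = np.linalg.norm(vo_translations, axis=1)
    valid = vo > 1e-8  # ignore near-stationary frames
    return float(np.median(kin[valid] / vo[valid]))
```

Multiplying a relative depth map by the recovered factor yields a metrically scaled depth map, which can then serve as the supervisory label described above.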
Real-time Halfway Domain Reconstruction of Motion and Geometry
We present a novel approach for real-time joint reconstruction of 3D scene motion and geometry from binocular stereo videos. Our approach is based on a novel variational halfway-domain scene flow formulation, which allows us to obtain highly accurate spatiotemporal reconstructions of shape and motion. We solve the underlying optimization problem at real-time frame rates using a novel data-parallel robust non-linear optimization strategy. Fast convergence and large displacement flows are achieved by employing a novel hierarchy that stores delta flows between hierarchy levels. High performance is obtained by the introduction of a coarser warp grid that decouples the number of unknowns from the input resolution of the images. We demonstrate our approach in a live setup that is based on two commodity webcams, as well as on publicly available video data. Our extensive experiments and evaluations show that our approach produces high-quality dense reconstructions of 3D geometry and scene flow at real-time frame rates, and compares favorably to the state of the art.
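The coarse warp grid mentioned above is the key to decoupling the number of unknowns from image resolution: the optimizer solves only for flow vectors at grid nodes, which are then interpolated to every pixel. The helper `upsample_grid_flow` below is an assumed, simplified bilinear version of that idea, not the paper's implementation:

```python
import numpy as np

def upsample_grid_flow(grid_flow, out_h, out_w):
    """Bilinearly interpolate a coarse warp-grid flow to full resolution.

    grid_flow: (gh, gw, 2) flow vectors on a coarse grid; the solver
    optimizes only these gh*gw nodes regardless of image size.
    Returns a dense (out_h, out_w, 2) flow field.
    """
    gh, gw, _ = grid_flow.shape
    ys = np.linspace(0.0, gh - 1, out_h)
    xs = np.linspace(0.0, gw - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, gh - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, gw - 1)
    wy = (ys - y0)[:, None, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :, None]   # horizontal interpolation weights
    top = grid_flow[y0][:, x0] * (1 - wx) + grid_flow[y0][:, x1] * wx
    bot = grid_flow[y1][:, x0] * (1 - wx) + grid_flow[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

With, say, a 32x32 grid, the optimizer handles 2048 unknowns whether the input is VGA or full HD, which is what makes the data-parallel solve real-time.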
One at A Time: Multi-step Volumetric Probability Distribution Diffusion for Depth Estimation
Recent works have explored the fundamental role of depth estimation in
multi-view stereo (MVS) and semantic scene completion (SSC). They generally
construct 3D cost volumes to explore geometric correspondence in depth, and
estimate such volumes in a single step relying directly on the ground truth
approximation. However, this problem cannot be handled thoroughly in a single step
due to complex empirical distributions, especially in challenging regions like
occlusions, reflections, etc. In this paper, we formulate the depth estimation
task as a multi-step distribution approximation process, and introduce a new
paradigm of modeling the Volumetric Probability Distribution progressively
(step-by-step) following a Markov chain with Diffusion models (VPDD).
Specifically, to constrain the multi-step generation of volume in VPDD, we
construct a meta volume guidance and a confidence-aware contextual guidance as
conditional geometry priors to facilitate the distribution approximation. For
the sampling process, we further investigate an online filtering strategy to
maintain consistency in volume representations for stable training. Experiments
demonstrate that our plug-and-play VPDD outperforms the state of the art on
both MVS and SSC, and can also be easily extended to different baselines to
obtain further improvements. It is worth mentioning that ours is the first
camera-based work to surpass LiDAR-based methods on the SemanticKITTI dataset.
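The step-by-step distribution approximation can be illustrated at toy scale. The function `refine_volume` below is an assumed sketch, not the VPDD network: starting from noise, each step blends the current volume toward a conditional prior (standing in for the meta-volume and contextual guidance) and renormalises along the depth-hypothesis axis, mimicking the Markov-chain refinement rather than single-step regression:

```python
import numpy as np

def refine_volume(prior_logits, steps=4, rng=None):
    """Multi-step refinement of a depth probability volume (toy sketch).

    prior_logits: (D, H, W) conditional-guidance logits over D depth
    hypotheses per pixel. Each step increases trust in the prior and
    renormalises over the depth axis, so the volume is shaped gradually
    instead of being regressed in one shot.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.standard_normal(prior_logits.shape)  # start from pure noise
    for t in range(steps):
        alpha = (t + 1) / steps                  # increasing prior weight
        x = (1 - alpha) * x + alpha * prior_logits
        # softmax over the depth-hypothesis axis -> probability volume
        e = np.exp(x - x.max(axis=0, keepdims=True))
        x = np.log(e / e.sum(axis=0, keepdims=True))
    return np.exp(x)
```

The returned volume is a valid per-pixel distribution over depth hypotheses; the multi-step schedule is where harder regions (occlusions, reflections) get repeated chances to be corrected, which a single-step estimate does not offer.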