Unsupervised Odometry and Depth Learning for Endoscopic Capsule Robots
In the last decade, many medical companies and research groups have tried to convert passive capsule endoscopes, an emerging and minimally invasive diagnostic technology, into actively steerable endoscopic capsule robots that can provide more intuitive disease detection, targeted drug delivery, and biopsy-like operations in the gastrointestinal (GI) tract. In this study, we
introduce a fully unsupervised, real-time odometry and depth learner for
monocular endoscopic capsule robots. We establish supervision by warping view sequences and using the re-projection error as the loss function, which we adopt to train a multi-view pose estimation network and a single-view depth estimation network. Detailed quantitative and qualitative analyses of the proposed framework, performed on non-rigidly deformable ex-vivo porcine stomach datasets, prove the effectiveness of the method in terms of motion estimation and depth recovery.
Comment: submitted to IROS 201
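To make this warping-based supervision concrete, below is a minimal sketch in PyTorch, assuming a pinhole camera model; the function names, tensor layout, and plain L1 photometric loss are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of re-projection supervision: warp a source view into the
# target frame using predicted depth and relative pose, then penalize the
# photometric difference. Names and the L1 loss are illustrative.
import torch
import torch.nn.functional as F

def inverse_warp(src_img, tgt_depth, pose, K):
    """src_img: (B,3,H,W), tgt_depth: (B,1,H,W),
    pose: (B,3,4) target-to-source [R|t], K: (B,3,3) intrinsics."""
    B, _, H, W = src_img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(1, 3, -1)
    # Back-project target pixels to 3D, move them into the source frame
    cam = torch.inverse(K) @ pix * tgt_depth.view(B, 1, -1)
    cam = pose[:, :, :3] @ cam + pose[:, :, 3:]
    proj = K @ cam                              # re-project into source view
    z = proj[:, 2].clamp(min=1e-6)
    x, y = proj[:, 0] / z, proj[:, 1] / z
    grid = torch.stack([2 * x / (W - 1) - 1,    # normalize for grid_sample
                        2 * y / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True)

def reprojection_loss(tgt_img, src_img, tgt_depth, pose, K):
    # Photometric re-projection error used as the training signal
    return (tgt_img - inverse_warp(src_img, tgt_depth, pose, K)).abs().mean()
```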
Learning sparse representations of depth
This paper introduces a new method for learning and inferring sparse
representations of depth (disparity) maps. The proposed algorithm relaxes the
usual assumption of the stationary noise model in sparse coding. This enables
learning from data corrupted with spatially varying noise or uncertainty,
typically obtained by laser range scanners or structured light depth cameras.
Sparse representations are learned from the Middlebury database disparity maps
and then exploited in a two-layer graphical model for inferring depth from
stereo, by including a sparsity prior on the learned features. Since they
capture higher-order dependencies in the depth structure, these priors can
complement smoothness priors commonly used in depth inference based on Markov
Random Field (MRF) models. Inference on the proposed graph is achieved using an
alternating iterative optimization technique, where the first layer is solved
using an existing MRF-based stereo matching algorithm, then held fixed as the
second layer is solved using the proposed non-stationary sparse coding
algorithm. This yields a general method for improving the solutions of state-of-the-art MRF-based depth estimation algorithms. Our experimental results first show that depth inference using learned representations leads to state-of-the-art denoising of depth maps obtained from laser range scanners and a time-of-flight camera. Furthermore, we show that adding sparse priors improves the results of two depth estimation methods: the classical graph-cut algorithm by Boykov et al. and the more recent algorithm of Woodford et al.
Comment: 12 page
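The relaxed noise model can be illustrated with a weighted variant of ISTA in which the reconstruction error is scaled by a per-pixel confidence (inverse noise variance); this is a sketch under that assumption, with the dictionary D, lam, and eta as placeholders rather than the paper's exact optimizer.

```python
# Sketch of non-stationary sparse coding: solve
#   min_a 0.5 * || sqrt(w) * (x - D a) ||^2 + lam * ||a||_1
# where w holds per-pixel confidences instead of an i.i.d. noise assumption.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_ista(x, D, w, lam=0.1, n_iter=200):
    """x: (p,) vectorized depth patch, D: (p,k) dictionary,
    w: (p,) per-pixel confidence (inverse noise variance)."""
    # Step size from the Lipschitz constant of the weighted data term
    eta = 1.0 / np.linalg.norm((D.T * w) @ D, 2)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (w * (D @ a - x))   # gradient of the weighted residual
        a = soft_threshold(a - eta * grad, eta * lam)
    return a
```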
DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels
In the context of scene understanding, a variety of methods exist to estimate different information channels from mono or stereo images, including disparity, depth, and normals. Although several advances have been reported in recent years for these tasks, the estimated information is often imprecise, particularly near depth discontinuities or creases. Studies have shown, however,
that precisely such depth edges carry critical cues for the perception of
shape, and play important roles in tasks like depth-based segmentation or
foreground selection. Unfortunately, the currently extracted channels often
carry conflicting signals, making it difficult for subsequent applications to
effectively use them. In this paper, we focus on the problem of obtaining
high-precision depth edges (i.e., depth contours and creases) by jointly
analyzing such unreliable information channels. We propose DepthCut, a
data-driven fusion of the channels using a convolutional neural network trained
on a large dataset with known depth. The resulting depth edges can be used for
segmentation, decomposing a scene into depth layers with relatively flat depth,
or improving the accuracy of the depth estimate near depth edges by
constraining its gradients to agree with these edges. Quantitatively, we compare against 15 baseline variants and demonstrate that our depth edges yield improved segmentation performance and a more accurate depth estimate near depth edges compared to data-agnostic channel fusion. Qualitatively, we demonstrate that the depth edges result in superior segmentation and depth orderings.
Comment: 12 page
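As a rough illustration of this kind of data-driven fusion, the sketch below stacks the unreliable channels and feeds them through a small convolutional network that predicts a per-pixel depth-edge probability; the architecture, channel count, and loss are assumptions, not DepthCut's actual network.

```python
# Illustrative channel-fusion network: unreliable estimates (e.g. 1 disparity
# + 3 normal + 3 RGB channels) are concatenated and mapped to depth-edge
# probabilities. Placeholder architecture, not the paper's.
import torch
import torch.nn as nn

class ChannelFusionNet(nn.Module):
    def __init__(self, in_channels=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),            # per-pixel depth-edge logit
        )

    def forward(self, channels):            # channels: (B, C, H, W)
        return torch.sigmoid(self.net(channels))

# Training would minimize, e.g., binary cross-entropy against edge maps
# derived from the known depth of the training set.
```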
Entropy-difference based stereo error detection
Stereo depth estimation is error-prone; hence, effective error detection
methods are desirable. Most such existing methods depend on characteristics of
the stereo matching cost curve, making them unduly dependent on functional
details of the matching algorithm. As a remedy, we propose a novel error
detection approach based solely on the input image and its depth map. Our
assumption is that the entropy at any point of an image will be significantly higher than the entropy at the corresponding point of the image's depth map. In this paper, we propose a confidence measure, Entropy-Difference (ED), for stereo
depth estimates and a binary classification method to identify incorrect
depths. Experiments on the Middlebury dataset show the effectiveness of our
method. Our proposed stereo confidence measure outperforms 17 existing measures
in all aspects except occlusion detection. Established metrics such as
precision, accuracy, recall, and area under the curve are used to demonstrate the effectiveness of our method.
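The entropy-difference idea can be sketched directly: compute a local window entropy on the image and on its depth map, take the difference as the confidence, and threshold it to flag incorrect depths. The window size, bin count, and threshold below are illustrative assumptions.

```python
# Sketch of the Entropy-Difference (ED) confidence measure. Inputs are
# assumed to be float arrays scaled to [0, 1]; parameters are placeholders.
import numpy as np
from scipy.ndimage import generic_filter

def local_entropy(img, size=9, bins=32):
    """Shannon entropy of the intensities inside each sliding window."""
    def entropy(window):
        hist, _ = np.histogram(window, bins=bins, range=(0.0, 1.0))
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    return generic_filter(img.astype(float), entropy, size=size)

def entropy_difference(image, depth):
    # Image entropy is expected to exceed depth entropy at correct depths
    return local_entropy(image) - local_entropy(depth)

def flag_incorrect_depths(image, depth, threshold=0.5):
    # Binary classification: low ED suggests an unreliable depth estimate
    return entropy_difference(image, depth) < threshold
```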
Robust temporal depth enhancement method for dynamic virtual view synthesis
Depth-image-based rendering (DIBR) is a view synthesis technique that generates virtual views by warping from
the reference images based on depth maps. The quality of synthesized views highly depends on the accuracy of
depth maps. However, for dynamic scenarios, depth sequences obtained by running stereo matching frame by frame can be temporally inconsistent, especially in static regions, which leads to uncomfortable flickering artifacts in synthesized videos. This problem can be mitigated by depth enhancement methods that perform temporal filtering to suppress depth inconsistency, yet such filtering may also spread depth errors: although these algorithms increase the temporal consistency of synthesized videos, they risk reducing the quality of the rendered videos. Since conventional methods may not achieve both properties, in this paper we present a robust temporal depth enhancement (RTDE) method for static regions, which propagates only the reliable depth values into succeeding frames to improve both the accuracy and the temporal consistency of depth estimates. This technique benefits the quality of synthesized videos. In addition, we propose a novel evaluation metric to quantitatively compare the temporal consistency of our method and the state of the art. Experimental results demonstrate the robustness of our method for dynamic virtual view synthesis: both the temporal consistency and the quality of synthesized videos in static regions are improved.
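A minimal sketch of the propagation step, assuming static regions are detected by an intensity-difference test and reliability by agreement between consecutive depth estimates; both thresholds are placeholders, not the paper's criteria.

```python
# Hedged sketch of temporal depth propagation for static regions: reliable
# depth from the previous frame replaces the per-frame estimate, suppressing
# flicker without filtering across depth errors.
import numpy as np

def propagate_depth(prev_img, cur_img, prev_depth, cur_depth,
                    static_thresh=2.0, reliable_thresh=1.0):
    """prev_img/cur_img: grayscale frames; prev_depth/cur_depth: depth maps."""
    # Treat a pixel as static if its intensity barely changes between frames
    static = np.abs(cur_img.astype(float) - prev_img.astype(float)) < static_thresh
    # Treat the previous depth as reliable if both estimates roughly agree
    reliable = np.abs(cur_depth - prev_depth) < reliable_thresh
    out = cur_depth.copy()
    mask = static & reliable
    out[mask] = prev_depth[mask]   # carry reliable values into this frame
    return out
```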