12,017 research outputs found
DeMoN: Depth and Motion Network for Learning Monocular Stereo
In this paper we formulate structure from motion as a learning problem. We
train a convolutional network end-to-end to compute depth and camera motion
from successive, unconstrained image pairs. The architecture is composed of
multiple stacked encoder-decoder networks, the core part being an iterative
network that is able to improve its own predictions. The network estimates not
only depth and motion, but additionally surface normals, optical flow between
the images and confidence of the matching. A crucial component of the approach
is a training loss based on spatial relative differences. Compared to
traditional two-frame structure from motion methods, results are more accurate
and more robust. In contrast to the popular depth-from-single-image networks,
DeMoN learns the concept of matching and, thus, better generalizes to
structures not seen during training.Comment: Camera ready version for CVPR 2017. Supplementary material included.
Project page:
http://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet
Unsupervised Learning of Edges
Data-driven approaches for edge detection have proven effective and achieve
top results on modern benchmarks. However, all current data-driven edge
detectors require manual supervision for training in the form of hand-labeled
region segments or object boundaries. Specifically, human annotators mark
semantically meaningful edges which are subsequently used for training. Is this
form of strong, high-level supervision actually necessary to learn to
accurately detect edges? In this work we present a simple yet effective
approach for training edge detectors without human supervision. To this end we
utilize motion, and more specifically, the only input to our method is noisy
semi-dense matches between frames. We begin with only a rudimentary knowledge
of edges (in the form of image gradients), and alternate between improving
motion estimation and edge detection in turn. Using a large corpus of video
data, we show that edge detectors trained using our unsupervised scheme
approach the performance of the same methods trained with full supervision
(within 3-5%). Finally, we show that when using a deep network for the edge
detector, our approach provides a novel pre-training scheme for object
detection.Comment: Camera ready version for CVPR 201
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained due to the lack of essential contents, i.e., stereoscopic videos. To alleviate such content shortage, an economical and practical solution is to reuse the huge media resources that are available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion and size, the quality of the resulting video may be poor as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow based occlusion reasoning in determining depth ordinal, ii) object segmentation using improved region-growing from masks of determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications
- …