Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Learning to predict scene depth from RGB inputs is a challenging task both
for indoor and outdoor robot navigation. In this work we address unsupervised
learning of scene depth and robot ego-motion where supervision is provided by
monocular videos, as cameras are the cheapest, least restrictive and most
ubiquitous sensor for robotics.
Previous work in unsupervised image-to-depth learning has established strong
baselines in the domain. We propose a novel approach which produces higher
quality results, is able to model moving objects and is shown to transfer
across data domains, e.g. from outdoors to indoor scenes. The main idea is to
introduce geometric structure in the learning process, by modeling the scene
and the individual objects; camera ego-motion and object motions are learned
from monocular videos as input. Furthermore an online refinement method is
introduced to adapt learning on the fly to unknown domains.
The proposed approach outperforms all state-of-the-art approaches, including
those that handle motion e.g. through learned flow. Our results are comparable
in quality to the ones which used stereo as supervision and significantly
improve depth prediction on scenes and datasets which contain a lot of object
motion. The approach is of practical relevance, as it allows transfer across
environments, by transferring models trained on data collected for robot
navigation in urban scenes to indoor navigation settings. The code associated
with this paper can be found at https://sites.google.com/view/struct2depth.
Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)
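The self-supervision signal in this line of work comes from view synthesis: predicted depth and camera motion are used to warp one video frame into the viewpoint of another, and the photometric difference serves as the training loss. Below is a minimal numpy sketch of that reprojection step under a simple pinhole model; the function names and the simplifications (single rigid motion, no object-motion models, no occlusion handling) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Project each target-view pixel into the source view using the
    predicted per-pixel depth and the relative camera motion (R, t),
    under a pinhole camera model with intrinsics K.
    Returns source-view pixel coordinates, shape (H, W, 2)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T            # back-project to unit-depth rays
    pts = rays * depth[..., None]              # 3D points in the target frame
    pts_src = pts @ R.T + t                    # transform into the source frame
    proj = pts_src @ K.T                       # project with the intrinsics
    return proj[..., :2] / proj[..., 2:3]      # perspective divide -> (u, v)

def photometric_l1(img_tgt, img_src_warped):
    """Mean absolute photometric difference between the target frame and
    the source frame warped into the target view: the core loss that lets
    depth and ego-motion be learned without depth sensors."""
    return np.mean(np.abs(img_tgt - img_src_warped))
```

In practice the warped image is obtained by bilinearly sampling the source frame at these coordinates, and the loss is minimized jointly over the depth and ego-motion networks; the paper's contribution is to additionally model per-object motion and refine online, which this sketch omits.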
LEGO: Learning Edge with Geometry all at Once by Watching Videos
Learning to estimate 3D geometry in a single image by watching unlabeled
videos via deep convolutional network is attracting significant attention. In
this paper, we introduce a "3D as-smooth-as-possible (3D-ASAP)" prior inside
the pipeline, which enables joint estimation of edges and 3D scene, yielding
results with significant improvement in accuracy for fine detailed structures.
Specifically, we define the 3D-ASAP prior by requiring that any two points
recovered in 3D from an image should lie on an existing planar surface if no
other cues are provided. We design an unsupervised framework that Learns Edges and
Geometry (depth, normal) all at Once (LEGO). The predicted edges are embedded
into depth and surface normal smoothness terms, where pixels without edges
in-between are constrained to satisfy the prior. In our framework, the
predicted depths, normals and edges are forced to be consistent all the time.
We conduct experiments on KITTI to evaluate our estimated geometry and
CityScapes to perform edge evaluation. We show that in all of the tasks,
i.e. depth, normal and edge, our algorithm vastly outperforms other
state-of-the-art (SOTA) algorithms, demonstrating the benefits of our approach.
Comment: Accepted to CVPR 2018 as spotlight; camera-ready plus supplementary
material. Code will com
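The abstract's key mechanism is an edge-gated smoothness term: depth (and normal) gradients are penalized only where the predicted edge map says there is no edge between neighboring pixels, so smooth regions are pushed toward planar geometry while true boundaries stay sharp. A minimal numpy sketch of such a term follows; the function name and the exact gating form are assumptions for illustration, not the paper's loss.

```python
import numpy as np

def edge_gated_smoothness(depth, edge_prob):
    """Penalize depth gradients only where no edge is predicted:
    neighboring pixels with low edge probability between them are
    constrained to lie on locally smooth (as-planar-as-possible) geometry.
    depth, edge_prob: (H, W) arrays, edge_prob in [0, 1]."""
    dx = np.abs(depth[:, 1:] - depth[:, :-1])  # horizontal depth gradients
    dy = np.abs(depth[1:, :] - depth[:-1, :])  # vertical depth gradients
    # gate each gradient by the edge probability at the second pixel of
    # the pair: a confident edge (prob ~ 1) disables the penalty there
    wx = 1.0 - edge_prob[:, 1:]
    wy = 1.0 - edge_prob[1:, :]
    return np.mean(dx * wx) + np.mean(dy * wy)
```

With this gating, a depth discontinuity that coincides with a predicted edge contributes no smoothness penalty, while the same discontinuity without an edge is penalized, which is what couples the edge, depth, and normal predictions during training.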
- …