3 research outputs found
Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
The self-supervised learning of depth and pose from monocular sequences
provides an attractive solution because it relies on the photometric
consistency of nearby frames and therefore depends far less on ground-truth
data. In this paper, we address the case where the assumptions behind previous
self-supervised approaches are violated by the dynamic nature of real-world
scenes. Rather than treating the resulting noise as uncertainty, our key idea
is to incorporate more robust geometric quantities and to enforce internal
consistency across the temporal image sequence. As demonstrated on commonly
used benchmark datasets, the proposed method substantially improves on
state-of-the-art methods for both depth and relative pose estimation from
monocular image sequences, without adding inference overhead.
Comment: International Conference on Computer Vision (ICCV) Workshop 201
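
The photometric-consistency objective this abstract builds on is the standard backbone of self-supervised depth and pose learning: warp a nearby source frame into the target view using the predicted depth and relative pose, then penalize the appearance difference. Below is a minimal PyTorch sketch of that baseline (the framework and all names are illustrative assumptions; the paper's additional robust geometric terms are not reproduced here).

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """target, source: (B,3,H,W) images; depth: (B,1,H,W) predicted depth;
    pose: (B,4,4) target-to-source transform; K: (B,3,3) camera intrinsics."""
    B, _, H, W = target.shape
    # Backproject target pixels to 3D using the predicted depth.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()   # (3,H,W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(target.device)      # (B,3,HW)
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)            # (B,3,HW)
    # Transform the points into the source camera and project them.
    ones = torch.ones(B, 1, H * W, device=cam.device)
    src = K @ (pose @ torch.cat([cam, ones], dim=1))[:, :3]           # (B,3,HW)
    uv = src[:, :2] / src[:, 2:].clamp(min=1e-6)
    # Normalize coordinates to [-1, 1] and warp the source frame.
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()  # L1 photometric error
```

For static, Lambertian scenes this error is low at the true depth and pose; dynamic objects break that assumption, which is exactly the failure mode the abstract targets.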
M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network
Current multi-view stereo (MVS) methods built on supervised learning networks
achieve impressive performance compared with traditional MVS methods. However,
the ground-truth depth maps required for training are hard to obtain and cover
only a limited range of scenarios. In this paper, we propose a novel
unsupervised multi-metric MVS network, named M^3VSNet, for dense point cloud
reconstruction without any supervision. To improve the robustness and
completeness of point cloud reconstruction, we propose a novel multi-metric
loss function that combines pixel-wise and feature-wise loss terms to learn
the inherent constraints from different perspectives of matching
correspondences. In addition, we incorporate normal-depth consistency in the
3D point cloud format to improve the accuracy and continuity of the estimated
depth maps. Experimental results show that M^3VSNet establishes a new state of
the art among unsupervised methods, achieves performance comparable to the
previous supervised MVSNet on the DTU dataset, and demonstrates powerful
generalization ability with clear improvement on the Tanks and Temples
benchmark. Our code is available at https://github.com/whubaichuan/M3VSNet
Comment: The original top-level version is arXiv:2004.09722v2, but I
mistakenly uploaded a similar version to arXiv:2005.00363, which overlaps with
arXiv:2004.09722v2. This submission makes the two addresses carry the same
version.
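
A multi-metric loss of the kind the abstract describes pairs a pixel-wise term with a feature-wise term computed by a frozen feature extractor, so that matching is constrained both in raw image space and in a space more tolerant of lighting changes and textureless regions. The sketch below is a plausible PyTorch rendering under those assumptions (the VGG16 extractor and the weights are illustrative choices, not the paper's exact formulation).

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiMetricLoss(nn.Module):
    """Pixel-wise + feature-wise loss on a warped/reference image pair."""
    def __init__(self, pixel_weight=0.8, feature_weight=0.2):
        super().__init__()
        self.pixel_weight = pixel_weight
        self.feature_weight = feature_weight
        # Frozen shallow VGG16 features serve as the feature-wise metric.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:9]
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.features = vgg.eval()

    def forward(self, warped, reference):
        # Pixel-wise metric: plain L1 between warped and reference images.
        pixel = (warped - reference).abs().mean()
        # Feature-wise metric: L1 in a learned feature space, which tolerates
        # photometric variation better than raw intensities do.
        feat = (self.features(warped) - self.features(reference)).abs().mean()
        return self.pixel_weight * pixel + self.feature_weight * feat
```

Keeping the extractor frozen makes the feature-wise term a fixed metric rather than a moving target, which stabilizes unsupervised training.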
MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation
Autonomous navigation emerges from both motion and local visual perception in
real-world environments. However, most successful robotic motion estimation
methods (e.g. VO, SLAM, SfM) and vision systems (e.g. CNNs, visual place
recognition - VPR) are typically used separately for mapping and localization
tasks. Conversely, recent reinforcement learning (RL) based methods for visual
navigation rely on the quality of GPS reception, which may not be reliable
enough to use directly as ground truth across multiple, month-spaced
traversals of large environments. In this paper, we propose a novel motion and
visual perception approach, dubbed MVP, that unifies these two sensor
modalities for large-scale, target-driven navigation tasks. Our MVP-based
method learns faster, and is more accurate and more robust to both extreme
environmental changes and poor GPS data, than corresponding vision-only
navigation methods. MVP temporally incorporates compact image representations,
obtained via VPR, with optimized motion estimation data, including but not
limited to those from VO or optimized radar odometry (RO), to efficiently
learn self-supervised navigation policies via RL. We evaluate our method on
two large real-world datasets, Oxford RobotCar and Nordland Railway, over a
range of weather (e.g. overcast, night, snow, sun, rain, clouds) and seasonal
(e.g. winter, spring, fall, summer) conditions, using the new CityLearn
framework, an interactive environment for efficiently training navigation
agents. Our experimental results on traversals of the Oxford RobotCar dataset
with no GPS data show that MVP achieves 53% and 93% navigation success rates
using VO and RO, respectively, compared to 7% for a vision-only method. We
additionally report a trade-off between the RL success rate and motion
estimation precision.
Comment: Under review at IROS 202
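
At the interface level, the fusion the abstract describes amounts to combining a compact VPR image descriptor with a relative motion estimate from VO or RO into a single observation for the RL policy. The sketch below shows one plausible way to do this (all names, dimensions, and the normalization scheme are illustrative assumptions, not the authors' interfaces).

```python
import numpy as np

def fuse_observation(vpr_embedding: np.ndarray, odom_delta: np.ndarray) -> np.ndarray:
    """vpr_embedding: (D,) compact place descriptor from a VPR system;
    odom_delta: (3,) planar motion estimate (dx, dy, dtheta) from VO or RO."""
    # Normalize each modality so neither dominates the policy input.
    v = vpr_embedding / (np.linalg.norm(vpr_embedding) + 1e-8)
    m = odom_delta / (np.linalg.norm(odom_delta) + 1e-8)
    # The concatenated vector is the joint motion + visual state for the agent.
    return np.concatenate([v, m])
```

The abstract's success-rate gap between VO (53%) and RO (93%) suggests that the quality of the motion estimate, rather than the fusion itself, dominates downstream navigation performance.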