Self-Supervised Structure-from-Motion through Tightly-Coupled Depth and Egomotion Networks
Much recent literature has formulated structure-from-motion (SfM) as a
self-supervised learning problem where the goal is to jointly learn neural
network models of depth and egomotion through view synthesis. Herein, we
address the open problem of how to optimally couple the depth and egomotion
network components. Toward this end, we introduce several notions of coupling,
categorize existing approaches, and present a novel tightly-coupled approach
that leverages the interdependence of depth and egomotion at training and at
inference time. Our approach uses iterative view synthesis to recursively
update the egomotion network input, permitting contextual information to be
passed between the components without explicit weight sharing. Through
substantial experiments, we demonstrate that our approach promotes consistency
between the depth and egomotion predictions at test time, improves
generalization on new data, and leads to state-of-the-art accuracy on indoor
and outdoor depth and egomotion evaluation benchmarks.
Comment: Submitted to NeurIPS 202
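The recursive update described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the "image" is a 1-D signal, the camera motion is an integer shift, depth is omitted entirely, and `toy_egomotion_step` is a hypothetical stand-in for the learned egomotion network that can only predict a small correction per pass. All function names are assumptions made for illustration. The point is only the loop structure: re-synthesize the view with the current estimate, feed the result back to the estimator, and accumulate the corrections.

```python
from typing import List


def synthesize_view(source: List[float], shift: int) -> List[float]:
    """Toy view synthesis: circularly shift a 1-D 'image' by `shift` samples."""
    n = len(source)
    return [source[(i - shift) % n] for i in range(n)]


def photometric_error(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two signals of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_egomotion_step(target: List[float], warped: List[float]) -> int:
    """Hypothetical stand-in for the egomotion network.

    It only predicts a small correction in {-1, 0, 1}, chosen to minimise
    the photometric error between the re-warped input and the target.
    """
    return min(
        (-1, 0, 1),
        key=lambda d: photometric_error(synthesize_view(warped, d), target),
    )


def refine_egomotion(source: List[float], target: List[float], num_iters: int) -> int:
    """Iteratively re-synthesize the view and accumulate motion corrections."""
    estimate = 0  # identity motion as the starting point
    for _ in range(num_iters):
        # Warp the source with the current estimate, then ask the estimator
        # for the residual motion that is still unexplained.
        warped = synthesize_view(source, estimate)
        estimate += toy_egomotion_step(target, warped)
    return estimate


source = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0]
target = synthesize_view(source, 3)  # ground-truth motion: shift by 3
print(refine_egomotion(source, target, num_iters=1))
print(refine_egomotion(source, target, num_iters=4))
```

In this toy setting a single pass recovers only part of the 3-sample shift, while four passes recover it fully. In a real system the corrections would be composed as rigid-body transforms rather than added as integers, and the warp would depend on the predicted depth.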
Deep Learning for Depth, Ego-Motion, Optical Flow Estimation, and Semantic Segmentation
Visual Simultaneous Localization and Mapping (SLAM) is crucial for robot perception. Visual odometry (VO) is one of the essential components of SLAM; it can estimate the depth map of a scene and the ego-motion of a camera in unknown environments. Most previous work in this area uses geometry-based approaches. Recently, deep learning methods have opened a new door for this area. At present, most research under deep learning frameworks focuses on improving the accuracy of estimation results and reducing the dependence on large amounts of labelled training data. This thesis explores deep learning techniques for several estimation tasks under the VO framework: depth, ego-motion, optical flow, and semantic segmentation. Firstly, a stacked generative adversarial network is proposed to estimate depth and ego-motion. It consists of a stack of GAN layers, of which the lowest layer estimates the depth and ego-motion while the higher layers estimate the spatial features. It can also capture temporal dynamics through a recurrent representation across the layers. Secondly, turning to the design of the internal network structure, a novel recurrent spatial-temporal network (RSTNet) is proposed to estimate depth, ego-motion, optical flow, and dynamic objects. This network can extract and retain more spatial and temporal features. Dynamic objects are detected using the difference between the full optical flow and the rigid flow. Finally, a semantic segmentation network is proposed, producing semantic segmentation results together with depth and ego-motion estimates. All of the proposed contributions are tested and evaluated on open public datasets, and comparisons with other methods are provided. The results show that the proposed networks outperform state-of-the-art methods in depth, ego-motion, and dynamic object estimation.
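The idea of detecting dynamic objects from the difference between full flow and rigid flow can be sketched as follows. This is a minimal illustration, not the thesis's network: it assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, a known relative pose `(R, t)`, and a fixed pixel `threshold` on the residual, none of which are specified in the abstract. The rigid flow is the image motion that depth and camera motion alone would produce; pixels whose observed (full) flow departs from it are flagged as dynamic.

```python
import math
from typing import List, Sequence, Tuple

Flow = Tuple[float, float]


def rigid_flow_at(u: float, v: float, depth: float,
                  fx: float, fy: float, cx: float, cy: float,
                  R: Sequence[Sequence[float]], t: Sequence[float]) -> Flow:
    """Flow at pixel (u, v) induced only by camera motion (R, t) and depth."""
    # Back-project the pixel to a 3-D point in the first camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Move the point into the second camera frame.
    p = [R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(3)]
    # Project back to the image plane of the second camera.
    u2 = fx * p[0] / p[2] + cx
    v2 = fy * p[1] / p[2] + cy
    return (u2 - u, v2 - v)


def dynamic_mask(full_flow: List[List[Flow]], depth: List[List[float]],
                 intrinsics: Tuple[float, float, float, float],
                 R: Sequence[Sequence[float]], t: Sequence[float],
                 threshold: float = 1.0) -> List[List[bool]]:
    """Mark pixels whose full flow differs from the rigid flow by > threshold."""
    fx, fy, cx, cy = intrinsics
    mask = []
    for v, row in enumerate(full_flow):
        mask_row = []
        for u, (fu, fv) in enumerate(row):
            ru, rv = rigid_flow_at(u, v, depth[v][u], fx, fy, cx, cy, R, t)
            residual = math.hypot(fu - ru, fv - rv)  # residual flow magnitude
            mask_row.append(residual > threshold)
        mask.append(mask_row)
    return mask
```

With a learned pipeline, `depth`, `(R, t)`, and `full_flow` would come from the respective network outputs, and the threshold would typically need tuning per dataset.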