54 research outputs found

    VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry

    Semantic understanding and localization are fundamental enablers of robot autonomy that have for the most part been tackled as disjoint problems. While deep learning has enabled recent breakthroughs across a wide spectrum of scene understanding tasks, its applicability to state estimation tasks has been limited due to the direct formulation that renders it incapable of encoding scene-specific constraints. In this work, we propose the VLocNet++ architecture that employs a multitask learning approach to exploit the inter-task relationship between learning semantics, regressing the 6-DoF global pose, and estimating odometry, for the mutual benefit of each of these tasks. Our network overcomes the aforementioned limitation by simultaneously embedding geometric and semantic knowledge of the world into the pose regression network. We propose a novel adaptive weighted fusion layer to aggregate motion-specific temporal information and to fuse semantic features into the localization stream based on region activations. Furthermore, we propose a self-supervised warping technique that uses the relative motion to warp intermediate network representations in the segmentation stream for learning consistent semantics. Finally, we introduce a first-of-its-kind urban outdoor localization dataset with pixel-level semantic labels and multiple loops for training deep networks. Extensive experiments on the challenging Microsoft 7-Scenes benchmark and our DeepLoc dataset demonstrate that our approach exceeds the state of the art, outperforming local feature-based methods while simultaneously performing multiple tasks and exhibiting substantial robustness in challenging scenarios. Comment: Demo and dataset available at http://deeploc.cs.uni-freiburg.d
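
    The abstract does not spell out the fusion layer's internals, but the core idea of an adaptive weighted fusion can be sketched as follows: weights predicted from the concatenated feature maps gate how much each stream contributes at every location. The PyTorch snippet below is a minimal sketch under that assumption, not the paper's exact layer; the module name and channel sizes are illustrative.

        import torch
        import torch.nn as nn

        class AdaptiveWeightedFusion(nn.Module):
            """Fuse two same-shaped feature maps with element-wise weights
            predicted from their concatenation (illustrative sketch)."""
            def __init__(self, channels):
                super().__init__()
                # 1x1 conv predicts a per-pixel, per-channel mixing weight in [0, 1]
                self.weight_head = nn.Sequential(
                    nn.Conv2d(2 * channels, channels, kernel_size=1),
                    nn.Sigmoid(),
                )

            def forward(self, loc_feat, sem_feat):
                w = self.weight_head(torch.cat([loc_feat, sem_feat], dim=1))
                return w * loc_feat + (1.0 - w) * sem_feat

        # usage: fuse localization-stream and semantic-stream feature maps
        fusion = AdaptiveWeightedFusion(channels=256)
        fused = fusion(torch.randn(1, 256, 28, 28), torch.randn(1, 256, 28, 28))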

    CMRNet++: Map and Camera Agnostic Monocular Visual Localization in LiDAR Maps

    Localization is a critical enabler of autonomous robots. While deep learning has made significant strides in many computer vision tasks, it has yet to make a sizeable impact on improving the capabilities of metric visual localization. One of the major hindrances has been the inability of existing Convolutional Neural Network (CNN)-based pose regression methods to generalize to previously unseen places. Our recently introduced CMRNet effectively addresses this limitation by enabling map-independent monocular localization in LiDAR maps. In this paper, we take it a step further by introducing CMRNet++, a significantly more robust model that not only generalizes to new places effectively, but is also independent of the camera parameters. We enable this capability by combining deep learning with geometric techniques, and by moving the metric reasoning outside the learning process. In this way, the weights of the network are not tied to a specific camera. Extensive evaluations of CMRNet++ on three challenging autonomous driving datasets, i.e., KITTI, Argoverse, and Lyft5, show that CMRNet++ outperforms CMRNet as well as other baselines by a large margin. More importantly, for the first time, we demonstrate the ability of a deep learning approach to accurately localize without any retraining or fine-tuning in a completely new environment, independently of the camera parameters. Comment: Spotlight talk at IEEE ICRA 2020 Workshop on Emerging Learning and Algorithmic Methods for Data Association in Robotic
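
    Moving the metric reasoning outside the learning process typically means the network only predicts correspondences, while a classical solver recovers the pose using the camera intrinsics. The sketch below illustrates that split with OpenCV's PnP + RANSAC; it assumes the network outputs 2D-3D matches between image pixels and LiDAR-map points, which may differ from the paper's exact pipeline.

        import numpy as np
        import cv2

        def pose_from_matches(map_points_3d, image_points_2d, K, dist=None):
            """Recover a metric camera pose with PnP + RANSAC from 2D-3D matches.

            map_points_3d: (N, 3) LiDAR-map points; image_points_2d: (N, 2) pixel
            locations predicted by a matching network. Because the intrinsics K
            enter only here, the learned matcher itself stays camera-agnostic.
            """
            ok, rvec, tvec, inliers = cv2.solvePnPRansac(
                map_points_3d.astype(np.float32),
                image_points_2d.astype(np.float32),
                K, dist,
                flags=cv2.SOLVEPNP_ITERATIVE,
            )
            if not ok:
                raise RuntimeError("PnP failed to find a consistent pose")
            R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the camera pose
            return R, tvec, inliers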

    Vision-based Autonomous Landing in Catastrophe-Struck Environments

    Unmanned Aerial Vehicles (UAVs) equipped with bioradars are a life-saving technology that can enable the identification of survivors under collapsed buildings in the aftermath of natural disasters such as earthquakes or gas explosions. However, these UAVs have to be able to autonomously land on debris piles in order to accurately locate the survivors. This problem is extremely challenging as the structure of these debris piles is often unknown and no prior knowledge can be leveraged. In this work, we propose a computationally efficient system that is able to reliably identify safe landing sites and autonomously perform the landing maneuver. Specifically, our algorithm computes costmaps based on several hazard factors, including terrain flatness, steepness, depth accuracy, and energy consumption information. We first estimate dense candidate landing sites from the resulting costmap and then employ clustering to group neighboring sites into a safe landing region. Finally, a minimum-jerk trajectory is computed for landing, considering the surrounding obstacles and the UAV dynamics. We demonstrate the efficacy of our system in experiments in a city-scale, hyperrealistic simulation environment and in real-world scenarios with collapsed buildings.
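
    As an illustration of the costmap step, the snippet below combines normalized hazard maps into a single landing costmap and thresholds it into candidate sites. The weights and threshold are placeholders rather than values from the paper, and the clustering and minimum-jerk stages are omitted.

        import numpy as np

        def landing_costmap(flatness, steepness, depth_acc, energy,
                            weights=(0.4, 0.3, 0.2, 0.1)):
            """Combine per-cell hazard maps into a single landing costmap.

            Each input is an (H, W) array normalized to [0, 1], where higher
            means more hazardous. The weights are illustrative placeholders.
            """
            w_f, w_s, w_d, w_e = weights
            return w_f * flatness + w_s * steepness + w_d * depth_acc + w_e * energy

        def candidate_sites(cost, threshold=0.3):
            """Return (row, col) cells whose combined cost is below a threshold."""
            rows, cols = np.where(cost < threshold)
            return list(zip(rows.tolist(), cols.tolist()))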

    Deep Auxiliary Learning for Visual Localization and Odometry

    Localization is an indispensable component of a robot's autonomy stack that enables it to determine where it is in the environment, essentially making it a precursor for any action execution or planning. Although convolutional neural networks have shown promising results for visual localization, they are still grossly outperformed by state-of-the-art local feature-based techniques. In this work, we propose VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images. Our multitask model incorporates hard parameter sharing, thus being compact and enabling real-time inference, in addition to being end-to-end trainable. We propose a novel loss function that utilizes auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates. We evaluate our proposed VLocNet on indoor as well as outdoor datasets and show that even our single-task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation. Furthermore, we present extensive experimental evaluations utilizing our proposed Geometric Consistency Loss that show the effectiveness of multitask learning and demonstrate that our model is the first deep learning technique to be on par with, and in some cases outperform, state-of-the-art SIFT-based approaches. Comment: Accepted for ICRA 201
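
    The auxiliary-learning idea can be summarized as penalizing not only the absolute pose error but also the mismatch between the predicted and ground-truth relative motion of consecutive frames. The following is a simplified sketch of such an objective; the exact parameterization and weighting of the paper's Geometric Consistency Loss differ.

        import torch

        def geometric_consistency_loss(t_pred, q_pred, t_gt, q_gt, beta=1.0, gamma=1.0):
            """Simplified sketch of an auxiliary geometric-consistency objective.

            t_pred, q_pred: predicted translations (B, 3) and unit quaternions (B, 4)
            for a batch of consecutive frames; t_gt, q_gt: ground truth. In addition
            to the absolute pose error, the relative translation between consecutive
            frames derived from the predictions is constrained to match the
            ground-truth relative motion. Weights and rotation handling are illustrative.
            """
            abs_loss = torch.norm(t_pred - t_gt, dim=1).mean() + \
                       beta * torch.norm(q_pred - q_gt, dim=1).mean()
            # relative translation between consecutive frames in the sequence
            rel_pred = t_pred[1:] - t_pred[:-1]
            rel_gt = t_gt[1:] - t_gt[:-1]
            rel_loss = torch.norm(rel_pred - rel_gt, dim=1).mean()
            return abs_loss + gamma * rel_loss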

    Robust Vision Challenge 2020 -- 1st Place Report for Panoptic Segmentation

    In this technical report, we present key details of our winning panoptic segmentation architecture, EffPS_b1bs4_RVC. Our network is a lightweight version of our state-of-the-art EfficientPS architecture that consists of our proposed shared backbone with a modified EfficientNet-B5 model as the encoder, followed by the 2-way FPN to learn semantically rich multi-scale features. It comprises two task-specific heads: a modified Mask R-CNN instance head and our novel semantic segmentation head that processes features of different scales with specialized modules for coherent feature refinement. Finally, our proposed panoptic fusion module adaptively fuses logits from each of the heads to yield the panoptic segmentation output. The Robust Vision Challenge 2020 benchmarking results show that our model is ranked #1 on Microsoft COCO, VIPER and WildDash, and #2 on Cityscapes and Mapillary Vistas, thereby achieving the overall rank #1 for the panoptic segmentation task.
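
    The adaptive logit fusion can be pictured as each head's confidence gating the combined evidence for a mask. The snippet below is a sketch of that idea only and is not guaranteed to match the exact EfficientPS fusion rule.

        import torch

        def fuse_logits(mask_logits_inst, mask_logits_sem):
            """Adaptively fuse per-instance mask logits from the instance head with
            the corresponding class logits from the semantic head (illustrative
            sketch). Both tensors have shape (N, H, W)."""
            a, b = mask_logits_inst, mask_logits_sem
            # each head's sigmoid confidence gates the summed evidence
            return (torch.sigmoid(a) + torch.sigmoid(b)) * (a + b)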

    Deep Spatiotemporal Models for Robust Proprioceptive Terrain Classification

    Terrain classification is a critical component of any autonomous mobile robot system operating in unknown real-world environments. Over the years, several proprioceptive terrain classification techniques have been introduced to increase robustness or to act as a fallback for traditional vision-based approaches. However, they have lacked widespread adoption due to various factors, including inadequate accuracy and robustness as well as slow run-times. In this paper, we use vehicle-terrain interaction sounds as a proprioceptive modality and propose a deep Long Short-Term Memory (LSTM) based recurrent model that captures both the spatial and temporal dynamics of such a problem, thereby overcoming these past limitations. Our model consists of a new Convolutional Neural Network (CNN) architecture that learns deep spatial features, complemented with LSTM units that learn complex temporal dynamics. Experiments on two extensive datasets collected with different microphones on various indoor and outdoor terrains demonstrate state-of-the-art performance compared to existing techniques. We additionally evaluate the performance in adverse acoustic conditions with high ambient noise and propose a noise-aware training scheme that enables learning of more generalizable models, which are essential for robust real-world deployments.
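
    A minimal version of such a spatiotemporal model treats the audio clip as a spectrogram, extracts spatial features with a small CNN, and models the temporal dynamics with an LSTM. The PyTorch sketch below follows that recipe with illustrative layer sizes; the paper's architecture and its noise-aware training scheme (e.g., mixing ambient noise into the training data) are not reproduced exactly.

        import torch
        import torch.nn as nn

        class SpectrogramCNNLSTM(nn.Module):
            """Minimal CNN + LSTM classifier over audio spectrograms (an
            illustrative stand-in). Input: (B, 1, freq, time)."""
            def __init__(self, n_classes, n_freq=64, hidden=128):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
                )
                self.lstm = nn.LSTM(64 * (n_freq // 4), hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, spec):
                f = self.cnn(spec)                    # (B, C, F', T)
                f = f.permute(0, 3, 1, 2).flatten(2)  # (B, T, C*F') time-major sequence
                out, _ = self.lstm(f)
                return self.head(out[:, -1])          # classify from the last timestep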

    Dynamic Object Removal and Spatio-Temporal RGB-D Inpainting via Geometry-Aware Adversarial Learning

    Dynamic objects have a significant impact on a robot's perception of the environment, degrading the performance of essential tasks such as localization and mapping. In this work, we address this problem by synthesizing plausible color, texture and geometry in regions occluded by dynamic objects. We propose the novel geometry-aware DynaFill architecture that follows a coarse-to-fine topology and incorporates our gated recurrent feedback mechanism to adaptively fuse information from previous timesteps. We optimize our architecture using adversarial training to synthesize fine, realistic textures, which enables it to hallucinate color and depth structure in occluded regions online, in a spatially and temporally coherent manner, without relying on future frame information. Casting our inpainting problem as an image-to-image translation task, our model also corrects regions correlated with the presence of dynamic objects in the scene, such as shadows or reflections. We introduce a large-scale hyperrealistic dataset with RGB-D images, semantic segmentation labels, camera poses, as well as groundtruth RGB-D information of occluded regions. Extensive quantitative and qualitative evaluations show that our approach achieves state-of-the-art performance, even in challenging weather conditions. Furthermore, we present results for retrieval-based visual localization with the synthesized images that demonstrate the utility of our approach. Comment: Dataset, code and models are available at http://rl.uni-freiburg.de/research/rgbd-inpaintin
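
    The temporal part of such an approach needs the previous (already inpainted) frame to be brought into the current view before fusion. A purely geometric way to do this, assuming known depth, intrinsics and relative pose, is sketched below; bilinear resampling, occlusion handling, and the learned gating are omitted.

        import numpy as np

        def warp_previous_frame(depth_prev, K, T_prev_to_curr):
            """Compute, for every pixel of the previous frame, its location in the
            current view using depth and the relative pose (numpy-only sketch).

            depth_prev: (H, W) depth in meters; K: (3, 3) intrinsics;
            T_prev_to_curr: (4, 4) relative transform. Returns (H, W, 2) pixel
            coordinates in the current frame.
            """
            H, W = depth_prev.shape
            u, v = np.meshgrid(np.arange(W), np.arange(H))
            pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)
            # back-project to 3D in the previous camera frame
            pts = np.linalg.inv(K) @ pix * depth_prev.reshape(1, -1)
            pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
            # move into the current camera frame and re-project
            pts_curr = (T_prev_to_curr @ pts_h)[:3]
            proj = K @ pts_curr
            uv = (proj[:2] / np.clip(proj[2:], 1e-6, None)).T.reshape(H, W, 2)
            return uv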

    Multimodal Interaction-aware Motion Prediction for Autonomous Street Crossing

    For mobile robots navigating on sidewalks, it is essential to be able to safely cross street intersections. Most existing approaches rely on the recognition of the traffic light signal to make an informed crossing decision. Although these approaches have been crucial enablers for urban navigation, the capabilities of robots employing them are still limited to navigating only on streets containing signalized intersections. In this paper, we address this challenge and propose a multimodal convolutional neural network framework to predict the safety of a street intersection for crossing. Our architecture consists of two subnetworks: an interaction-aware trajectory estimation stream, IA-TCNN, that predicts the future states of all observed traffic participants in the scene, and a traffic light recognition stream, AtteNet. IA-TCNN utilizes dilated causal convolutions to model the behavior of the observable dynamic agents in the scene without explicitly assigning priorities to the interactions among them, while AtteNet utilizes Squeeze-and-Excitation blocks to learn a content-aware mechanism for selecting the relevant features from the data, thereby improving robustness to noise. Learned representations from the traffic light recognition stream are fused with the estimated trajectories from the motion prediction stream to learn the crossing decision. Furthermore, we extend our previously introduced Freiburg Street Crossing dataset with sequences captured at different types of intersections, demonstrating complex interactions among the traffic participants. Extensive experimental evaluations on public benchmark datasets and our proposed dataset demonstrate that our network achieves state-of-the-art performance for each of the subtasks, as well as for the crossing safety prediction. Comment: The International Journal of Robotics Research (2020
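
    Dilated causal convolutions restrict each output timestep to depend only on past inputs, which is what lets IA-TCNN-style models forecast trajectories without peeking into the future. Below is a generic PyTorch sketch of one such block (the Squeeze-and-Excitation part of AtteNet is omitted); layer sizes are illustrative.

        import torch
        import torch.nn as nn

        class CausalDilatedBlock(nn.Module):
            """A dilated causal 1-D convolution block of the kind used in temporal
            convolutional networks: the output at time t depends only on inputs
            up to t. Sizes and layer counts are illustrative."""
            def __init__(self, channels, kernel_size=3, dilation=1):
                super().__init__()
                self.pad = (kernel_size - 1) * dilation  # left-pad only => causal
                self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
                self.relu = nn.ReLU()

            def forward(self, x):                        # x: (B, C, T)
                x_padded = nn.functional.pad(x, (self.pad, 0))
                return self.relu(self.conv(x_padded)) + x  # residual connection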

    Learning Kinematic Feasibility for Mobile Manipulation through Deep Reinforcement Learning

    Mobile manipulation tasks remain one of the critical challenges for the widespread adoption of autonomous robots in both service and industrial scenarios. While planning approaches are good at generating feasible whole-body robot trajectories, they struggle with dynamic environments as well as with the incorporation of constraints given by the task and the environment. On the other hand, dynamic motion models in the action space struggle with generating kinematically feasible trajectories for mobile manipulation actions. We propose a deep reinforcement learning approach to learn feasible dynamic motions for a mobile base while the end-effector follows a trajectory in task space generated by an arbitrary system to fulfill the task at hand. This modular formulation has several benefits: it enables us to readily transform a broad range of end-effector motions into mobile applications, it allows us to use the kinematic feasibility of the end-effector trajectory as a dense reward signal, and it generalizes to unseen end-effector motions at test time. We demonstrate the capabilities of our approach on multiple mobile robot platforms with different kinematic abilities and different types of wheeled bases, in extensive simulated as well as real-world experiments. Comment: Accepted for publication in RA-L. Code and Models: http://rl.uni-freiburg.de/research/kinematic-feasibility-r
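
    The dense reward described above can be pictured as an inverse-kinematics check: if the arm can still reach the commanded end-effector pose from the base configuration chosen by the policy, the step is rewarded, otherwise penalized. The snippet below is a hypothetical sketch; ik_solver and the shaping term stand in for whatever kinematics backend and reward design are actually used.

        import numpy as np

        def feasibility_reward(ee_goal_pose, base_pose, ik_solver, dist_weight=0.1):
            """Dense reward sketch: penalize base configurations from which the arm
            cannot reach the commanded end-effector pose, plus a small shaping term
            on the base-to-goal distance. `ik_solver` is a hypothetical callable
            that returns a joint solution or None."""
            solution = ik_solver(ee_goal_pose, base_pose)
            if solution is None:
                return -1.0                        # kinematically infeasible step
            dist = np.linalg.norm(ee_goal_pose[:2] - base_pose[:2])
            return 1.0 - dist_weight * dist        # feasible: reward, mildly shaped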

    LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM

    Loop closure detection is an essential component of Simultaneous Localization and Mapping (SLAM) systems, as it reduces the drift accumulated over time. Over the years, several deep learning approaches have been proposed to address this task; however, their performance has been subpar compared to handcrafted techniques, especially when dealing with reverse loops. In this paper, we introduce the novel LCDNet that effectively detects loop closures in LiDAR point clouds by simultaneously identifying previously visited places and estimating the 6-DoF relative transformation between the current scan and the map. LCDNet is composed of a shared encoder, a place recognition head that extracts global descriptors, and a relative pose head that estimates the transformation between two point clouds. We introduce a novel relative pose head based on the theory of unbalanced optimal transport, which we implement in a differentiable manner to allow for end-to-end training. Extensive evaluations of LCDNet on multiple real-world autonomous driving datasets show that our approach outperforms state-of-the-art loop closure detection and point cloud registration techniques by a large margin, especially when dealing with reverse loops. Moreover, we integrate our proposed loop closure detection approach into a LiDAR SLAM library to provide a complete mapping system and demonstrate its generalization ability using a different sensor setup in an unseen city.
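
    Differentiable optimal-transport matching is usually realized with Sinkhorn-style iterations over a feature-similarity cost. The sketch below implements the simpler balanced, entropy-regularized variant; LCDNet's relative pose head uses an unbalanced formulation to handle non-overlapping points, so treat this only as an illustration of the differentiable-matching idea.

        import torch

        def sinkhorn_matching(feat_a, feat_b, n_iters=20, eps=0.1):
            """Differentiable soft matching between two point-feature sets via
            entropy-regularized (balanced) optimal transport.

            feat_a: (N, D), feat_b: (M, D), both L2-normalized. Returns an (N, M)
            soft correspondence matrix.
            """
            N, M = feat_a.shape[0], feat_b.shape[0]
            cost = 1.0 - feat_a @ feat_b.t()                   # cosine distance
            K = torch.exp(-cost / eps)
            a = torch.full((N,), 1.0 / N, device=cost.device)  # uniform marginals
            b = torch.full((M,), 1.0 / M, device=cost.device)
            v = torch.ones(M, device=cost.device)
            for _ in range(n_iters):                           # Sinkhorn scaling
                u = a / (K @ v + 1e-8)
                v = b / (K.t() @ u + 1e-8)
            return u.unsqueeze(1) * K * v.unsqueeze(0)         # diag(u) K diag(v)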