
    Learning Matchable Image Transformations for Long-term Metric Visual Localization

    Full text link
    Long-term metric self-localization is an essential capability of autonomous mobile robots, but remains challenging for vision-based systems due to appearance changes caused by lighting, weather, or seasonal variations. While experience-based mapping has proven to be an effective technique for bridging the 'appearance gap,' the number of experiences required for reliable metric localization over days or months can be very large, and methods for reducing the necessary number of experiences are needed for this approach to scale. Taking inspiration from color constancy theory, we learn a nonlinear RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature matches for images captured under different lighting and weather conditions, and use it as a pre-processing step in a conventional single-experience localization pipeline to improve its robustness to appearance change. We train this mapping by approximating the target non-differentiable localization pipeline with a deep neural network, and find that incorporating a learned low-dimensional context feature can further improve cross-appearance feature matching. Using synthetic and real-world datasets, we demonstrate substantial improvements in localization performance across day-night cycles, enabling continuous metric localization over a 30-hour period using a single mapping experience, and allowing experience-based localization to scale to long deployments with dramatically reduced data requirements.
    Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'20), Paris, France, May 31-June 4, 2020.
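    The abstract's central trick, tuning a photometric transform by gradient ascent on a differentiable surrogate of a non-differentiable matching pipeline, can be illustrated compactly. The following is a minimal sketch, not the authors' code: the per-pixel mapping is built from 1x1 convolutions, and the surrogate "inlier count" network here is a hypothetical stand-in for the learned pipeline approximation described in the abstract.

```python
# Minimal sketch (not the authors' code) of a learned nonlinear
# RGB-to-grayscale mapping. 1x1 convolutions make the transform
# strictly per-pixel, as a color-constancy-style mapping should be.
import torch
import torch.nn as nn

class RGBToGray(nn.Module):
    """Per-pixel nonlinear RGB -> grayscale mapping (1x1 convs)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, hidden, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1), nn.Sigmoid(),  # output in [0, 1]
        )

    def forward(self, rgb):        # rgb: (B, 3, H, W)
        return self.net(rgb)       # -> (B, 1, H, W)

# Hypothetical training step: `surrogate` stands in for a network
# trained beforehand to predict the inlier match count for an image
# pair; we ascend its prediction by descending its negative.
mapping = RGBToGray()
surrogate = nn.Sequential(         # placeholder "match count" predictor
    nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
opt = torch.optim.Adam(mapping.parameters(), lr=1e-4)

img_a = torch.rand(4, 3, 64, 64)   # same place under different
img_b = torch.rand(4, 3, 64, 64)   # lighting (dummy data)
pred_inliers = surrogate(torch.cat([mapping(img_a), mapping(img_b)], dim=1))
loss = -pred_inliers.mean()        # maximize predicted inlier count
opt.zero_grad(); loss.backward(); opt.step()
```

    Once trained, the mapping is frozen and applied as a pre-processing step, so the downstream localizer itself needs no modification.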

    How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change

    Full text link
    Direct visual localization has recently enjoyed a resurgence in popularity with the increasing availability of cheap mobile computing power. The competitive accuracy and robustness of these algorithms compared to state-of-the-art feature-based methods, as well as their natural ability to yield dense maps, make them an appealing choice for a variety of mobile robotics applications. However, direct methods remain brittle in the face of appearance change due to their underlying assumption of photometric consistency, which is commonly violated in practice. In this paper, we propose to mitigate this problem by training deep convolutional encoder-decoder models to transform images of a scene such that they correspond to a previously seen canonical appearance. We validate our method in multiple environments and illumination conditions using high-fidelity synthetic RGB-D datasets, and integrate the trained models into a direct visual localization pipeline, yielding improvements in visual odometry (VO) accuracy through time-varying illumination conditions, as well as improved metric relocalization performance under illumination change, where conventional methods normally fail. We further provide a preliminary investigation of transfer learning from synthetic to real environments in a localization context. An open-source implementation of our method using PyTorch is available at https://github.com/utiasSTARS/cat-net.
    Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'18), Brisbane, Australia, May 21-25, 2018.
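    A minimal sketch of the canonical appearance transformation idea follows. The architecture and loss are simplified stand-ins, not the released model (the authors' implementation is at https://github.com/utiasSTARS/cat-net): an encoder-decoder is trained to map an image under arbitrary illumination to the same scene's canonical appearance, supervised here with a plain L1 reconstruction loss.

```python
# Simplified encoder-decoder sketch of a canonical appearance
# transformation; layer sizes and the L1 loss are assumptions.
import torch
import torch.nn as nn

class CATSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = CATSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy pair: the same scene under a novel and a canonical
# illumination (in the paper, rendered from synthetic RGB-D datasets).
novel = torch.rand(2, 3, 128, 128)
canonical = torch.rand(2, 3, 128, 128)

loss = nn.functional.l1_loss(model(novel), canonical)
opt.zero_grad(); loss.backward(); opt.step()
```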

    Mapping and Real-Time Navigation With Application to Small UAS Urgent Landing

    Full text link
    Small Unmanned Aircraft Systems (sUAS) operating in low-altitude airspace require flight near buildings and over people. Robust urgent landing capabilities, including landing site selection, are needed. However, conventional fixed-wing emergency landing sites such as open fields and empty roadways are rare in cities. This motivates our work to uniquely consider unoccupied flat rooftops as possible nearby landing sites. We propose novel methods to identify flat-rooftop buildings, isolate their flat surfaces, and find touchdown points that maximize distance to obstacles. We model flat rooftop surfaces as polygons that capture their boundaries and possible obstructions on them. This thesis offers five specific contributions to support urgent rooftop landing.

    First, the Polylidar algorithm is developed, which enables efficient non-convex polygon extraction, with interior holes, from 2D point sets. A key insight of this work is a novel boundary-following method that avoids computationally expensive geometric unions of triangles. Results from real-world and synthetic benchmarks show comparable accuracy and more than four times speedup compared to other state-of-the-art methods.

    Second, we extend polygon extraction from 2D to 3D data, where polygons represent flat surfaces and interior holes represent obstacles. Our Polylidar3D algorithm transforms point clouds into a triangular mesh in which dominant plane normals are identified and used to parallelize and regularize planar segmentation and polygon extraction. The result is a versatile and extremely fast algorithm for non-convex polygon extraction from 3D data.

    Third, we propose a framework for classifying roof shape (e.g., flat) within a city. We process satellite images, airborne LiDAR point clouds, and building outlines to generate both a satellite and a depth image of each building. Convolutional neural networks are trained for each modality to extract high-level features, which are sent to a random forest classifier for roof shape prediction (a minimal sketch of this two-modality pipeline follows this entry). This research contributes the largest multi-city annotated dataset, with over 4,500 rooftops, used to train and test models. Our results show that flat-like rooftops are identified with > 90% precision and recall.

    Fourth, we integrate Polylidar3D and our roof shape prediction model to extract flat rooftop surfaces from archived data sources. We uniquely identify optimal touchdown points for all landing sites. We model risk as an innovative combination of landing site and path risk metrics and conduct a multi-objective Pareto front analysis for sUAS urgent landing in cities. Our proposed emergency planning framework guarantees that a risk-optimal landing site and flight plan are selected.

    Fifth, we verify a chosen rooftop landing site during a real-time vertical approach with on-board LiDAR and camera sensors. Our method contributes an innovative fusion of semantic segmentation using neural networks with computational geometry that is robust to individual sensor and method failure. We construct a high-fidelity simulated city in the Unreal game engine with a statistically accurate representation of rooftop obstacles. We show our method leads to a greater than 4% improvement in accuracy for landing site identification compared to using LiDAR only.

    This work has broad impact for the safety of sUAS in cities as well as for Urban Air Mobility (UAM). Our methods identify thousands of additional rooftop landing sites in cities, which can provide safe landing zones in the event of emergencies.
    However, the maps we create are limited by the availability, accuracy, and resolution of archived data. Methods for quantifying data uncertainty or performing real-time map updates from a fleet of sUAS are left for future work.
    PhD thesis, Robotics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170026/1/jdcasta_1.pd
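    The third contribution's two-modality classifier lends itself to a short sketch. The following is not the thesis code: the CNN architectures, feature dimensions, and data are hypothetical placeholders, but the structure, per-modality CNN feature extractors feeding a random forest, follows the abstract.

```python
# Sketch of a two-modality roof-shape classifier: tiny CNNs embed the
# satellite and depth images, and a random forest classifies the
# concatenated features. All sizes and data are placeholders.
import torch
import torch.nn as nn
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_cnn(in_ch, feat_dim=32):
    """Tiny CNN mapping an image to a fixed-length feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

sat_cnn, depth_cnn = make_cnn(3), make_cnn(1)

# Dummy dataset: one satellite (RGB) and one depth image per rooftop,
# plus a roof-shape label (e.g., 0 = flat, 1 = gabled).
sat = torch.rand(100, 3, 64, 64)
depth = torch.rand(100, 1, 64, 64)
labels = np.random.randint(0, 2, size=100)

with torch.no_grad():  # in practice the CNNs are trained first
    feats = torch.cat([sat_cnn(sat), depth_cnn(depth)], dim=1).numpy()

clf = RandomForestClassifier(n_estimators=100).fit(feats, labels)
print("roof-shape prediction:", clf.predict(feats[:1]))
```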

    Visual Perception for Manipulation and Imitation in Humanoid Robots

    Get PDF
    This thesis deals with visual perception for manipulation and imitation in humanoid robots. In particular, real-time-capable methods for object recognition and pose estimation as well as for markerless human motion capture have been developed. The only sensor used was a small-baseline stereo camera system (with approximately human eye distance between the cameras). An extensive experimental evaluation has been performed on simulated as well as real image data from real-world scenarios using the humanoid robot ARMAR-III.
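    As a concrete illustration of the kind of small-baseline stereo processing such a perception system builds on, here is a minimal depth-from-disparity sketch using OpenCV. This is generic stereo matching, not the thesis's specific method; the focal length, baseline, and image files are hypothetical placeholders.

```python
# Depth from a rectified stereo pair via semi-global block matching.
# Assumes calibrated, rectified inputs; all constants are placeholders.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# numDisparities must be divisible by 16; SGBM returns fixed-point
# disparities scaled by 16.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0

f_px = 525.0      # focal length in pixels (placeholder)
baseline = 0.065  # ~human eye distance in meters (placeholder)
with np.errstate(divide="ignore", invalid="ignore"):
    # depth = f * B / d; invalid (non-positive) disparities map to 0
    depth = np.where(disparity > 0, f_px * baseline / disparity, 0.0)
```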

    Efficient 3D Segmentation, Registration and Mapping for Mobile Robots

    Get PDF
    Sometimes simple is better! For certain situations and tasks, simple but robust methods can achieve the same or better results in the same or less time than related sophisticated approaches. In the context of robots operating in real-world environments, key challenges are perceiving objects of interest and obstacles as well as building maps of the environment and localizing therein. The goal of this thesis is to carefully analyze such problem formulations, to deduce valid assumptions and simplifications, and to develop simple solutions that are both robust and fast. All approaches make use of sensors capturing 3D information, such as consumer RGBD cameras. Comparative evaluations show the performance of the developed approaches.

    For identifying objects and regions of interest in manipulation tasks, a real-time object segmentation pipeline is proposed. It exploits several common assumptions of manipulation tasks, such as objects resting on horizontal support surfaces and being well separated (this assumption is sketched after this entry). It achieves real-time performance by using particularly efficient approximations in the individual processing steps, subsampling the input data where possible, and processing only relevant subsets of the data. The resulting pipeline segments 3D input data at up to 30 Hz. In order to obtain complete segmentations of the 3D input data, a second pipeline is proposed that approximates the sampled surface, smooths the underlying data, and segments the smoothed surface into coherent regions belonging to the same geometric primitive. It uses different primitive models and can reliably segment input data into planes, cylinders, and spheres. A thorough comparative evaluation shows state-of-the-art performance while computing such segmentations in near real-time.

    The second part of the thesis addresses the registration of 3D input data, i.e., consistently aligning input captured from different view poses. Several methods are presented for different types of input data. For mapping with micro aerial vehicles, where the 3D input data is particularly sparse, a pipeline is proposed that uses the same approximate surface reconstruction to exploit the measurement topology, together with a surface-to-surface registration algorithm that robustly aligns the data. Optimization of the resulting graph of determined view poses then yields globally consistent 3D maps. For sequences of RGBD data, this pipeline is extended to include additional subsampling steps and an initial alignment of the data in local windows in the pose graph. In both cases, comparative evaluations show a robust and fast alignment of the input data.
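    The support-surface assumption in the segmentation pipeline can be illustrated with off-the-shelf tools. The following is a minimal sketch using Open3D, not the thesis implementation: a RANSAC plane fit extracts the dominant support plane, and the remaining points are clustered into object candidates. The input file and thresholds are placeholders.

```python
# Tabletop segmentation sketch: support-plane extraction + clustering.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.pcd")  # placeholder input

# RANSAC plane fit for the dominant support surface.
plane, inliers = pcd.segment_plane(distance_threshold=0.01,
                                   ransac_n=3, num_iterations=1000)

# Everything off the plane is a candidate object; cluster by density.
objects = pcd.select_by_index(inliers, invert=True)
labels = np.array(objects.cluster_dbscan(eps=0.02, min_points=20))
print(f"found {labels.max() + 1} object clusters")
```

    A full pipeline would additionally check the fitted plane's normal against gravity before treating it as a horizontal support surface.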