20,493 research outputs found

    Towards vision based navigation in large indoor environments

    The main contribution of this paper is a novel stereo-based algorithm which serves as a tool to examine the viability of stereo vision solutions to the simultaneous localisation and mapping (SLAM) problem for large indoor environments. Using features extracted from the scale invariant feature transform (SIFT) and depth maps from a small vision system (SVS) stereo head, an extended Kalman filter (EKF) based SLAM algorithm, which allows the independent use of information relating to depth and bearing, is developed. By means of a map pruning strategy for managing the computational cost, it is demonstrated that statistically consistent location estimates can be generated for a small (6 m × 6 m) structured office environment, and in a robotics search and rescue arena of similar size. It is shown that in a larger office environment, the proposed algorithm generates location estimates which are topologically correct, but statistically inconsistent. A discussion on the possible reasons for the inconsistency is presented. The paper highlights that, despite recent advances, building accurate geometric maps of large environments with vision-only sensing is still a challenging task. ©2006 IEEE

    Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

    A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset. Comment: CVPR 2018 Spotlight presentation

    Learning to Fly by Crashing

    How do you learn to navigate an Unmanned Aerial Vehicle (UAV) and avoid obstacles? One approach is to use a small dataset collected by human experts; however, high capacity learning algorithms tend to overfit when trained with little data. An alternative is to use simulation, but the gap between simulation and the real world remains large, especially for perception problems. The reason most research avoids using large-scale real data is the fear of crashes! In this paper, we propose to bite the bullet and collect a dataset of crashes itself! We build a drone whose sole purpose is to crash into objects: it samples naive trajectories and crashes into random objects. We crash our drone 11,500 times to create one of the biggest UAV crash datasets. This dataset captures the different ways in which a UAV can crash. We use all this negative flying data in conjunction with positive data sampled from the same trajectories to learn a simple yet powerful policy for UAV navigation. We show that this simple self-supervised model is quite effective in navigating the UAV even in extremely cluttered environments with dynamic obstacles including humans. For supplementary video see: https://youtu.be/u151hJaGKU

    Sparse 3D Point-cloud Map Upsampling and Noise Removal as a vSLAM Post-processing Step: Experimental Evaluation

    Monocular vision-based simultaneous localization and mapping (vSLAM) is one of the most challenging problems in mobile robotics and computer vision. In this work we study post-processing techniques applied to sparse 3D point-cloud maps obtained by feature-based vSLAM algorithms. Map post-processing is split into 2 major steps: 1) noise and outlier removal and 2) upsampling. We evaluate different combinations of known algorithms for outlier removal and upsampling on datasets of real indoor and outdoor environments and identify the most promising combination. We further use it to convert a point-cloud map, obtained by a real UAV performing an indoor flight, to a 3D voxel grid (octo-map) potentially suitable for path planning. Comment: 10 pages, 4 figures, camera-ready version of paper for "The 3rd International Conference on Interactive Collaborative Robotics (ICR 2018)"