Towards vision based navigation in large indoor environments
The main contribution of this paper is a novel stereo-based algorithm which serves as a tool to examine the viability of stereo vision solutions to the simultaneous localisation and mapping (SLAM) problem for large indoor environments. Using features extracted from the scale invariant feature transform (SIFT) and depth maps from a small vision system (SVS) stereo head, an extended Kalman filter (EKF) based SLAM algorithm that allows the independent use of information relating to depth and bearing is developed. By means of a map pruning strategy for managing the computational cost, it is demonstrated that statistically consistent location estimates can be generated for a small (6 m × 6 m) structured office environment, and in a robotics search and rescue arena of similar size. It is shown that in a larger office environment, the proposed algorithm generates location estimates which are topologically correct, but statistically inconsistent. A discussion on the possible reasons for the inconsistency is presented. The paper highlights that, despite recent advances, building accurate geometric maps of large environments with vision only sensing is still a challenging task. ©2006 IEEE
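The idea of using depth and bearing information independently in an EKF can be illustrated with a minimal sketch. It is not the paper's implementation: the planar state layout `[xr, yr, theta, xl, yl]` (robot pose plus one landmark), the noise variances, and the function names `scalar_update` and `update_landmark` are all assumptions made for illustration. Each measurement is applied as its own scalar update, so a feature with a bearing but no reliable depth can still be used.

```python
import numpy as np

def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def scalar_update(x, P, innovation, H, r):
    """One EKF update for a scalar measurement.

    H is the 1-D measurement Jacobian and r the measurement variance.
    """
    S = H @ P @ H + r                 # innovation variance (scalar)
    K = P @ H / S                     # Kalman gain, shape (n,)
    x_new = x + K * innovation
    P_new = (np.eye(len(x)) - np.outer(K, H)) @ P
    return x_new, P_new

def update_landmark(x, P, bearing=None, range_m=None,
                    var_bearing=1e-4, var_range=1e-2):
    """Apply whichever of the two measurements is available.

    Assumed state layout (hypothetical): x = [xr, yr, theta, xl, yl].
    """
    if bearing is not None:
        dx, dy = x[3] - x[0], x[4] - x[1]
        q = dx * dx + dy * dy
        predicted = np.arctan2(dy, dx) - x[2]
        H = np.array([dy / q, -dx / q, -1.0, -dy / q, dx / q])
        x, P = scalar_update(x, P, wrap_angle(bearing - predicted), H, var_bearing)
    if range_m is not None:
        # Re-linearise around the state after the bearing update, if any.
        dx, dy = x[3] - x[0], x[4] - x[1]
        d = np.hypot(dx, dy)
        H = np.array([-dx / d, -dy / d, 0.0, dx / d, dy / d])
        x, P = scalar_update(x, P, range_m - d, H, var_range)
    return x, P
```

Calling `update_landmark(x, P, bearing=b)` uses the bearing alone, while passing both `bearing` and `range_m` applies the two updates in sequence; either way the covariance trace cannot increase.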
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dream
since before the Jetsons cartoon series imagined a life of leisure mediated by
a fleet of attentive robot helpers. It is a dream that remains stubbornly
distant. However, recent advances in vision and language methods have made
incredible progress in closely related areas. This is significant because a
robot interpreting a natural-language navigation instruction on the basis of
what it sees is carrying out a vision and language process that is similar to
Visual Question Answering. Both tasks can be interpreted as visually grounded
sequence-to-sequence translation problems, and many of the same methods are
applicable. To enable and encourage the application of vision and language
methods to the problem of interpreting visually-grounded navigation
instructions, we present the Matterport3D Simulator -- a large-scale
reinforcement learning environment based on real imagery. Using this simulator,
which can in future support a range of embodied vision and language tasks, we
provide the first benchmark dataset for visually-grounded natural language
navigation in real buildings -- the Room-to-Room (R2R) dataset.
Comment: CVPR 2018 Spotlight presentation
Learning to Fly by Crashing
How do you learn to navigate an Unmanned Aerial Vehicle (UAV) and avoid
obstacles? One approach is to use a small dataset collected by human experts:
however, high capacity learning algorithms tend to overfit when trained with
little data. An alternative is to use simulation. But the gap between
simulation and real world remains large especially for perception problems. The
reason most research avoids using large-scale real data is the fear of crashes!
In this paper, we propose to bite the bullet and collect a dataset of crashes
itself! We build a drone whose sole purpose is to crash into objects: it
samples naive trajectories and crashes into random objects. We crash our drone
11,500 times to create one of the biggest UAV crash datasets. This dataset
captures the different ways in which a UAV can crash. We use all this negative
flying data in conjunction with positive data sampled from the same
trajectories to learn a simple yet powerful policy for UAV navigation. We show
that this simple self-supervised model is quite effective in navigating the UAV
even in extremely cluttered environments with dynamic obstacles including
humans. For supplementary video see: https://youtu.be/u151hJaGKU
Sparse 3D Point-cloud Map Upsampling and Noise Removal as a vSLAM Post-processing Step: Experimental Evaluation
Monocular vision-based simultaneous localization and mapping (vSLAM) is
one of the most challenging problems in mobile robotics and computer vision. In
this work we study the post-processing techniques applied to sparse 3D
point-cloud maps, obtained by feature-based vSLAM algorithms. Map
post-processing is split into 2 major steps: 1) noise and outlier removal and
2) upsampling. We evaluate different combinations of known algorithms for
outlier removal and upsampling on datasets of real indoor and outdoor
environments and identify the most promising combination. We further use it to
convert a point-cloud map, obtained by a real UAV performing an indoor flight, to a
3D voxel grid (octo-map) potentially suitable for path planning.
Comment: 10 pages, 4 figures, camera-ready version of paper for "The 3rd
International Conference on Interactive Collaborative Robotics (ICR 2018)
- …
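As a rough illustration of the post-processing pipeline described in the vSLAM abstract above, the sketch below filters a sparse point cloud and converts it to a set of occupied voxels. The abstract does not say which algorithms were evaluated, so statistical outlier removal (based on mean k-nearest-neighbour distance) is an assumed, commonly used choice, and the upsampling step is omitted. The function names and the parameters `k`, `std_ratio`, and `voxel_size` are hypothetical.

```python
import numpy as np

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is
    unusually large compared with the rest of the cloud.

    Brute-force O(n^2) distances: fine for a small sparse map only.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    dists.sort(axis=1)                       # column 0 is the self-distance (0)
    mean_knn = dists[:, 1:k + 1].mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]

def occupied_voxels(points, voxel_size=0.1):
    """Return the set of integer voxel indices that contain at least one point."""
    idx = np.floor(points / voxel_size).astype(int)
    return {tuple(v) for v in idx.tolist()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = np.vstack([rng.random((200, 3)), [[50.0, 50.0, 50.0]]])
    clean = remove_statistical_outliers(cloud)
    print(len(cloud), "->", len(clean), "points;",
          len(occupied_voxels(clean, voxel_size=0.25)), "occupied voxels")
```

A plain set of voxel indices stands in for an octree-based occupancy map here; a planner would typically also need free and unknown space, which this sketch does not model.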