160 research outputs found
Visual Place Recognition for Autonomous Robots
Autonomous robotics has been the subject of great interest within the research community over the past few decades. Its applications are wide-spread, ranging from health-care to manufacturing, goods transportation to home deliveries, site-maintenance to construction, planetary explorations to rescue operations and many others, including but not limited to agriculture, defence, commerce, leisure and extreme environments. At the core of robot autonomy lies the problem of localisation, i.e, knowing where it is and within the robotics community, this problem is termed as place recognition. Place recognition using only visual input is termed as Visual Place Recognition (VPR) and refers to the ability of an autonomous system to recall a previously visited place using only visual input, under changing viewpoint, illumination and seasonal conditions, and given computational and storage constraints.
This thesis is a collection of 4 inter-linked, mutually-relevant but branching-out topics within VPR: 1) What makes a place/image worthy for VPR?, 2) How to define a state-of-the-art in VPR?, 3) Do VPR techniques designed for ground-based platforms extend to aerial platforms? and 4) Can a handcrafted VPR technique outperform deep-learning-based VPR techniques? Each of these questions is a dedicated, peer-reviewed chapter in this thesis and the author attempts to answer these questions to the best of his abilities.
The worthiness of a place essentially refers to the salience and distinctiveness of the content in the image of this place. This salience is modelled as a framework, namely memorable-maps, comprising of 3 conjoint criteria: a) Human-memorability of an image, 2) Staticity and 3) Information content. Because a large number of VPR techniques have been proposed over the past 10-15 years, and due to the variation of employed VPR datasets and metrics for evaluation, the correct state-of-the-art remains ambiguous. The author levels this playing field by deploying 10 contemporary techniques on a common platform and use the most challenging VPR datasets to provide a holistic performance comparison. This platform is then extended to aerial place recognition datasets to answer the 3rd question above. Finally, the author designs a novel, handcrafted, compute-efficient and training-free VPR technique that outperforms state-of-the-art VPR techniques on 5 different VPR datasets
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery
VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change
Visual place recognition (VPR) is the process of recognising a previously visited place using visual information, often under varying appearance conditions and viewpoint changes and with computational constraints. VPR is related to the concepts of localisation, loop closure, image retrieval and is a critical component of many autonomous navigation systems ranging from autonomous vehicles to drones and computer vision systems. While the concept of place recognition has been around for many years, VPR research has grown rapidly as a field over the past decade due to improving camera hardware and its potential for deep learning-based techniques, and has become a widely studied topic in both the computer vision and robotics communities. This growth however has led to fragmentation and a lack of standardisation in the field, especially concerning performance evaluation. Moreover, the notion of viewpoint and illumination invariance of VPR techniques has largely been assessed qualitatively and hence ambiguously in the past. In this paper, we address these gaps through a new comprehensive open-source framework for assessing the performance of VPR techniques, dubbed “VPR-Bench”. VPR-Bench (Open-sourced at: https://github.com/MubarizZaffar/VPR-Bench) introduces two much-needed capabilities for VPR researchers: firstly, it contains a benchmark of 12 fully-integrated datasets and 10 VPR techniques, and secondly, it integrates a comprehensive variation-quantified dataset for quantifying viewpoint and illumination invariance. We apply and analyse popular evaluation metrics for VPR from both the computer vision and robotics communities, and discuss how these different metrics complement and/or replace each other, depending upon the underlying applications and system requirements. Our analysis reveals that no universal SOTA VPR technique exists, since: (a) state-of-the-art (SOTA) performance is achieved by 8 out of the 10 techniques on at least one dataset, (b) SOTA technique in one community does not necessarily yield SOTA performance in the other given the differences in datasets and metrics. Furthermore, we identify key open challenges since: (c) all 10 techniques suffer greatly in perceptually-aliased and less-structured environments, (d) all techniques suffer from viewpoint variance where lateral change has less effect than 3D change, and (e) directional illumination change has more adverse effects on matching confidence than uniform illumination change. We also present detailed meta-analyses regarding the roles of varying ground-truths, platforms, application requirements and technique parameters. Finally, VPR-Bench provides a unified implementation to deploy these VPR techniques, metrics and datasets, and is extensible through templates
Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images
Self-driving vehicles rely on urban street maps for autonomous navigation. In
this paper, we introduce Pix2Map, a method for inferring urban street map
topology directly from ego-view images, as needed to continually update and
expand existing maps. This is a challenging task, as we need to infer a complex
urban road topology directly from raw image data. The main insight of this
paper is that this problem can be posed as cross-modal retrieval by learning a
joint, cross-modal embedding space for images and existing maps, represented as
discrete graphs that encode the topological layout of the visual surroundings.
We conduct our experimental evaluation using the Argoverse dataset and show
that it is indeed possible to accurately retrieve street maps corresponding to
both seen and unseen roads solely from image data. Moreover, we show that our
retrieved maps can be used to update or expand existing maps and even show
proof-of-concept results for visual localization and image retrieval from
spatial graphs.Comment: 12 pages, 8 figure
ViNG: Learning Open-World Navigation with Visual Goals
We propose a learning-based navigation system for reaching visually indicated
goals and demonstrate this system on a real mobile robot platform. Learning
provides an appealing alternative to conventional methods for robotic
navigation: instead of reasoning about environments in terms of geometry and
maps, learning can enable a robot to learn about navigational affordances,
understand what types of obstacles are traversable (e.g., tall grass) or not
(e.g., walls), and generalize over patterns in the environment. However, unlike
conventional planning algorithms, it is harder to change the goal for a learned
policy during deployment. We propose a method for learning to navigate towards
a goal image of the desired destination. By combining a learned policy with a
topological graph constructed out of previously observed data, our system can
determine how to reach this visually indicated goal even in the presence of
variable appearance and lighting. Three key insights, waypoint proposal, graph
pruning and negative mining, enable our method to learn to navigate in
real-world environments using only offline data, a setting where prior methods
struggle. We instantiate our method on a real outdoor ground robot and show
that our system, which we call ViNG, outperforms previously-proposed methods
for goal-conditioned reinforcement learning, including other methods that
incorporate reinforcement learning and search. We also study how \sysName
generalizes to unseen environments and evaluate its ability to adapt to such an
environment with growing experience. Finally, we demonstrate ViNG on a number
of real-world applications, such as last-mile delivery and warehouse
inspection. We encourage the reader to visit the project website for videos of
our experiments and demonstrations sites.google.com/view/ving-robot.Comment: Presented at International Conference on Robotics and Automation
(ICRA) 202
An Object SLAM Framework for Association, Mapping, and High-Level Tasks
Object SLAM is considered increasingly significant for robot high-level
perception and decision-making. Existing studies fall short in terms of data
association, object representation, and semantic mapping and frequently rely on
additional assumptions, limiting their performance. In this paper, we present a
comprehensive object SLAM framework that focuses on object-based perception and
object-oriented robot tasks. First, we propose an ensemble data association
approach for associating objects in complicated conditions by incorporating
parametric and nonparametric statistic testing. In addition, we suggest an
outlier-robust centroid and scale estimation algorithm for modeling objects
based on the iForest and line alignment. Then a lightweight and object-oriented
map is represented by estimated general object models. Taking into
consideration the semantic invariance of objects, we convert the object map to
a topological map to provide semantic descriptors to enable multi-map matching.
Finally, we suggest an object-driven active exploration strategy to achieve
autonomous mapping in the grasping scenario. A range of public datasets and
real-world results in mapping, augmented reality, scene matching,
relocalization, and robotic manipulation have been used to evaluate the
proposed object SLAM framework for its efficient performance.Comment: Accepted by IEEE Transactions on Robotics(T-RO
- …