Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models
The gap between our ability to collect interesting data and our ability to
analyze these data is growing at an unprecedented rate. Recent algorithmic
attempts to fill this gap have employed unsupervised tools to discover
structure in data. Some of the most successful approaches have used
probabilistic models to uncover latent thematic structure in discrete data.
Despite the success of these models on textual data, they have not generalized
as well to image data, in part because of the spatial and temporal structure
that may exist in an image stream.
We introduce a novel unsupervised machine learning framework that
incorporates the ability of convolutional autoencoders to discover features
from images that directly encode spatial information, within a Bayesian
nonparametric topic model that discovers meaningful latent patterns within
discrete data. By using this hybrid framework, we overcome the fundamental
dependency of traditional topic models on rigidly hand-coded data
representations, while simultaneously encoding spatial dependency in our topics
without adding model complexity. We apply this model to the motivating
application of high-level scene understanding and mission summarization for
exploratory marine robots. Our experiments on a seafloor dataset collected by a
marine robot show that the proposed hybrid framework outperforms current
state-of-the-art approaches on the task of unsupervised seafloor terrain
characterization.
Comment: 8 pages
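A minimal sketch of the kind of hybrid pipeline this abstract describes, not the authors' implementation: a convolutional autoencoder learns image features, the learned codes are quantized into a discrete visual vocabulary, and the resulting bag-of-words documents are handed to a Bayesian nonparametric (HDP) topic model. The layer sizes, 64x64 input resolution, vocabulary size, windowing, and the use of gensim's HdpModel are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from collections import Counter
from sklearn.cluster import KMeans
from gensim.models import HdpModel  # assumption: HDP stands in for the paper's topic model

class ConvAutoencoder(nn.Module):
    """Small convolutional autoencoder; features come from the encoder bottleneck."""
    def __init__(self, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def mission_topics(images, vocab_size=256, window=50):
    """images: (N, 3, 64, 64) tensor of frames from one mission (illustrative shapes)."""
    model = ConvAutoencoder()
    # ... train with a reconstruction loss over the mission imagery (omitted) ...
    with torch.no_grad():
        _, codes = model(images)
    # Quantize the learned features into a discrete visual vocabulary.
    words = KMeans(n_clusters=vocab_size).fit_predict(codes.numpy())
    # One "document" per temporal window of the mission, as a bag of visual words.
    docs = []
    for start in range(0, len(words), window):
        counts = Counter(int(w) for w in words[start:start + window])
        docs.append(list(counts.items()))
    # Nonparametric topic model discovers latent "terrain" topics without fixing their number.
    return HdpModel(corpus=docs, id2word={i: str(i) for i in range(vocab_size)})
```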
Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition
When a human drives a car along a road for the first time, they later
recognize where they are on the return journey, typically without needing to
look in their rear-view mirror or turn around to look back, despite significant
viewpoint and appearance change. Such navigation capabilities are typically
attributed to our semantic visual understanding of the environment [1] beyond
geometry to recognizing the types of places we are passing through such as
"passing a shop on the left" or "moving through a forested area". Humans are in
effect using place categorization [2] to perform specific place recognition
even when the viewpoint is 180 degrees reversed. Recent advances in deep neural
networks have enabled high-performance semantic understanding of visual places
and scenes, opening up the possibility of emulating what humans do. In this
work, we develop a novel methodology for using the semantics-aware higher-order
layers of deep neural networks for recognizing specific places from within a
reference database. To further improve the robustness to appearance change, we
develop a descriptor normalization scheme that builds on the success of
normalization schemes for pure appearance-based techniques such as SeqSLAM [3].
Using two different datasets - one road-based, one pedestrian-based - we
evaluate the performance of the system in performing place recognition on
reverse traversals of a route with a limited-field-of-view camera and no
turn-back-and-look behaviours, and compare to existing state-of-the-art
techniques and vanilla off-the-shelf features. The results demonstrate
significant improvements over the existing state of the art, especially for
extreme perceptual challenges that involve both great viewpoint change and
environmental appearance change. We also provide experimental analyses of the
contributions of the various system components.
Comment: 9 pages, 11 figures, ICRA 201
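A hedged sketch of the matching idea, not the paper's exact pipeline: descriptors taken from a semantics-aware higher-order CNN layer are standardized per dimension against the reference database (in the spirit of SeqSLAM-style normalization) and then matched by cosine similarity. The shapes and the choice of per-dimension standardization are assumptions.

```python
import numpy as np

def normalize_descriptors(ref_desc, query_desc, eps=1e-8):
    """ref_desc: (N_ref, D), query_desc: (N_query, D) higher-layer CNN features."""
    # Standardize each descriptor dimension using reference-database statistics.
    mu = ref_desc.mean(axis=0, keepdims=True)
    sigma = ref_desc.std(axis=0, keepdims=True) + eps
    ref_n = (ref_desc - mu) / sigma
    query_n = (query_desc - mu) / sigma
    # L2-normalize so the dot product below is cosine similarity.
    ref_n /= np.linalg.norm(ref_n, axis=1, keepdims=True) + eps
    query_n /= np.linalg.norm(query_n, axis=1, keepdims=True) + eps
    return ref_n, query_n

def recognize_places(ref_desc, query_desc):
    """Return the best-matching reference index for every query image."""
    ref_n, query_n = normalize_descriptors(ref_desc, query_desc)
    similarity = query_n @ ref_n.T          # (N_query, N_ref)
    return similarity.argmax(axis=1)
```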
Learning Local Feature Descriptor with Motion Attribute for Vision-based Localization
In recent years, camera-based localization has been widely used for robotic
applications, and most proposed algorithms rely on local features extracted
from recorded images. For better performance, the features used for open-loop
localization are required to be short-term globally static, and the ones used
for re-localization or loop closure detection need to be long-term static.
Therefore, the motion attribute of a local feature point could be exploited to
improve localization performance, e.g., the feature points extracted from
moving persons or vehicles can be excluded from these systems due to their
unsteadiness. In this paper, we design a fully convolutional network (FCN),
named MD-Net, to perform motion attribute estimation and feature description
simultaneously. MD-Net has a shared backbone network to extract features from
the input image and two network branches to complete each sub-task. With
MD-Net, we can obtain the motion attribute with little additional computation.
Experimental results demonstrate that the proposed method can learn a
distinctive local feature descriptor along with the motion attribute using only
a single FCN, outperforming competing methods by a wide margin. We also show that
the proposed algorithm can be integrated into a vision-based localization
algorithm to improve estimation accuracy significantly.
Comment: This paper will be presented at IROS1
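An illustrative sketch of the shared-backbone, two-branch idea described above; the layer widths and head designs are assumptions, not the published MD-Net architecture. One branch predicts a per-pixel motion attribute (how likely a pixel belongs to a moving object) and the other produces dense local descriptors.

```python
import torch
import torch.nn as nn

class TwoBranchFCN(nn.Module):
    def __init__(self, desc_dim=128):
        super().__init__()
        # Shared fully convolutional backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Branch 1: motion attribute (probability that a pixel lies on a moving object).
        self.motion_head = nn.Conv2d(64, 1, 1)
        # Branch 2: dense local feature descriptors.
        self.desc_head = nn.Conv2d(64, desc_dim, 1)

    def forward(self, image):
        feat = self.backbone(image)
        motion = torch.sigmoid(self.motion_head(feat))               # (B, 1, H, W)
        desc = nn.functional.normalize(self.desc_head(feat), dim=1)  # (B, D, H, W)
        return motion, desc

# Usage: keypoints whose motion score is high (e.g. on people or vehicles) can be
# discarded before they enter the localization or loop-closure pipeline.
```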
CAPRICORN: Communication Aware Place Recognition using Interpretable Constellations of Objects in Robot Networks
Using multiple robots for exploring and mapping environments can provide
improved robustness and performance, but it can be difficult to implement. In
particular, limited communication bandwidth is a considerable constraint when a
robot needs to determine if it has visited a location that was previously
explored by another robot, as it requires robots to share descriptions of
places they have visited. One way to compress this description is to use
constellations, groups of 3D points that correspond to the estimated relative
positions of a set of objects. Constellations maintain the same pattern from
different viewpoints and can be robust to illumination changes or dynamic
elements. We present a method to extract from these constellations compact
spatial and semantic descriptors of the objects in a scene. We use this
representation in a 2-step decentralized loop closure verification: first, we
distribute the compact semantic descriptors to determine which other robots
might have seen scenes with similar objects; then we query matching robots with
the full constellation to validate the match using geometric information. The
proposed method requires less memory, is more interpretable than global image
descriptors, and could be useful for other tasks and interactions with the
environment. We validate our system's performance on a TUM RGB-D SLAM sequence
and show its benefits in terms of bandwidth requirements.
Comment: 8 pages, 6 figures, 1 table. 2020 IEEE International Conference on Robotics and Automation (ICRA)
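A minimal sketch of the two-step verification described above; the descriptor and the geometric check are simplified placeholders, not the paper's exact method. Step 1 exchanges only a small semantic summary of the objects in a scene; step 2 sends the full 3D constellation only to the few robots whose summaries match.

```python
import numpy as np

def semantic_descriptor(constellation, num_classes=20):
    """constellation: list of (class_id, xyz) for the objects seen in one scene."""
    hist = np.zeros(num_classes)
    for class_id, _ in constellation:
        hist[class_id] += 1
    return hist / max(hist.sum(), 1)            # a few floats, cheap to broadcast

def candidate_robots(my_desc, remote_descs, threshold=0.2):
    """Step 1: keep robots whose broadcast semantic descriptors are close to ours."""
    return [rid for rid, d in remote_descs.items()
            if np.linalg.norm(my_desc - d) < threshold]

def geometric_check(constellation_a, constellation_b, tol=0.5):
    """Step 2 (placeholder): compare sorted inter-object distances, which are
    viewpoint-invariant, instead of a full constellation registration."""
    def pairwise(constellation):
        pts = np.array([xyz for _, xyz in constellation])
        return np.sort(np.linalg.norm(pts[:, None] - pts[None, :], axis=-1).ravel())
    da, db = pairwise(constellation_a), pairwise(constellation_b)
    n = min(len(da), len(db))
    return np.mean(np.abs(da[:n] - db[:n])) < tol
```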
Communication constrained cloud-based long-term visual localization in real time
Visual localization is one of the primary capabilities for mobile robots.
Long-term visual localization in real time is particularly challenging, in
which the robot is required to efficiently localize itself using visual data
where appearance may change significantly over time. In this paper, we propose
a cloud-based visual localization system targeting long-term localization in
real time. On the robot, we employ two estimators to achieve accurate and
real-time performance. One is a sliding-window based visual inertial odometry,
which integrates constraints from consecutive observations and self-motion
measurements, as well as the constraints induced by localization on the cloud.
This estimator builds a local visual submap as a virtual observation, which is
then sent to the cloud as a new localization constraint. The other one is a
delayed-state Extended Kalman Filter to fuse the pose of the robot localized
from the cloud, the local odometry and the high-frequency inertial
measurements. On the cloud, we propose a longer sliding-window based
localization method to aggregate the virtual observations for a larger field of
view, leading to more robust alignment between virtual observations and the
map. Under this architecture, the robot can achieve drift-free and real-time
localization using onboard resources even in a network with limited bandwidth,
high latency, and packet loss, which enables autonomous navigation in
real-world environments. We evaluate the effectiveness of our system on a
dataset with challenging seasonal and illumination variations. We
further validate the robustness of the system under challenging network
conditions.
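A structural sketch of the robot/cloud split described above, with both estimators reduced to placeholders; the real system uses sliding-window visual-inertial odometry and a delayed-state EKF, so the class names and the simple drift-correction step here are assumptions.

```python
import numpy as np

class RobotSide:
    def __init__(self, cloud):
        self.cloud = cloud
        self.pose = np.eye(4)    # current pose estimate (homogeneous transform)
        self.submap = []         # local visual submap acting as a "virtual observation"

    def vio_step(self, relative_motion, keyframe=None):
        """High-rate local update from visual-inertial odometry (placeholder)."""
        self.pose = self.pose @ relative_motion
        if keyframe is not None:
            self.submap.append((self.pose.copy(), keyframe))

    def maybe_query_cloud(self, min_keyframes=10):
        """Send a compact submap instead of raw images: cheap under low bandwidth."""
        if len(self.submap) >= min_keyframes:
            correction = self.cloud.localize_submap(self.submap)   # may arrive late
            if correction is not None:
                # Delayed fusion: apply the (possibly old) global correction to the
                # current estimate; a delayed-state EKF would do this consistently.
                self.pose = correction @ self.pose
            self.submap.clear()

class CloudSide:
    def __init__(self, global_map):
        self.global_map = global_map

    def localize_submap(self, submap):
        """Align the whole submap to the long-term map (a larger field of view than a
        single frame), returning a correction transform or None on failure."""
        ...  # longer sliding-window alignment against the map (omitted)
        return np.eye(4)
```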
A Hierarchical Dual Model of Environment- and Place-Specific Utility for Visual Place Recognition
Visual Place Recognition (VPR) approaches have typically attempted to match
places by identifying visual cues, image regions or landmarks that have high
``utility'' in identifying a specific place. But this concept of utility is not
singular - rather it can take a range of forms. In this paper, we present a
novel approach to deduce two key types of utility for VPR: the utility of
visual cues `specific' to an environment, and to a particular place. We employ
contrastive learning principles to estimate both the environment- and
place-specific utility of Vector of Locally Aggregated Descriptors (VLAD)
clusters in an unsupervised manner, which is then used to guide local feature
matching through keypoint selection. By combining these two utility measures,
our approach achieves state-of-the-art performance on three challenging
benchmark datasets, while simultaneously reducing the required storage and
compute time. We provide further analysis demonstrating that unsupervised
cluster selection yields semantically meaningful results, that finer-grained
categorization often has higher utility for VPR than high-level semantic
categorization (e.g. building, road), and characterise how these two
utility measures vary across different places and environments. Source code is
made publicly available at https://github.com/Nik-V9/HEAPUtil.
Comment: Accepted to IEEE Robotics and Automation Letters (RA-L) and IROS 202
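A hedged sketch of how two per-cluster utility scores could gate keypoint selection; the score definitions and the simple product-based combination are placeholders, so refer to the released code at https://github.com/Nik-V9/HEAPUtil for the authors' implementation.

```python
import numpy as np

def select_keypoints(keypoints, cluster_ids, env_utility, place_utility, keep_ratio=0.5):
    """
    keypoints:     (N, 2) keypoint coordinates in the image
    cluster_ids:   (N,)   VLAD cluster assignment of each keypoint's descriptor
    env_utility:   (K,)   per-cluster utility for the whole environment
    place_utility: (K,)   per-cluster utility for this particular place
    """
    # Combine the two utilities; the product keeps only clusters useful in both senses.
    combined = env_utility[cluster_ids] * place_utility[cluster_ids]
    keep = np.argsort(-combined)[: int(len(keypoints) * keep_ratio)]
    return keypoints[keep]

# Usage: only the retained keypoints participate in local feature matching,
# reducing both storage and matching time while focusing on high-utility cues.
```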