PADLoC: LiDAR-Based Deep Loop Closure Detection and Registration using Panoptic Attention
A key component of graph-based SLAM systems is the ability to detect loop
closures in a trajectory to reduce the drift accumulated over time from the
odometry. Most LiDAR-based methods achieve this goal using only
geometric information, disregarding the semantics of the scene. In this work,
we introduce PADLoC, a LiDAR-based loop closure detection and registration
architecture comprising a shared 3D convolutional feature extraction backbone,
a global descriptor head for loop closure detection, and a novel
transformer-based head for point cloud matching and registration. We present
multiple methods for estimating the point-wise matching confidence based on
diversity indices. Additionally, to improve forward-backward consistency, we
propose the use of two shared matching and registration heads with their source
and target inputs swapped, exploiting the fact that the two estimated relative
transformations must be inverses of each other. Furthermore, we leverage
panoptic information during training in the form of a novel loss function that
reframes the matching problem as a classification task in the case of the
semantic labels and as a graph connectivity assignment for the instance labels.
We perform extensive evaluations of PADLoC on multiple real-world datasets
demonstrating that it achieves state-of-the-art performance. The code of our
work is publicly available at http://padloc.cs.uni-freiburg.de
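The forward-backward consistency constraint above can be sketched numerically: if the two shared heads estimate the source-to-target and target-to-source transforms, their composition should be the identity, and any deviation can serve as a training penalty. The following is a minimal illustration using 2D rigid transforms as stand-ins for the SE(3) poses a LiDAR registration head would predict; all function names are hypothetical and not from the PADLoC codebase.

```python
import numpy as np

def pose_matrix(rotation_deg: float, tx: float, ty: float) -> np.ndarray:
    """Build a 2D rigid transform as a 3x3 homogeneous matrix (toy stand-in
    for the SE(3) transforms a registration head would predict)."""
    theta = np.deg2rad(rotation_deg)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0, 1.0]])

def forward_backward_consistency(T_st: np.ndarray, T_ts: np.ndarray) -> float:
    """Penalty measuring how far the composition of the source->target and
    target->source estimates is from the identity. If both estimates agree,
    T_st @ T_ts == I and the penalty is zero."""
    composed = T_st @ T_ts
    return float(np.linalg.norm(composed - np.eye(composed.shape[0])))

# A perfectly consistent pair: the backward estimate is the exact inverse.
T_st = pose_matrix(30.0, 1.0, -2.0)
T_ts = np.linalg.inv(T_st)
print(round(forward_backward_consistency(T_st, T_ts), 6))  # 0.0

# An inconsistent pair yields a positive penalty that can be minimized.
T_ts_noisy = np.linalg.inv(pose_matrix(35.0, 1.1, -2.0))
print(forward_backward_consistency(T_st, T_ts_noisy) > 0.0)  # True
```

In the paper's setting this penalty would be differentiable and applied to the outputs of the two swapped-input heads during training.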
Few-Shot Panoptic Segmentation With Foundation Models
Current state-of-the-art methods for panoptic segmentation require an immense
amount of annotated training data, which is both arduous and expensive to obtain,
posing a significant challenge to their widespread adoption. Concurrently,
recent breakthroughs in visual representation learning have sparked a paradigm
shift leading to the advent of large foundation models that can be trained with
completely unlabeled images. In this work, we propose to leverage such
task-agnostic image features to enable few-shot panoptic segmentation by
presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO). In
detail, our method combines a DINOv2 backbone with lightweight network heads
for semantic segmentation and boundary estimation. We show that our approach,
although trained with only ten annotated images, predicts high-quality
pseudo-labels that can be used with any existing panoptic segmentation method.
Notably, we demonstrate that SPINO achieves competitive results compared to
fully supervised baselines while using less than 0.3% of the ground truth
labels, paving the way for learning complex visual recognition tasks leveraging
foundation models. To illustrate its general applicability, we further deploy
SPINO on real-world robotic vision systems for both outdoor and indoor
environments. To foster future research, we make the code and trained models
publicly available at http://spino.cs.uni-freiburg.de
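As a rough illustration of how semantic and boundary predictions can be fused into panoptic pseudo-labels, the sketch below grows instance regions over pixels that share a "thing" semantic class without crossing a predicted boundary. This is a deliberately simplified stand-in for SPINO's actual pseudo-label generation; the function and its grouping criterion are assumptions, not the published method.

```python
import numpy as np
from collections import deque

def instances_from_semantics_and_boundaries(semantics: np.ndarray,
                                            boundaries: np.ndarray,
                                            thing_classes: set) -> np.ndarray:
    """Toy pseudo-labeling step (hypothetical simplification): assign an
    instance id to each connected region of same-class, non-boundary pixels."""
    h, w = semantics.shape
    instance_ids = np.zeros((h, w), dtype=int)
    next_id = 1
    for y in range(h):
        for x in range(w):
            if (instance_ids[y, x] or boundaries[y, x]
                    or semantics[y, x] not in thing_classes):
                continue
            # BFS flood fill over 4-connected, same-class, non-boundary pixels.
            cls = semantics[y, x]
            queue = deque([(y, x)])
            instance_ids[y, x] = next_id
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not instance_ids[ny, nx]
                            and not boundaries[ny, nx]
                            and semantics[ny, nx] == cls):
                        instance_ids[ny, nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return instance_ids

# Two objects of class 1 separated by a predicted boundary column.
sem = np.ones((3, 5), dtype=int)
bnd = np.zeros((3, 5), dtype=bool)
bnd[:, 2] = True
labels = instances_from_semantics_and_boundaries(sem, bnd, thing_classes={1})
print(labels.max())  # 2 distinct instances
```

The resulting instance map, together with the semantic map, forms a panoptic pseudo-label that a downstream panoptic segmentation model could be trained on.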
Collaborative Dynamic 3D Scene Graphs for Automated Driving
Maps have played an indispensable role in enabling safe and automated
driving. Although there have been many advances on different fronts ranging
from SLAM to semantics, building an actionable hierarchical semantic
representation of urban dynamic scenes from multiple agents is still a
challenging problem. In this work, we present Collaborative URBan Scene Graphs
(CURB-SG) that enable higher-order reasoning and efficient querying for many
functions of automated driving. CURB-SG leverages panoptic LiDAR data from
multiple agents to build large-scale maps using an effective graph-based
collaborative SLAM approach that detects inter-agent loop closures. To
semantically decompose the obtained 3D map, we build a lane graph from the
paths of ego agents and their panoptic observations of other vehicles. Based on
the connectivity of the lane graph, we segregate the environment into
intersecting and non-intersecting road areas. Subsequently, we construct a
multi-layered scene graph that includes lane information, the position of
static landmarks and their assignment to certain map sections, other vehicles
observed by the ego agents, and the pose graph from SLAM including 3D panoptic
point clouds. We extensively evaluate CURB-SG in urban scenarios using a
photorealistic simulator. We release our code at
http://curb.cs.uni-freiburg.de
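The segregation of the environment by lane-graph connectivity can be sketched with a simple heuristic: nodes where three or more lane segments meet are treated as intersections, and the remaining nodes as non-intersecting road. This degree-based criterion is an illustrative assumption, not necessarily CURB-SG's exact rule.

```python
from collections import defaultdict

def classify_lane_nodes(edges):
    """Split lane-graph nodes into intersection and non-intersection sets by
    connectivity (hypothetical heuristic): a node where three or more lane
    segments meet is treated as part of an intersecting road area."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    intersections = {n for n, d in degree.items() if d >= 3}
    road = set(degree) - intersections
    return intersections, road

# A T-junction: node "C" joins three lane segments.
lane_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
inter, road = classify_lane_nodes(lane_edges)
print(sorted(inter))  # ['C']
print(sorted(road))   # ['A', 'B', 'D', 'E']
```

In the scene-graph setting, the two node sets would then anchor the map sections to which static landmarks and observed vehicles are assigned.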
Continual SLAM: Beyond Lifelong Simultaneous Localization and Mapping through Continual Learning
Robots operating in the open world encounter a variety of environments
that can substantially differ from each other. This domain gap also poses a
challenge for Simultaneous Localization and Mapping (SLAM), one of the
fundamental tasks for navigation. In particular, learning-based SLAM methods
are known to generalize poorly to unseen environments, hindering their general
adoption. In this work, we introduce the novel task of continual SLAM, which
extends the concept of lifelong SLAM from a single dynamically changing
environment to sequential deployments in several drastically differing
environments. To address this task, we propose CL-SLAM, which leverages a
dual-network architecture to both adapt to new environments and retain
knowledge of previously visited ones. We compare CL-SLAM to learning-based as well as classical
SLAM methods and show the advantages of leveraging online data. We extensively
evaluate CL-SLAM on three different datasets and demonstrate that it
outperforms several baselines inspired by existing continual learning-based
visual odometry methods. We make the code of our work publicly available at
http://continual-slam.cs.uni-freiburg.de
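The dual-network idea can be sketched abstractly: one set of parameters (an "expert") adapts quickly to online data from the current environment, while a second set (a "generalizer") is updated slowly from a replay buffer spanning all past experience, so earlier environments are not forgotten. The sketch below uses toy linear models; the class name and update rules are illustrative assumptions, not CL-SLAM's actual architecture.

```python
import random
import numpy as np

class DualNetworkAdapter:
    """Toy sketch of a dual-network continual learner: a fast 'expert' weight
    vector adapts online to the current environment, while a slow
    'generalizer' consolidates knowledge from a replay buffer of past
    experience. Update rules are illustrative, not CL-SLAM's design."""

    def __init__(self, dim: int, fast_lr: float = 0.1, slow_lr: float = 0.01):
        self.expert = np.zeros(dim)
        self.generalizer = np.zeros(dim)
        self.fast_lr = fast_lr
        self.slow_lr = slow_lr
        self.replay = []  # (features, target) pairs from all environments

    def observe(self, x: np.ndarray, y: float) -> None:
        # Fast online adaptation of the expert on the incoming sample.
        err = self.expert @ x - y
        self.expert -= self.fast_lr * err * x
        self.replay.append((x, y))
        # Slow consolidation of the generalizer on a replayed past sample.
        xr, yr = random.choice(self.replay)
        err_r = self.generalizer @ xr - yr
        self.generalizer -= self.slow_lr * err_r * xr

# Sequential deployment in two "environments" with different target mappings.
random.seed(0)
rng = np.random.default_rng(0)
learner = DualNetworkAdapter(dim=2)
for w_true in (np.array([1.0, -1.0]), np.array([2.0, 0.5])):
    for _ in range(500):
        x = rng.normal(size=2)
        learner.observe(x, float(w_true @ x))
# The expert tracks the most recent environment closely.
print(np.allclose(learner.expert, [2.0, 0.5], atol=0.05))  # True
```

In a SLAM context, the same pattern would apply to the weights of a learned odometry network rather than a linear model, with the replay buffer holding raw sensor data from earlier deployments.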