A Unified Framework for Mutual Improvement of SLAM and Semantic Segmentation
This paper presents a novel framework for simultaneously implementing
localization and segmentation, which are two of the most important vision-based
tasks for robotics. While the two tasks were previously treated as separate in
both goals and techniques, we show that by exchanging the intermediate results
of the two modules, the performance of both can be enhanced simultaneously. Our
framework is able to handle both the instantaneous motion
and long-term changes of instances in localization with the help of the
segmentation result, which also benefits from the refined 3D pose information.
We conduct experiments on various datasets and show that our framework
effectively improves the precision and robustness of both tasks, outperforming
existing localization and segmentation algorithms.
Comment: 7 pages, 5 figures. This work has been accepted by ICRA 2019. The demo
video can be found at https://youtu.be/Bkt53dAehj
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery.
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists of the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
DynaQuadric: Dynamic Quadric SLAM for Quadric Initialization, Mapping, and Tracking
Dynamic SLAM is a key technology for autonomous driving and robotics, and accurate pose estimation of surrounding objects is important for semantic perception tasks. Current quadric SLAM methods are based on the assumption of a static environment and can only reconstruct static quadrics in the scene, which limits their applications in complex dynamic scenarios. In this paper, we propose a visual SLAM system that is capable of reconstructing dynamic objects as quadrics, with a unified framework for jointly optimizing pose estimation, multi-object tracking (MOT), and quadric parameters. We propose a robust object-centric quadric initialization algorithm for both static and moving objects, which decouples the prior estimation of the object pose from the quadric parameters. The object is initialized with a coarse sphere, and the quadric parameters are further refined. We design a novel factor graph that tightly optimizes camera pose, object pose, map points, and quadric parameters within the sliding-window-based optimization. To the best of our knowledge, we are the first to propose a dynamic SLAM that combines quadric representations and MOT in a tightly coupled optimization. We perform qualitative and quantitative experiments on both simulated and real-world datasets, and demonstrate its robustness and accuracy in terms of camera localization, dynamic quadric initialization, mapping, and tracking. Our system demonstrates the potential application of object perception with quadric representation in complex dynamic scenes.
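The decoupled initialization above can be made concrete with the standard dual-quadric algebra used in quadric SLAM (a generic sketch, not this paper's implementation): an ellipsoid is stored as a 4x4 symmetric dual quadric Q*, and projecting it through the camera matrix P gives the dual conic C* = P Q* P^T of its image outline. A minimal numpy sketch, assuming an axis-aligned ellipsoid and known intrinsics; the coarse-sphere initialization corresponds to setting all three semi-axes equal before refinement:

```python
import numpy as np

def dual_quadric(center, axes):
    """Dual quadric Q* of an axis-aligned ellipsoid (rotation omitted for brevity)."""
    Qs = np.diag([axes[0]**2, axes[1]**2, axes[2]**2, -1.0])
    T = np.eye(4)
    T[:3, 3] = center                        # translate the ellipsoid to its center
    return T @ Qs @ T.T

def project_quadric(Qs, K, R, t):
    """Project a dual quadric to the dual conic C* = P Q* P^T of its image outline."""
    P = K @ np.hstack([R, t.reshape(3, 1)])  # 3x4 camera matrix
    C = P @ Qs @ P.T
    return C / -C[2, 2]                      # fix the projective scale

# Example: an ellipsoid 5 m in front of an identity-pose camera.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
Q = dual_quadric(center=[0.0, 0.0, 5.0], axes=[1.0, 0.5, 0.5])
C = project_quadric(Q, K, np.eye(3), np.zeros(3))
center_px = C[:2, 2] / C[2, 2]               # projected ellipse centre in pixels
```

Since the ellipsoid sits on the optical axis, the projected ellipse centre lands at the principal point (320, 240); refining the quadric from a coarse sphere amounts to adjusting `axes` (and, in a full system, the rotation) while keeping this projection residual small.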
Multi-level Map Construction for Dynamic Scenes
In dynamic scenes, both localization and mapping in visual SLAM face
significant challenges. In recent years, many works have proposed effective
solutions to the localization problem, but few have focused on constructing
long-term consistent maps in dynamic scenes, which severely limits map
applications. To
address this issue, we have designed a multi-level map construction system
tailored for dynamic scenes. In this system, we employ multi-object tracking
algorithms, DBSCAN clustering algorithm, and depth information to rectify the
results of object detection, accurately extract static point clouds, and
construct dense point cloud maps and octree maps. We propose a plane map
construction algorithm specialized for dynamic scenes, involving the
extraction, filtering, data association, and fusion optimization of planes in
dynamic environments, thus creating a plane map. Additionally, we introduce an
object map construction algorithm targeted at dynamic scenes, which includes
object parameterization, data association, and update optimization. Extensive
experiments on public datasets and real-world scenarios validate the accuracy
of the multi-level maps constructed in this study and the robustness of the
proposed algorithms. Furthermore, we demonstrate the practical application
prospects of our algorithms by utilizing the constructed object maps for
dynamic object tracking.
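As a rough illustration of the clustering step (the function and parameter names are illustrative, and the paper's full pipeline with multi-object tracking and octree mapping is not reproduced), one can cluster the depths of pixels inside a detection box and keep only the dominant cluster as the object's point cloud, rejecting background and outliers. A self-contained 1-D DBSCAN over depths:

```python
import numpy as np

def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN on scalar values (e.g. per-pixel depths); label -1 means noise."""
    n = len(values)
    d = np.abs(values[:, None] - values[None, :])
    nbrs = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, bool)
    cid = 0
    for i in range(n):
        if visited[i] or len(nbrs[i]) < min_pts:
            continue
        visited[i] = True
        labels[i] = cid
        stack = list(nbrs[i])
        while stack:                      # grow the cluster from core points
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid
            if not visited[j]:
                visited[j] = True
                if len(nbrs[j]) >= min_pts:
                    stack.extend(nbrs[j])
        cid += 1
    return labels

def object_depth_mask(depths, eps=0.15, min_pts=5):
    """Keep the dominant depth cluster inside a detection box; drop background/outliers."""
    labels = dbscan_1d(np.asarray(depths, float), eps, min_pts)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return np.zeros(len(depths), bool)
    dominant = np.bincount(valid).argmax()
    return labels == dominant

# 20 foreground depths near 2 m, 8 background depths at 6 m.
depths = np.r_[np.full(20, 2.0) + 0.01 * np.arange(20), np.full(8, 6.0)]
mask = object_depth_mask(depths)          # True for the 20 foreground pixels
```

Real systems would use a library DBSCAN over 3-D points and fuse the surviving points into the dense map; the scalar version above only shows why depth clustering separates an object from the wall behind it.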
Object-level dynamic SLAM
Visual Simultaneous Localisation and Mapping (SLAM) can estimate a camera's pose in an unknown environment and reconstruct an online map of it. Despite the advances in many real-time dense SLAM systems, most still assume a static environment, which is not a valid assumption in many real-world scenarios. This thesis aims to enable dense visual SLAM to run robustly in a dynamic environment, knowing where the sensor is in the environment, and, also importantly, what and where objects are in the surrounding environment for better scene understanding.
The contributions in this thesis are threefold. The first one presents one of the first object-level dynamic SLAM systems that robustly track camera pose while detecting, tracking, and reconstructing all the objects in dynamic scenes. It can continuously fuse geometric, semantic, and motion information for each object into an octree-based volumetric representation.
One of the challenges in tracking moving objects is that the object motion can easily break the illumination constancy assumption. In our second contribution, we address this issue by proposing a dense feature-metric alignment to robustly estimate camera and object poses. We will show how to learn dense feature maps and feature-metric uncertainties in a self-supervised way. They formulate a probabilistic feature-metric residual, which can be efficiently solved using Gauss-Newton optimisation and easily coupled with other residuals.
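The feature-metric residual described above can be sketched in its generic form: each residual is whitened by its predicted uncertainty, and the resulting weighted least-squares problem is solved with Gauss-Newton. The toy example below (a hypothetical scalar-gain fit, not the thesis's dense photometric system) shows the structure of such a solver; the last observation carries a large uncertainty and is heavily down-weighted:

```python
import numpy as np

def gauss_newton(residual, theta0, sigmas, iters=10, h=1e-6):
    """Minimise sum_i (r_i(theta)/sigma_i)^2 with a forward-difference Jacobian."""
    theta = np.asarray(theta0, float)
    w = 1.0 / np.asarray(sigmas, float)       # whitening weights
    for _ in range(iters):
        r = w * residual(theta)
        J = np.empty((r.size, theta.size))
        for k in range(theta.size):           # numeric Jacobian, one column per parameter
            tp = theta.copy()
            tp[k] += h
            J[:, k] = (w * residual(tp) - r) / h
        theta = theta - np.linalg.solve(J.T @ J, J.T @ r)
    return theta

# Toy alignment: recover a scalar gain from observations with per-point uncertainty.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.5 * x
sig = np.array([0.1, 0.1, 0.1, 10.0])         # last point is nearly uninformative
theta = gauss_newton(lambda t: t[0] * x - y, [1.0], sig)
```

In the thesis's setting the residual would compare learned dense feature maps instead of a scalar gain, and the normal equations would be coupled with reprojection and motion residuals in the same solve.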
So far, we can only reconstruct objects' geometry from the sensor data. Our third contribution further incorporates a category-level shape prior into the object mapping. Conditioned on the depth measurement, the learned implicit function completes the unseen part while reconstructing the observed part accurately, yielding better reconstruction completeness and more accurate object pose estimation.
These three contributions in this thesis have advanced the state of the art in visual SLAM. We hope such object-level dynamic SLAM systems will help robots intelligently interact with the world that humans inhabit.
3DS-SLAM: A 3D Object Detection based Semantic SLAM towards Dynamic Indoor Environments
The existence of variable factors within the environment can cause a decline
in camera localization accuracy, as it violates the fundamental assumption of a
static environment in Simultaneous Localization and Mapping (SLAM) algorithms.
Recent semantic SLAM systems for dynamic environments either rely solely on
2D semantic information, or solely on geometric information, or combine their
results in a loosely integrated manner. In this research paper, we introduce
3DS-SLAM, 3D Semantic SLAM, tailored for dynamic scenes with visual 3D object
detection. 3DS-SLAM is a tightly coupled algorithm resolving both semantic
and geometric constraints sequentially. We designed a 3D part-aware hybrid
transformer for point cloud-based object detection to identify dynamic objects.
Subsequently, we propose a dynamic feature filter based on HDBSCAN clustering
to extract objects with significant absolute depth differences. When compared
against ORB-SLAM2, 3DS-SLAM exhibits an average improvement of 98.01% across
the dynamic sequences of the TUM RGB-D dataset. Furthermore, it surpasses the
performance of the other four leading SLAM systems designed for dynamic
environments.
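The 3D part-aware transformer and HDBSCAN filter are not reproduced here, but a common geometric test used by dynamic SLAM systems of this kind flags a feature match as dynamic when the point in the second image lies far from the epipolar line predicted by the first. A sketch assuming a known fundamental matrix F, with all values illustrative:

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance of each point in image 2 to the epipolar line of its match in image 1."""
    p1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coordinates
    p2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    lines = p1 @ F.T                                 # epipolar lines l' = F x
    num = np.abs(np.sum(p2 * lines, axis=1))         # |x'^T F x|
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / den

# Pure sideways translation with identity intrinsics: F ~ [t]_x for t = (1, 0, 0),
# so epipolar lines are horizontal and static points keep their y coordinate.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts1 = np.array([[0.2, 0.3], [0.5, -0.1], [0.4, 0.4]])
pts2 = np.array([[0.1, 0.3], [0.4, -0.1], [0.3, 0.7]])  # third match left its line
dynamic = epipolar_distances(F, pts1, pts2) > 0.1       # flag large residuals
```

A purely geometric test like this misses objects that happen to move along epipolar lines, which is exactly why systems such as 3DS-SLAM couple it with semantic 3D object detection.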
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.
Comment: ICRA'18 submission.
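The pixel-precise mask association at the core of such a tracker can be sketched as greedy matching on mask IoU between consecutive frames; this is a simplified stand-in for the paper's tracker, and the helper names are illustrative:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def associate(prev_masks, cur_masks, thresh=0.3):
    """Greedily match masks across frames; returns (prev_idx, cur_idx) pairs."""
    iou = np.array([[mask_iou(p, c) for c in cur_masks] for p in prev_masks])
    matches = []
    while iou.size and iou.max() > thresh:
        i, j = np.unravel_index(iou.argmax(), iou.shape)
        matches.append((int(i), int(j)))
        iou[i, :] = -1.0          # exclude the matched row and column
        iou[:, j] = -1.0
    return matches

def box_mask(h, w, r0, r1, c0, c1):
    m = np.zeros((h, w), bool)
    m[r0:r1, c0:c1] = True
    return m

# Two segments per frame; each moves slightly between frames.
prev = [box_mask(8, 8, 0, 4, 0, 4), box_mask(8, 8, 5, 8, 5, 8)]
cur = [box_mask(8, 8, 5, 8, 4, 7), box_mask(8, 8, 1, 5, 0, 4)]
matches = associate(prev, cur)    # prev 0 -> cur 1, prev 1 -> cur 0
```

Because the association operates on category-agnostic masks, unmatched current-frame masks can simply spawn new tracks, which is what lets such a tracker follow objects no detector was trained for.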