Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities
Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud
representation of the scene that does not model the topology of the
environment. A 3D mesh instead offers a richer, yet lightweight, model.
Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks
triangulated by a VIO algorithm often results in a mesh that does not fit the
real scene. In order to regularize the mesh, previous approaches decouple state
estimation from the 3D mesh regularization step, and either limit the 3D mesh
to the current frame or let the mesh grow indefinitely. We propose instead to
tightly couple mesh regularization and state estimation by detecting and
enforcing structural regularities in a novel factor-graph formulation. We also
propose to incrementally build the mesh by restricting its extent to the
time-horizon of the VIO optimization; the resulting 3D mesh covers a larger
portion of the scene than a per-frame approach while its memory usage and
computational complexity remain bounded. We show that our approach successfully
regularizes the mesh, while improving localization accuracy, when structural
regularities are present, and remains operational in scenes without
regularities. Comment: 7 pages, 5 figures, ICRA accepted
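The tightly coupled idea above lends itself to a toy illustration. The sketch below is our own minimal Python with hypothetical names, not the paper's formulation (which jointly optimizes poses and landmarks in a factor-graph smoother); it only shows how a coplanarity "regularity factor" pulls noisy VIO landmarks onto a detected plane:

```python
# Hypothetical sketch (names are ours, not the paper's): a structural
# regularity acting on landmarks triangulated by VIO.
import random

# Noisy landmarks triangulated by VIO, roughly on the plane z = 0.
random.seed(0)
landmarks = [(random.uniform(0, 5), random.uniform(0, 5),
              random.gauss(0.0, 0.3)) for _ in range(20)]

def plane_residuals(points, plane_z=0.0):
    """Point-to-plane residuals for a horizontal plane z = plane_z."""
    return [z - plane_z for _, _, z in points]

def regularize(points, weight=0.8):
    """One damped update: pull each point toward the detected plane.
    In the real factor graph this is solved jointly with the poses."""
    return [(x, y, z - weight * z) for x, y, z in points]

before = sum(r * r for r in plane_residuals(landmarks))
after = sum(r * r for r in plane_residuals(regularize(landmarks)))
assert after < before  # the regularity factor flattens the mesh vertices
```

The point of the coupling is that the same residuals also feed back into the pose estimate, which is where the localization-accuracy gains come from.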
Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios
Event cameras are bio-inspired vision sensors that output pixel-level
brightness changes instead of standard intensity frames. These cameras do not
suffer from motion blur and have a very high dynamic range, which enables them
to provide reliable visual information during high speed motions or in scenes
characterized by high dynamic range. However, event cameras output very little
information when the amount of motion is limited, such as in the case of almost
still motion. Conversely, standard cameras provide instant and rich information
about the environment most of the time (in low-speed and good lighting
scenarios), but they fail severely in case of fast motions, or difficult
lighting such as high dynamic range or low light scenes. In this paper, we
present the first state estimation pipeline that leverages the complementary
advantages of these two sensors by fusing in a tightly-coupled manner events,
standard frames, and inertial measurements. We show on the publicly available
Event Camera Dataset that our hybrid pipeline leads to an accuracy improvement
of 130% over event-only pipelines, and 85% over standard-frames-only
visual-inertial systems, while still being computationally tractable.
Furthermore, we use our pipeline to demonstrate - to the best of our knowledge
- the first autonomous quadrotor flight using an event camera for state
estimation, unlocking flight scenarios that were not reachable with traditional
visual-inertial odometry, such as low-light environments and high-dynamic range
scenes. Comment: 8 pages, 9 figures, 2 tables
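The complementarity argument reduces to a textbook fusion example. The toy below is not the paper's tightly-coupled estimator (which optimizes feature tracks from both sensors jointly with inertial terms); it only illustrates, for a single scalar state, why weighting each sensor by its reliability beats trusting either alone:

```python
# Drastically simplified toy, not the paper's method: fuse two Gaussian
# measurements of the same scalar state, weighting by inverse variance.

def fuse(frame_meas, event_meas, frame_var, event_var):
    """MAP estimate for a scalar state with two Gaussian measurements."""
    w_f, w_e = 1.0 / frame_var, 1.0 / event_var
    return (w_f * frame_meas + w_e * event_meas) / (w_f + w_e)

# Fast motion: frames are blurred (high variance), events stay sharp,
# so the fused estimate lands close to the event measurement.
x = fuse(frame_meas=1.4, event_meas=1.0, frame_var=4.0, event_var=0.25)
assert abs(x - 1.0) < 0.05
```

In low-speed, well-lit scenes the variances swap roles and the standard frames dominate, which is exactly the complementary behavior the abstract describes.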
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstraction (e.g., objects,
rooms, buildings), and includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source. Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.0628
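The layered-graph structure of a DSG can be sketched as a small data structure. The Python below is a toy of our own devising (the layer names follow the paper's spirit, but the API is hypothetical and is not Kimera's):

```python
# Toy 3D Dynamic Scene Graph: nodes live on abstraction layers, edges
# encode relations within and across layers. Illustrative only.
from collections import defaultdict

LAYERS = ("mesh", "objects", "agents", "places", "rooms", "building")

class SceneGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> (layer, attrs)
        self.edges = defaultdict(set)   # node_id -> {neighbor_id, ...}

    def add_node(self, node_id, layer, **attrs):
        assert layer in LAYERS
        self.nodes[node_id] = (layer, attrs)

    def add_edge(self, a, b):
        # e.g. "person_1 is in room_2 at time t" is an agents->rooms edge
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors_in_layer(self, node_id, layer):
        return [n for n in self.edges[node_id]
                if self.nodes[n][0] == layer]

g = SceneGraph()
g.add_node("room_2", "rooms")
g.add_node("person_1", "agents", t=12.5)
g.add_edge("person_1", "room_2")
assert g.neighbors_in_layer("person_1", "rooms") == ["room_2"]
```

The layering is what makes hierarchical queries cheap: a planner can reason over rooms first and descend to the metric mesh only where needed.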
3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM
3D Spatial Perception is the ability of an agent to perceive and understand the three-dimensional structure of its environment, including its position and orientation within that environment. This ability is essential for autonomous robots to navigate and interact with their surroundings, since it enables robots to perform a wide variety of tasks, such as obstacle avoidance, path planning, and object manipulation.
To provide robots with a detailed and accurate representation of the surrounding environment, this thesis first proposes the use of a map representation that is geometrically dense, photometrically accurate, and semantically annotated. We define these maps as metric-semantic maps, and provide algorithms to build such maps in real-time. Metric-semantic maps allow both humans and robots to have a shared understanding of the scene, while providing the robot with sufficient information to localize, plan shortest paths, and avoid obstacles along the way. We then present a novel 3D representation that abstracts a dense metric-semantic map into higher-level concepts – such as rooms, corridors, and buildings – and also encodes static objects and dynamic entities. We define such representations as 3D Dynamic Scene Graphs (DSGs), and likewise provide algorithms to build 3D DSGs. Finally, we show how these approaches can be combined to form a Spatial Perception Engine capable of building both metric-semantic maps and 3D DSGs from visual and inertial data. We also demonstrate the effectiveness of 3D DSGs for fast semantic path-planning queries, which can be used to direct robots using natural language commands.
In addition to the algorithms presented in this thesis, we open-source our code and datasets for the research community to use and explore. We believe that the algorithms and resources provided in this thesis open up exciting new possibilities in the field of 3D spatial perception, and we hope to inspire further research in this area, with the ultimate goal of creating fully autonomous robots that are able to navigate and operate in complex environments. Ph.D. thesis.
Densifying Sparse VIO: a Mesh-based approach using Structural Regularities
The ideal vision system for an autonomous robot would not only provide the robot’s position and orientation (localization), but also an accurate and complete model of the scene (mapping). While localization information allows for controlling the robot, a map of the scene allows for collision-free navigation; combined, a robot can achieve full autonomy.
Visual Inertial Odometry (VIO) algorithms have shown impressive localization results in recent years. Unfortunately, typical VIO algorithms use a point cloud to represent the scene, which is hardly usable for other tasks such as obstacle avoidance or path planning.
In this work, we explore the possibility of generating a dense and consistent model of the scene by using a 3D mesh, while making use of structural regularities to improve both mesh and pose estimates. Our experimental results show that we can achieve pose estimates that are 26% more accurate than those of state-of-the-art VIO algorithms when enforcing structural constraints, while also building a 3D mesh that provides a denser and more accurate map of the scene than a classical point cloud. We also show that our approach does not rely on assumptions about the scene and is general enough to work when structural regularities are not present.
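The bounded-memory, time-horizon idea behind this line of work (restricting the mesh to the fixed-lag VIO window) can be sketched as a sliding-window vertex store. Names and structure below are our own illustration, not the actual implementation:

```python
# Illustrative sketch: mesh vertices survive only while their supporting
# landmark is inside the fixed-lag VIO window, so memory stays bounded.
from collections import OrderedDict

class HorizonMesh:
    def __init__(self, horizon=5):
        self.horizon = horizon           # number of frames retained
        self.vertices = OrderedDict()    # landmark_id -> last-seen frame

    def observe(self, frame_id, landmark_ids):
        for lid in landmark_ids:
            self.vertices[lid] = frame_id
            self.vertices.move_to_end(lid)
        # drop vertices whose landmark left the optimization horizon
        cutoff = frame_id - self.horizon
        self.vertices = OrderedDict(
            (l, f) for l, f in self.vertices.items() if f > cutoff)

mesh = HorizonMesh(horizon=3)
mesh.observe(0, ["a", "b"])
mesh.observe(1, ["b", "c"])
mesh.observe(3, ["d"])
assert set(mesh.vertices) == {"b", "c", "d"}   # "a" aged out of the window
```

A per-frame mesh corresponds to `horizon=1`; letting the mesh grow indefinitely corresponds to an infinite horizon. The fixed-lag window is the middle ground the abstract argues for.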
Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities
© 2019 IEEE. Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud representation of the scene that does not model the topology of the environment. A 3D mesh instead offers a richer, yet lightweight, model. Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks triangulated by a VIO algorithm often results in a mesh that does not fit the real scene. In order to regularize the mesh, previous approaches decouple state estimation from the 3D mesh regularization step, and either limit the 3D mesh to the current frame [1], [2] or let the mesh grow indefinitely [3], [4]. We propose instead to tightly couple mesh regularization and state estimation by detecting and enforcing structural regularities in a novel factor-graph formulation. We also propose to incrementally build the mesh by restricting its extent to the time-horizon of the VIO optimization; the resulting 3D mesh covers a larger portion of the scene than a per-frame approach while its memory usage and computational complexity remain bounded. We show that our approach successfully regularizes the mesh, while improving localization accuracy, when structural regularities are present, and remains operational in scenes without regularities. ARL (Award W911NF-17-2-0181)
Kimera: An Open-Source Library for Real-Time Metric-Semantic Localization and Mapping
© 2020 IEEE. We provide an open-source C++ library for real-time metric-semantic visual-inertial Simultaneous Localization And Mapping (SLAM). The library goes beyond existing visual and visual-inertial SLAM libraries (e.g., ORB-SLAM, VINS-Mono, OKVIS, ROVIO) by enabling mesh reconstruction and semantic labeling in 3D. Kimera is designed with modularity in mind and has four key components: a visual-inertial odometry (VIO) module for fast and accurate state estimation, a robust pose graph optimizer for global trajectory estimation, a lightweight 3D mesher module for fast mesh reconstruction, and a dense 3D metric-semantic reconstruction module. The modules can be run in isolation or in combination, hence Kimera can easily fall back to a state-of-the-art VIO or a full SLAM system. Kimera runs in real-time on a CPU and produces a 3D metric-semantic mesh from semantically labeled images, which can be obtained by modern deep learning methods. We hope that the flexibility, computational efficiency, robustness, and accuracy afforded by Kimera will build a solid basis for future metric-semantic SLAM and perception research, and will allow researchers across multiple areas (e.g., VIO, SLAM, 3D reconstruction, segmentation) to benchmark and prototype their own efforts without having to start from scratch. ARL (Award W911NF-17-2-0181)
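The four-module design described in the abstract can be caricatured as a composable pipeline. The Python below is a toy of our own (Kimera itself is C++, and every function here is a stand-in), meant only to show how the modules chain and how the system can fall back to plain VIO:

```python
# Toy sketch of the modular design, not Kimera's actual interfaces.

def vio(frames_imu):
    """Fast state estimation: odometry poses from visual-inertial data."""
    return [{"pose": i} for i, _ in enumerate(frames_imu)]

def pgo(poses):
    """Robust pose graph optimization: globally consistent trajectory."""
    return poses  # placeholder: loop closures would correct drift here

def mesher(poses):
    """Lightweight per-keyframe 3D mesh from tracked landmarks."""
    return {"mesh_vertices": 10 * len(poses)}

def semantic_reconstruction(mesh, labels):
    """Dense metric-semantic mesh from 2D semantic labels."""
    return {**mesh, "labels": labels}

def pipeline(data, labels=None, modules=("vio", "pgo", "mesh", "semantics")):
    poses = vio(data)
    if "pgo" in modules:
        poses = pgo(poses)
    if "mesh" not in modules:
        return poses                 # falls back to a plain VIO/SLAM system
    mesh = mesher(poses)
    if "semantics" in modules and labels is not None:
        mesh = semantic_reconstruction(mesh, labels)
    return mesh

out = pipeline([0, 1, 2], labels=["wall", "floor"])
assert out["mesh_vertices"] == 30 and out["labels"] == ["wall", "floor"]
```

Keeping each stage behind a narrow interface is what lets the library be benchmarked module-by-module, which is the design goal the abstract emphasizes.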