12 research outputs found

    Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities

    Full text link
    Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud representation of the scene that does not model the topology of the environment. A 3D mesh instead offers a richer, yet lightweight, model. Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks triangulated by a VIO algorithm often results in a mesh that does not fit the real scene. In order to regularize the mesh, previous approaches decouple state estimation from the 3D mesh regularization step, and either limit the 3D mesh to the current frame or let the mesh grow indefinitely. We propose instead to tightly couple mesh regularization and state estimation by detecting and enforcing structural regularities in a novel factor-graph formulation. We also propose to incrementally build the mesh by restricting its extent to the time-horizon of the VIO optimization; the resulting 3D mesh covers a larger portion of the scene than a per-frame approach while its memory usage and computational complexity remain bounded. We show that our approach successfully regularizes the mesh, while improving localization accuracy, when structural regularities are present, and remains operational in scenes without regularities. Comment: 7 pages, 5 figures, ICRA accepted.
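
    The abstract couples structural regularities with state estimation in a factor graph. As a minimal sketch of that idea, assuming a detected planar regularity, one can add a point-on-plane residual for each associated landmark next to the usual reprojection and IMU terms; the function and variable names below are illustrative, not the paper's implementation.

    import numpy as np

    def point_on_plane_residual(landmark, plane_normal, plane_d):
        """Signed distance of a 3D landmark to the plane n.x + d = 0.

        In a factor-graph formulation, one such residual is added for each
        landmark associated to a detected planar regularity, so that plane,
        landmarks, and camera/IMU states are estimated jointly.
        """
        n = plane_normal / np.linalg.norm(plane_normal)  # keep the normal unit-length
        return float(n @ landmark + plane_d)

    # Toy example: three noisy landmarks roughly on the plane z = 1.
    landmarks = np.array([[0.0, 0.0, 1.02],
                          [1.0, 0.0, 0.98],
                          [0.0, 1.0, 1.01]])
    normal, d = np.array([0.0, 0.0, 1.0]), -1.0
    print([point_on_plane_residual(p, normal, d) for p in landmarks])  # small values -> regularity holds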

    Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios

    Full text link
    Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. However, event cameras provide little information when the amount of motion is limited, such as when the camera is nearly still. Conversely, standard cameras provide instant and rich information about the environment most of the time (in low-speed and good lighting scenarios), but they fail severely in the case of fast motions or difficult lighting, such as high-dynamic-range or low-light scenes. In this paper, we present the first state estimation pipeline that leverages the complementary advantages of these two sensors by fusing events, standard frames, and inertial measurements in a tightly-coupled manner. We show on the publicly available Event Camera Dataset that our hybrid pipeline leads to an accuracy improvement of 130% over event-only pipelines and 85% over standard-frames-only visual-inertial systems, while still being computationally tractable. Furthermore, we use our pipeline to demonstrate - to the best of our knowledge - the first autonomous quadrotor flight using an event camera for state estimation, unlocking flight scenarios that were not reachable with traditional visual-inertial odometry, such as low-light environments and high-dynamic-range scenes. Comment: 8 pages, 9 figures, 2 tables.
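
    The pipeline described above minimizes residuals from both sensing modalities together with inertial terms. A minimal sketch of such a tightly-coupled objective is given below, assuming precomputed residual vectors; the weights stand in for the inverse measurement covariances of a real estimator, and the names are illustrative rather than the paper's implementation.

    import numpy as np

    def joint_cost(reproj_event, reproj_frame, imu_residuals,
                   w_event=1.0, w_frame=1.0, w_imu=1.0):
        """Sum of squared residuals from event-frame features, standard-frame
        features, and preintegrated IMU terms, minimized jointly rather than
        in separate pipelines."""
        return (w_event * np.sum(np.square(reproj_event)) +
                w_frame * np.sum(np.square(reproj_frame)) +
                w_imu * np.sum(np.square(imu_residuals)))

    # Toy residual vectors (pixels, pixels, IMU units).
    print(joint_cost(np.array([0.4, -0.2]), np.array([0.1, 0.3]), np.array([0.05])))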

    Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

    Full text link
    Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera on real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source. Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.0628
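
    To make the layered-graph idea concrete, here is a minimal sketch of a scene graph whose nodes carry a layer label (building, room, object, agent) and whose edges encode spatio-temporal relations; it illustrates the data structure described in the abstract and is not Kimera's actual implementation.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        node_id: str
        layer: str                    # e.g. "building", "room", "object", "agent"
        attributes: dict = field(default_factory=dict)

    @dataclass
    class Edge:
        source: str
        target: str
        relation: str                 # e.g. "contains", "adjacent_to", "in_at_time"

    class SceneGraph:
        def __init__(self):
            self.nodes, self.edges = {}, []

        def add_node(self, node):
            self.nodes[node.node_id] = node

        def add_edge(self, source, target, relation):
            self.edges.append(Edge(source, target, relation))

        def layer(self, name):
            return [n for n in self.nodes.values() if n.layer == name]

    g = SceneGraph()
    g.add_node(Node("building_1", "building"))
    g.add_node(Node("room_2", "room", {"label": "kitchen"}))
    g.add_node(Node("person_7", "agent", {"time": 12.3}))
    g.add_edge("building_1", "room_2", "contains")
    g.add_edge("person_7", "room_2", "in_at_time")
    print([n.node_id for n in g.layer("room")])   # ['room_2']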

    3D Spatial Perception with Real-Time Dense Metric-Semantic SLAM

    No full text
    3D Spatial Perception is the ability of an agent to perceive and understand the three-dimensional structure of its environment, including its position and orientation within that environment. This ability is essential for autonomous robots to navigate and interact with their surroundings, since it enables robots to perform a wide variety of tasks, such as obstacle avoidance, path planning, and object manipulation. To provide robots with a detailed and accurate representation of the surrounding environment, this thesis first proposes the use of a map representation that is geometrically dense, photometrically accurate, and semantically annotated. We define these maps as metric-semantic maps, and provide algorithms to build such maps in real-time. Metric-semantic maps allow both humans and robots to have a shared understanding of the scene, while providing the robot with sufficient information to localize, plan shortest paths, and avoid obstacles along the way. We then present a novel 3D representation that abstracts a dense metric-semantic map into higher-level concepts, such as rooms, corridors, and buildings, and also encodes static objects and dynamic entities. We define such representations as 3D Dynamic Scene Graphs (DSGs), and also provide algorithms to build 3D DSGs. Finally, we show how these approaches can be combined to form a Spatial Perception Engine capable of building both metric-semantic maps and 3D DSGs from visual and inertial data. We also demonstrate the effectiveness of 3D DSGs for fast semantic path-planning queries, which can be used to direct robots using natural language commands. In addition to the algorithms presented in this thesis, we open-source our code and datasets for the research community to use and explore. We believe that the algorithms and resources provided in this thesis open up exciting new possibilities in the field of 3D spatial perception, and we hope to inspire further research in this area, with the ultimate goal of creating fully autonomous robots that are able to navigate and operate in complex environments. Ph.D. thesis.
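
    The thesis highlights fast semantic path-planning queries on a 3D DSG. A minimal sketch of the hierarchical idea, assuming a hand-made room-adjacency graph: search the coarse room layer first, then refine each room-to-room hop on the dense metric layer (refinement not shown). The room names and graph below are invented for illustration.

    from collections import deque

    room_adjacency = {
        "kitchen": ["hallway"],
        "hallway": ["kitchen", "office", "lab"],
        "office":  ["hallway"],
        "lab":     ["hallway"],
    }

    def plan_rooms(start, goal):
        """Breadth-first search over the room layer; returns a room sequence."""
        queue, parent = deque([start]), {start: None}
        while queue:
            room = queue.popleft()
            if room == goal:
                path = []
                while room is not None:
                    path.append(room)
                    room = parent[room]
                return path[::-1]
            for nxt in room_adjacency[room]:
                if nxt not in parent:
                    parent[nxt] = room
                    queue.append(nxt)
        return None

    print(plan_rooms("kitchen", "lab"))   # ['kitchen', 'hallway', 'lab']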

    Densifying Sparse VIO: a Mesh-based approach using Structural Regularities

    No full text
    The ideal vision system for an autonomous robot would not only provide the robot’s position and orientation (localization), but also an accurate and complete model of the scene (mapping). While localization information allows for controlling the robot, a map of the scene allows for collision-free navigation; combined, a robot can achieve full autonomy. Visual Inertial Odometry (VIO) algorithms have shown impressive localization results in recent years. Unfortunately, typical VIO algorithms use a point cloud to represent the scene, which is hardly usable for other tasks such as obstacle avoidance or path planning. In this work, we explore the possibility of generating a dense and consistent model of the scene by using a 3D mesh, while making use of structural regularities to improve both mesh and pose estimates. Our experimental results show that we can achieve 26% more accurate pose estimates than state-of-the-art VIO algorithms when enforcing structural constraints, while also building a 3D mesh which provides a denser and more accurate map of the scene than a classical point cloud. We also show that our approach does not rely on assumptions about the scene and is general enough to work when structural regularities are not present.
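
    A minimal sketch of the densification idea, assuming NumPy and SciPy are available: triangulate the tracked 2D features in the image plane and lift each triangle to the 3D landmarks behind those features, turning a sparse point cloud into a mesh. The data below is synthetic; a real pipeline would use the VIO keypoints and landmarks.

    import numpy as np
    from scipy.spatial import Delaunay

    # 2D keypoints (pixels) and their triangulated 3D landmarks (meters).
    pixels = np.array([[10, 10], [200, 15], [105, 180], [220, 190], [15, 200]], dtype=float)
    landmarks = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.1], [0.5, 0.9, 2.0],
                          [1.1, 1.0, 2.2], [0.0, 1.0, 1.9]])

    tri = Delaunay(pixels)        # Delaunay triangulation in the image plane
    mesh_faces = tri.simplices    # each row indexes three landmarks, i.e. one mesh triangle
    mesh_vertices = landmarks     # 3D positions of the mesh vertices
    print(mesh_faces)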

    Smooth Mesh Estimation from Depth Data using Non-Smooth Convex Optimization

    No full text

    Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities

    No full text
    © 2019 IEEE. Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud representation of the scene that does not model the topology of the environment. A 3D mesh instead offers a richer, yet lightweight, model. Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks triangulated by a VIO algorithm often results in a mesh that does not fit the real scene. In order to regularize the mesh, previous approaches decouple state estimation from the 3D mesh regularization step, and either limit the 3D mesh to the current frame [1], [2] or let the mesh grow indefinitely [3], [4]. We propose instead to tightly couple mesh regularization and state estimation by detecting and enforcing structural regularities in a novel factor-graph formulation. We also propose to incrementally build the mesh by restricting its extent to the time-horizon of the VIO optimization; the resulting 3D mesh covers a larger portion of the scene than a per-frame approach while its memory usage and computational complexity remain bounded. We show that our approach successfully regularizes the mesh, while improving localization accuracy, when structural regularities are present, and remains operational in scenes without regularities. ARL (Award W911NF-17-2-0181).

    Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities

    No full text
    Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud representation of the scene that does not model the topology of the environment. A 3D mesh instead offers a richer, yet lightweight, model. Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks triangulated by a VIO algorithm often results in a mesh that does not fit the real scene. In order to regularize the mesh, previous approaches decouple state estimation from the 3D mesh regularization step, and either limit the 3D mesh to the current frame [1], [2] or let the mesh grow indefinitely [3], [4]. We propose instead to tightly couple mesh regularization and state estimation by detecting and enforcing structural regularities in a novel factor-graph formulation. We also propose to incrementally build the mesh by restricting its extent to the time-horizon of the VIO optimization; the resulting 3D mesh covers a larger portion of the scene than a per-frame approach while its memory usage and computational complexity remain bounded. We show that our approach successfully regularizes the mesh, while improving localization accuracy, when structural regularities are present, and remains operational in scenes without regularities.

    Kimera: An Open-Source Library for Real-Time Metric-Semantic Localization and Mapping

    No full text
    © 2020 IEEE. We provide an open-source C++ library for real-time metric-semantic visual-inertial Simultaneous Localization And Mapping (SLAM). The library goes beyond existing visual and visual-inertial SLAM libraries (e.g., ORB-SLAM, VINS-Mono, OKVIS, ROVIO) by enabling mesh reconstruction and semantic labeling in 3D. Kimera is designed with modularity in mind and has four key components: a visual-inertial odometry (VIO) module for fast and accurate state estimation, a robust pose graph optimizer for global trajectory estimation, a lightweight 3D mesher module for fast mesh reconstruction, and a dense 3D metric-semantic reconstruction module. The modules can be run in isolation or in combination, hence Kimera can easily fall back to a state-of-the-art VIO or a full SLAM system. Kimera runs in real-time on a CPU and produces a 3D metric-semantic mesh from semantically labeled images, which can be obtained by modern deep learning methods. We hope that the flexibility, computational efficiency, robustness, and accuracy afforded by Kimera will build a solid basis for future metric-semantic SLAM and perception research, and will allow researchers across multiple areas (e.g., VIO, SLAM, 3D reconstruction, segmentation) to benchmark and prototype their own efforts without having to start from scratch. ARL (Award W911NF-17-2-0181).
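
    As a rough illustration of the modularity described above, the mock below chains four stages corresponding to the four components (VIO, pose graph optimization, mesher, metric-semantic reconstruction); the class and method names are invented for the sketch and are not Kimera's actual C++ API, and each stage could equally be run on its own.

    class VioEstimator:
        def estimate(self, frame, imu):
            return {"pose": "T_world_body", "landmarks": []}

    class PoseGraphOptimizer:
        def add_and_optimize(self, pose):
            return pose                  # loop closures would correct drift here

    class Mesher:
        def update(self, landmarks):
            return {"vertices": landmarks, "faces": []}

    class MetricSemanticReconstruction:
        def integrate(self, mesh, semantic_labels):
            return {"mesh": mesh, "labels": semantic_labels}

    def spin_once(frame, imu, semantic_labels):
        """Chain the modules for one input; each could also run in isolation."""
        state = VioEstimator().estimate(frame, imu)
        pose = PoseGraphOptimizer().add_and_optimize(state["pose"])
        mesh = Mesher().update(state["landmarks"])
        return pose, MetricSemanticReconstruction().integrate(mesh, semantic_labels)

    print(spin_once(frame=None, imu=None, semantic_labels=["wall", "floor"]))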