176 research outputs found
Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities
Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud
representation of the scene that does not model the topology of the
environment. A 3D mesh instead offers a richer, yet lightweight, model.
Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks
triangulated by a VIO algorithm often results in a mesh that does not fit the
real scene. In order to regularize the mesh, previous approaches decouple state
estimation from the 3D mesh regularization step, and either limit the 3D mesh
to the current frame or let the mesh grow indefinitely. We propose instead to
tightly couple mesh regularization and state estimation by detecting and
enforcing structural regularities in a novel factor-graph formulation. We also
propose to incrementally build the mesh by restricting its extent to the
time-horizon of the VIO optimization; the resulting 3D mesh covers a larger
portion of the scene than a per-frame approach while its memory usage and
computational complexity remain bounded. We show that our approach successfully
regularizes the mesh, while improving localization accuracy, when structural
regularities are present, and remains operational in scenes without
regularities.
Comment: 7 pages, 5 figures, ICRA accepted
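The coupling of landmark estimation with structural regularities can be illustrated with a toy least-squares problem: noisy triangulated landmarks near a planar wall are refined jointly by a data term and a coplanarity term. This is a hypothetical sketch only, not the paper's factor-graph implementation; the plane, weights, and problem size are all illustrative assumptions.

```python
# Hypothetical sketch: noisy 3D landmarks near a wall (the plane z = 0) are
# refined by jointly minimising a data term (stay close to the triangulated
# positions) and a structural-regularity term (coplanarity with the wall).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
true_pts = np.column_stack([rng.uniform(0, 5, 20),
                            rng.uniform(0, 3, 20),
                            np.zeros(20)])                 # points on z = 0
noisy_pts = true_pts + rng.normal(0, 0.05, true_pts.shape)  # VIO-style noise

def residuals(x, measured, plane_normal, plane_d, w=2.0):
    pts = x.reshape(-1, 3)
    data_res = (pts - measured).ravel()            # fit the triangulated landmarks
    reg_res = w * (pts @ plane_normal - plane_d)   # coplanarity regularity factor
    return np.concatenate([data_res, reg_res])

sol = least_squares(residuals, noisy_pts.ravel(),
                    args=(noisy_pts, np.array([0.0, 0.0, 1.0]), 0.0))
refined = sol.x.reshape(-1, 3)
# The regularity term pulls the out-of-plane deviation toward zero:
print(abs(refined[:, 2]).mean(), abs(noisy_pts[:, 2]).mean())
```

The weight `w` plays the role of the factor's information: larger values trust the detected regularity more, smaller values fall back toward the raw triangulation.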
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstraction (e.g., objects,
rooms, buildings), and includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.
Comment: 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with
arXiv:2002.0628
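As a rough illustration of the data structure described above (not Kimera's actual API), a layered scene graph can be sketched as nodes tagged with an abstraction layer plus edges carrying spatio-temporal relations; all class and relation names below are illustrative assumptions.

```python
# Minimal illustrative sketch of a layered 3D Dynamic Scene Graph: nodes live
# on abstraction layers (objects, rooms, agents, ...) and edges encode
# spatio-temporal relations among them. Names are hypothetical, not Kimera's.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str                                   # e.g. "object", "room", "agent"
    attrs: dict = field(default_factory=dict)    # metric/semantic attributes

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)    # (src, dst, relation) triples

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, src, dst, relation):
        self.edges.append((src, dst, relation))

    def layer_nodes(self, layer):
        return [n for n in self.nodes.values() if n.layer == layer]

g = SceneGraph()
g.add_node(Node("kitchen", "room"))
g.add_node(Node("mug", "object", {"pos": (1.0, 0.5, 0.9)}))
g.add_node(Node("alice", "agent", {"t": 12.3}))          # a dynamic entity
g.relate("mug", "kitchen", "contained_in")               # spatial relation
g.relate("alice", "kitchen", "in_at_time")               # spatio-temporal relation
print([n.node_id for n in g.layer_nodes("object")])
```

The layered indexing is what enables queries at a chosen level of abstraction (e.g. "which objects are in this room?") without touching the dense metric layers.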
Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems
This paper presents Kimera-Multi, the first multi-robot system that (i) is
robust and capable of identifying and rejecting incorrect inter- and intra-robot
loop closures resulting from perceptual aliasing, (ii) is fully distributed and
only relies on local (peer-to-peer) communication to achieve distributed
localization and mapping, and (iii) builds a globally consistent
metric-semantic 3D mesh model of the environment in real-time, where faces of
the mesh are annotated with semantic labels. Kimera-Multi is implemented by a
team of robots equipped with visual-inertial sensors. Each robot builds a local
trajectory estimate and a local mesh using Kimera. When communication is
available, robots initiate a distributed place recognition and robust pose
graph optimization protocol based on a novel distributed graduated
non-convexity algorithm. The proposed protocol allows the robots to improve
their local trajectory estimates by leveraging inter-robot loop closures while
being robust to outliers. Finally, each robot uses its improved trajectory
estimate to correct the local mesh using mesh deformation techniques.
We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking
datasets, and challenging outdoor datasets collected using ground robots. Both
real and simulated experiments involve long trajectories (e.g., up to 800
meters per robot). The experiments show that Kimera-Multi (i) outperforms the
state of the art in terms of robustness and accuracy, (ii) achieves estimation
errors comparable to a centralized SLAM system while being fully distributed,
(iii) is parsimonious in terms of communication bandwidth, (iv) produces
accurate metric-semantic 3D meshes, and (v) is modular and can also be used for
standard 3D reconstruction (i.e., without semantic labels) or for trajectory
estimation (i.e., without reconstructing a 3D mesh).
Comment: Accepted by IEEE Transactions on Robotics (18 pages, 15 figures)
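The graduated non-convexity (GNC) idea behind the robust optimization can be shown in a deliberately tiny form: a scalar estimate is recovered from measurements with gross outliers by starting from a convex surrogate loss and gradually restoring non-convexity. This sketch uses a Geman-McClure surrogate on a 1D averaging problem; the paper's algorithm is distributed and operates on pose graphs, so everything below is a simplifying assumption.

```python
# Toy sketch of graduated non-convexity: estimate a scalar offset from
# measurements contaminated with gross outliers. The surrogate parameter mu
# starts large (nearly convex least squares) and shrinks each iteration.
import numpy as np

rng = np.random.default_rng(1)
true_x = 3.0
inliers = true_x + rng.normal(0, 0.01, 40)
outliers = rng.uniform(10, 20, 10)               # perceptual-aliasing-style junk
z = np.concatenate([inliers, outliers])

def gnc_geman_mcclure(z, c=0.5, mu0=1e4, steps=20):
    x = z.mean()                                 # convex initialisation
    mu = mu0
    for _ in range(steps):
        r2 = (z - x) ** 2
        w = (mu * c**2 / (r2 + mu * c**2)) ** 2  # surrogate weights in [0, 1]
        x = np.sum(w * z) / np.sum(w)            # weighted-mean update
        mu = max(1.0, mu / 1.4)                  # gradually restore non-convexity
    return x

est = gnc_geman_mcclure(z)
print(est)   # close to 3.0 despite 20% outliers
```

Outliers end up with weights near zero, which is the same mechanism that lets the pose-graph protocol reject spurious loop closures while keeping consistent ones.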
A survey on real-time 3D scene reconstruction with SLAM methods in embedded systems
3D scene reconstruction with simultaneous localization and mapping (SLAM) is an
important topic for mobile systems such as drones, service robots and mobile
AR/VR devices. Compared to a point cloud representation, the
3D reconstruction based on meshes and voxels is particularly useful for
high-level functions, like obstacle avoidance or interaction with the physical
environment. This article reviews the implementation of a visual-based 3D scene
reconstruction pipeline on resource-constrained hardware platforms. Real-time
performances, memory management and low power consumption are critical for
embedded systems. A conventional SLAM pipeline from sensors to 3D
reconstruction is described, including the potential use of deep learning. The
implementation of advanced functions with limited resources is detailed. Recent
systems propose the embedded implementation of 3D reconstruction methods with
different granularities. The trade-off between required accuracy and resource
consumption for real-time localization and reconstruction is one of the open
research questions identified and discussed in this paper.
Learning meshes for dense visual SLAM
Estimating the motion and surrounding geometry of a moving camera remains a
challenging inference problem. From an information-theoretic point of view,
estimates should improve as more information is included, as in dense SLAM, but
this depends strongly on the validity of the underlying models. In the present
paper, we use triangular meshes as a compact yet dense geometry representation.
To allow for simple and fast usage, we propose a view-based formulation in
which we predict the in-plane vertex coordinates directly from images and then
treat the remaining vertex depth components as free variables. Flexible and
continuous integration of information is achieved through a residual-based
inference technique. The resulting factor graph encodes all information as
mappings from free variables to residuals, the squared sum of which is
minimised during inference. We propose different types of learnable residuals,
trained end-to-end to increase their suitability as information-bearing models
and to enable accurate and reliable estimation. A detailed evaluation of all
components on both synthetic and real data confirms the practicability of the
presented approach.
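The residual-based view of a factor graph described above can be sketched generically: the "graph" is just a collection of functions mapping the free variables to residual vectors, and Gauss-Newton minimises their squared sum. The two hand-written linear factors below are toy assumptions, standing in for the paper's learned residuals.

```python
# Generic sketch of residual-based inference: each factor maps the free
# variables x to a residual vector; Gauss-Newton minimises the squared sum.
import numpy as np

def gauss_newton(x, factors, iters=10, eps=1e-6):
    for _ in range(iters):
        J_rows, r_rows = [], []
        for f in factors:
            r = f(x)
            # numerical Jacobian of this factor w.r.t. the free variables
            J = np.array([(f(x + eps * np.eye(len(x))[j]) - r) / eps
                          for j in range(len(x))]).T
            J_rows.append(J)
            r_rows.append(r)
        J = np.vstack(J_rows)
        r = np.concatenate(r_rows)
        x = x - np.linalg.solve(J.T @ J, J.T @ r)   # normal-equations step
    return x

# Two toy factors over free variables (a, b): a prior and a relative constraint.
factors = [lambda x: np.array([x[0] - 1.0]),        # prior:    a      ≈ 1
           lambda x: np.array([x[1] - x[0] - 2.0])] # relative: b - a  ≈ 2
x = gauss_newton(np.zeros(2), factors)
print(np.round(x, 3))   # a = 1, b = 3
```

Swapping a hand-written factor for a neural network with the same signature is, at this level of abstraction, what "learnable residuals" amounts to.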
Semantic Validation in Structure from Motion
The Structure from Motion (SfM) challenge in computer vision is the process
of recovering the 3D structure of a scene from a series of projective
measurements that are calculated from a collection of 2D images, taken from
different perspectives. SfM consists of three main steps: feature detection and
matching, camera motion estimation, and recovery of 3D structure from the
estimated intrinsic and extrinsic parameters and features.
A problem encountered in SfM is that scenes lacking texture or with
repetitive features can cause erroneous feature matching between frames.
Semantic segmentation offers a route to validate and correct SfM models by
labelling pixels in the input images with the use of a deep convolutional
neural network. The semantic and geometric properties associated with classes
in the scene can be taken advantage of to apply prior constraints to each class
of object. The SfM pipeline COLMAP and the semantic segmentation pipeline
DeepLab were used. These, together with a planar reconstruction of the dense
model, serve to identify erroneous points that should be occluded from the
calculated camera position, given the semantic label, and thus the prior
constraint, of the reconstructed plane. Herein, semantic segmentation is
integrated into SfM to
apply priors on the 3D point cloud, given the object detection in the 2D input
images. Additionally, the semantic labels of matched keypoints are compared,
and points with inconsistent labels are discarded. Furthermore, semantic
labels on input images are used for the removal of objects associated with
motion in the output SfM models. The proposed approach is evaluated on a
dataset of 1102 images of a repetitive architecture scene. This project offers
a novel method for improved validation of 3D SfM models.
Autonomous exploration of hierarchical scene graphs
Robotic autonomous exploration is an active field of research, where robot
perception pipelines abound. Graph-based pipelines, in particular, are a way
to represent the environment efficiently, and provide grounds for high-level
reasoning to solve robotics tasks.
We propose a framework to generate hierarchical scene graphs automatically
from photo-realistic environments. In this thesis, a graph-based perception
pipeline, Hydra, is employed in combination with Habitat-Sim, a 3D simulator,
to explore and generate 3D scene graph representations of the simulated 3D
maps. This framework and the resulting data provide the grounds to establish
a general pipeline for solving exploration tasks in 3D environments using
Graph Neural Networks and Reinforcement Learning.