Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection
This paper proposes a novel method to estimate the global scale of a 3D
reconstructed model within a Kalman filtering-based monocular SLAM algorithm.
Our Bayesian framework integrates height priors over the detected objects
belonging to a set of broad predefined classes, based on recent advances in
fast generic object detection. Each observation is produced from a single
frame, so no data association across video frames is needed: we associate the
height priors with the sizes of the image regions in which projections of map
features fall within the object detection regions. We present promising
results of this approach in several experiments with different object classes.
Comment: Int. Workshop on Visual Odometry, CVPR (July 2017)
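As a rough illustration, each single-frame observation can be folded into a one-dimensional Gaussian belief over the global scale. The sketch below is a hedged toy, not the paper's implementation: it assumes a Gaussian metric-height prior per object class, a scalar height measured in SLAM map units from feature projections inside the detection box, and illustrative numbers for a person class.

```python
import numpy as np

def fuse_scale_observation(mu_s, var_s, h_prior_mu, h_prior_var, h_slam):
    """Fuse one single-frame scale observation into a running
    Gaussian scale belief (product of two 1-D Gaussians)."""
    s_obs = h_prior_mu / h_slam        # scale = metric height / map-unit height
    var_obs = h_prior_var / h_slam**2  # first-order propagation of prior variance
    var_new = 1.0 / (1.0 / var_s + 1.0 / var_obs)  # information-form update
    mu_new = var_new * (mu_s / var_s + s_obs / var_obs)
    return mu_new, var_new

# Illustrative run: person detections with a 1.7 m +/- 0.1 m height prior,
# whose map-unit heights come from feature projections inside each box.
mu_s, var_s = 1.0, 1e6                 # near-uninformative initial belief
for h_slam in [0.52, 0.48, 0.55]:      # heights in SLAM units (made up)
    mu_s, var_s = fuse_scale_observation(mu_s, var_s, 1.7, 0.1**2, h_slam)
print(f"estimated global scale: {mu_s:.2f} (std {np.sqrt(var_s):.2f})")
```

Each update is independent of previous frames, which mirrors the paper's point that no cross-frame data association is required.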
MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion
Robots and other smart devices need efficient object-based scene
representations from their on-board vision systems to reason about contact,
physics and occlusion. Precise models of recognized objects will play an
important role alongside non-parametric reconstructions of unrecognized
structures. We present a system that can estimate accurate poses of multiple known
objects in contact and occlusion from real-time, embodied multi-view vision.
Our approach makes 3D object pose proposals from single RGB-D views,
accumulates pose estimates and non-parametric occupancy information from
multiple views as the camera moves, and performs joint optimization to estimate
consistent, non-intersecting poses for multiple objects in contact.
We verify the accuracy and robustness of our approach experimentally on two
object datasets: YCB-Video and our own challenging Cluttered YCB-Video. We
demonstrate a real-time robotics application in which a robot arm precisely
disassembles complicated piles of objects in an orderly manner, using only
on-board RGB-D vision.
Comment: 10 pages, 10 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020
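A minimal sketch of the non-intersection reasoning, under strong assumptions: each object contributes a boolean occupancy volume, and a pose hypothesis for one object can be scored by how many of its model points penetrate another object's occupied space. The function name and grid layout below are illustrative stand-ins; the paper performs joint gradient-based optimization over all object poses rather than merely scoring penetration.

```python
import numpy as np

def intersection_penalty(points_a, pose_a, occ_grid_b, origin_b, voxel_size):
    """Fraction of object A's model points (posed by a 4x4 transform)
    that land inside object B's boolean occupancy grid."""
    # Transform object A's model points into the world frame.
    pts_w = (pose_a[:3, :3] @ points_a.T).T + pose_a[:3, 3]
    # Voxel indices of those points in object B's occupancy grid.
    idx = np.floor((pts_w - origin_b) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(occ_grid_b.shape)), axis=1)
    hits = occ_grid_b[tuple(idx[inside].T)]
    # Penetrating fraction; zero if no point falls inside the grid.
    return hits.mean() if hits.size else 0.0

# Toy usage: B occupies a small block; one of A's two points penetrates it.
occ = np.zeros((10, 10, 10), dtype=bool)
occ[4:6, 4:6, 4:6] = True
pts = np.array([[0.45, 0.45, 0.45], [0.90, 0.90, 0.90]])
print(intersection_penalty(pts, np.eye(4), occ, np.zeros(3), 0.1))  # 0.5
```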
iSDF: Real-Time Neural Signed Distance Fields for Robot Perception
We present iSDF, a continual learning system for real-time signed distance
field (SDF) reconstruction. Given a stream of posed depth images from a moving
camera, it trains a randomly initialised neural network to map input 3D
coordinates to approximate signed distances. The model is self-supervised by
minimising a loss that bounds the predicted signed distance using the distance
to the closest sampled point in a batch of query points that are actively
sampled. In contrast to prior work based on voxel grids, our neural method is
able to provide adaptive levels of detail with plausible filling in of
partially observed regions and denoising of observations, all while having a
more compact representation. In evaluations against alternative methods on real
and synthetic datasets of indoor environments, we find that iSDF produces more
accurate reconstructions, and better approximations of collision costs and
gradients useful for downstream planners in domains from navigation to
manipulation. Code and video results can be found at our project page:
https://joeaortiz.github.io/iSDF/.
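The bound-based self-supervision can be sketched as a hinge loss: the true signed distance at a query point can be no larger than its distance to the nearest sampled surface point, so predictions above that bound are penalised. This is a simplified stand-in (NumPy, no autodiff, illustrative names); iSDF's full loss has additional terms and trains a network by gradient descent.

```python
import numpy as np

def sdf_bound_loss(pred_sdf, query_pts, surface_pts):
    """Hinge loss on the batch bound: the distance from a query point to
    the nearest sampled surface point upper-bounds the true SDF there."""
    # Distance from each query point to every sampled surface point.
    d = np.linalg.norm(query_pts[:, None, :] - surface_pts[None, :, :], axis=-1)
    bound = d.min(axis=1)              # b(x): per-query upper bound
    # Penalise predictions that exceed the bound.
    return np.maximum(pred_sdf - bound, 0.0).mean()

# Toy usage with random stand-ins for back-projected depth samples.
rng = np.random.default_rng(0)
surface = rng.random((256, 3))         # sampled surface points
queries = rng.random((32, 3))          # actively sampled query points
preds = rng.random(32) * 0.5           # placeholder network outputs
print(sdf_bound_loss(preds, queries, surface))
```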
Neural scene representations for dense-semantic SLAM
An important challenge in visual Simultaneous Localisation and Mapping (SLAM) has been the design of scene representations that allow for both robust inference and useful interaction. The rapid progress of semantic image understanding powered by deep learning has led to SLAM systems that enrich geometric maps with semantics, increasing the range of possible applications. However, a core challenge remains in how to tightly integrate geometry and semantics for 3D reconstruction; we believe that their joint representation is the right direction for actionable and robust maps.

In this thesis we address the central question of designing efficient scene representations through the use of compressive models, which can represent detail with the fewest parameters. We demonstrate that compressive models offer a solution for the joint representation of geometry and semantics, where semantics provide priors for robust reconstruction and geometric compression informs scene decomposition. This work focuses on using generative neural networks, a category of compressive representations, for incremental dense SLAM. We develop a volumetric rendering formulation for the use of compressive models in generative inference from multi-view images, enabling two novel SLAM systems.

First, we learn class-level code descriptors for object shape from aligned 3D models. At test time, the code and object pose are optimised for efficient and complete object reconstruction from instances of the learned categories. This method relaxes the assumption of fixed templates and allows for intra-class shape variation. We demonstrate the usefulness of semantic priors for complete and precise reconstruction in a robotic packing application.

Second, we present a scene-specific multi-layered perceptron (MLP) neural field for full generative dense SLAM. Our results show that it allows for efficient mapping, automatic hole-filling, and joint optimisation of the camera trajectory and 3D map. Last, we demonstrate that the MLP's automatic scene compression discovers underlying scene structures that are revealed with sparse labelling.
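As a toy picture of the first system's test-time optimisation, the sketch below jointly fits a shape code and an object translation by gradient descent on SDF residuals at observed surface points. The "decoder" here is an assumed stand-in (a sphere whose radius is a one-dimensional code), not the thesis's learned network, and rotation is omitted for brevity.

```python
import numpy as np

def decoder_sdf(x, code):
    """Toy stand-in for a learned class-level shape decoder:
    a sphere whose radius is the 1-D shape code."""
    return np.linalg.norm(x, axis=-1) - code

def fit_object(obs_pts, code=0.5, trans=None, lr=0.1, iters=200):
    """Jointly optimise shape code and translation so the decoded SDF
    is zero at the observed surface points (plain gradient descent)."""
    trans = np.zeros(3) if trans is None else np.asarray(trans, float).copy()
    for _ in range(iters):
        local = obs_pts - trans                  # points in the object frame
        r = np.linalg.norm(local, axis=-1)
        res = decoder_sdf(local, code)           # SDF residual per point
        # Analytic gradients of 0.5 * sum(res**2) w.r.t. code and trans.
        g_code = -res.sum()
        g_trans = -(res[:, None] * local / r[:, None]).sum(axis=0)
        code -= lr * g_code / len(obs_pts)
        trans -= lr * g_trans / len(obs_pts)
    return code, trans

# Toy usage: points on a circle of radius 0.8 centred at (0.2, 0, 0).
theta = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
pts = np.stack([0.2 + 0.8 * np.cos(theta), 0.8 * np.sin(theta),
                np.zeros_like(theta)], axis=1)
print(fit_object(pts))   # code -> ~0.8, trans -> ~(0.2, 0, 0)
```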