
    Probabilistic Global Scale Estimation for MonoSLAM Based on Generic Object Detection

    This paper proposes a novel method to estimate the global scale of a 3D reconstructed model within a Kalman-filtering-based monocular SLAM algorithm. Our Bayesian framework integrates height priors for detected objects belonging to a set of broad, predefined classes, building on recent advances in fast generic object detection. Each observation is produced from a single frame, so no data-association process across video frames is needed: the height priors are associated with the sizes of the image regions in which map-feature projections fall inside the object detection regions. We present very promising results from several experiments with different object classes. Comment: Int. Workshop on Visual Odometry, CVPR (July 2017).
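    To make the scale-fusion idea concrete, here is a minimal sketch of folding per-detection scale observations into a single global-scale estimate with a 1-D Gaussian update. The class height priors, noise model and example numbers are illustrative assumptions, not the paper's exact Kalman-filter formulation.

```python
# Hypothetical sketch: fuse per-detection global-scale observations with a
# 1-D Gaussian (Kalman-style) update. Priors and numbers are illustrative.

HEIGHT_PRIORS = {          # class -> (mean real-world height [m], std [m]); assumed values
    "person": (1.70, 0.10),
    "car":    (1.50, 0.15),
}

def scale_observation(cls, reconstructed_height):
    """One scale measurement s = prior height / unscaled map height of the detection."""
    mu_h, sigma_h = HEIGHT_PRIORS[cls]
    s = mu_h / reconstructed_height
    var_s = (sigma_h / reconstructed_height) ** 2   # first-order uncertainty propagation
    return s, var_s

def fuse(mean, var, obs, obs_var):
    """Product of Gaussians: fold one observation into the running scale estimate."""
    k = var / (var + obs_var)
    return mean + k * (obs - mean), (1.0 - k) * var

# Usage: start from a vague prior and fold in detections frame by frame.
mean, var = 1.0, 10.0
for cls, h_map in [("person", 0.85), ("car", 0.74), ("person", 0.88)]:
    s, v = scale_observation(cls, h_map)
    mean, var = fuse(mean, var, s, v)
print(f"estimated global scale: {mean:.2f} (variance {var:.3f})")
```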

    MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

    Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Precise models of recognized objects will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves, and performs joint optimization to estimate consistent, non-intersecting poses for multiple objects in contact. We verify the accuracy and robustness of our approach experimentally on two object datasets: YCB-Video and our own challenging Cluttered YCB-Video. We demonstrate a real-time robotics application in which a robot arm precisely disassembles complicated piles of objects in an ordered way, using only on-board RGB-D vision. Comment: 10 pages, 10 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.
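    A rough illustration of the joint, non-intersecting pose refinement idea: translations are pulled toward their accumulated single-view estimates while a pairwise penetration penalty pushes overlapping objects apart. Objects are crudely approximated by bounding spheres here; MoreFusion itself reasons over volumetric occupancy, so this is only a sketch of the optimization principle.

```python
import numpy as np

# Hypothetical sketch: gradient descent on a data term (stay near the
# accumulated pose estimates) plus a sphere-overlap penetration penalty.

def refine_translations(t_obs, radii, iters=200, lr=0.05, w_pen=5.0):
    t = t_obs.copy()                          # (N, 3) initial object translations
    for _ in range(iters):
        grad = 2.0 * (t - t_obs)              # data term: stay close to observations
        for i in range(len(t)):
            for j in range(len(t)):
                if i == j:
                    continue
                d = t[i] - t[j]
                dist = np.linalg.norm(d) + 1e-9
                pen = radii[i] + radii[j] - dist
                if pen > 0:                   # spheres intersect: push i away from j
                    grad[i] += w_pen * 2.0 * pen * (-d / dist)
        t -= lr * grad
    return t

# Usage: two objects whose single-view pose estimates interpenetrate.
t_obs = np.array([[0.00, 0.0, 0.0], [0.05, 0.0, 0.0]])
radii = np.array([0.06, 0.06])
print(refine_translations(t_obs, radii))
```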

    iSDF: Real-Time Neural Signed Distance Fields for Robot Perception

    We present iSDF, a continual learning system for real-time signed distance field (SDF) reconstruction. Given a stream of posed depth images from a moving camera, it trains a randomly initialised neural network to map input 3D coordinates to approximate signed distances. The model is self-supervised by minimising a loss that bounds the predicted signed distance using the distance to the closest sampled point in a batch of actively sampled query points. In contrast to prior work based on voxel grids, our neural method provides adaptive levels of detail, plausible filling-in of partially observed regions and denoising of observations, all with a more compact representation. In evaluations against alternative methods on real and synthetic datasets of indoor environments, we find that iSDF produces more accurate reconstructions and better approximations of collision costs and gradients, which are useful for downstream planners in domains from navigation to manipulation. Code and video results can be found at our project page: https://joeaortiz.github.io/iSDF/ . Comment: Project page: https://joeaortiz.github.io/iSDF/
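    The self-supervised bound described above can be sketched in a few lines: the predicted signed distance at a query point is penalised when it exceeds the distance to the nearest surface sample in the current batch. The tiny MLP, sampling scheme and simple hinge form below are illustrative stand-ins, not iSDF's actual architecture or loss weighting.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a nearest-sample bound loss for a neural SDF.

class TinySDF(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def bound_loss(pred_sdf, query_pts, surface_pts):
    # Upper bound: distance from each query point to its nearest surface sample.
    d = torch.cdist(query_pts, surface_pts)      # (Q, S) pairwise distances
    bound = d.min(dim=1).values                  # (Q,)
    # Hinge: penalise predictions that exceed the bound.
    return torch.relu(pred_sdf - bound).mean()

# Usage: one optimisation step on random stand-in points.
model = TinySDF()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
surface = torch.rand(256, 3)                     # stand-in back-projected depth points
queries = torch.rand(128, 3)                     # stand-in actively sampled query points
loss = bound_loss(model(queries), queries, surface)
loss.backward()
opt.step()
```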

    Neural scene representations for dense-semantic SLAM

    An important challenge in visual Simultaneous Localisation and Mapping (SLAM) has been the design of scene representations that allow for both robust inference and useful interaction. The rapid progression of semantic image understanding powered by deep learning has led to SLAM systems that enrich geometric maps with semantics, which widens the range of possible applications. However, a core challenge remains in how to tightly integrate geometry and semantics for 3D reconstruction; we believe that their joint representation is the right direction for actionable and robust maps.
    In this thesis we address the central question of designing efficient scene representations through the use of compressive models, which can represent detail with the fewest parameters. We then demonstrate that compressive models offer a solution for the joint representation of geometry and semantics, where semantics provide priors for robust reconstruction and geometric compression informs scene decomposition. This work focuses on using generative neural networks, a category of compressive representations, for incremental dense SLAM. We develop a volumetric rendering formulation for the use of compressive models in generative inference from multi-view images, enabling two novel SLAM systems. First, we learn class-level code descriptors for object shape from aligned 3D models. At test time, the code and object pose are optimised for efficient and complete object reconstruction from instances of the learned categories. This method relaxes the assumption of fixed templates and allows for intra-class shape variation. We demonstrate the usefulness of semantic priors for complete and precise reconstruction in a robotic packing application. Second, we present a scene-specific multi-layer perceptron (MLP) neural field for full generative dense SLAM. Our results show that it allows for efficient mapping, automatic hole-filling, and joint optimisation of camera trajectory and 3D map. Last, we demonstrate that the MLP's automatic scene compression discovers underlying scene structures that are revealed with sparse labelling.
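    As a concrete illustration of the volumetric rendering formulation mentioned above, the sketch below renders an expected depth per ray from an MLP occupancy field using ray-termination weights. The network, activation and sampling scheme are assumed for illustration and do not reproduce the thesis' exact systems.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: expected depth along each ray from an MLP scene field.

class SceneField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x):                      # x: (..., 3) -> occupancy logit
        return self.net(x).squeeze(-1)

def render_depth(field, origins, dirs, n_samples=32, near=0.1, far=4.0):
    """Expected depth per ray from ray-termination weights."""
    t = torch.linspace(near, far, n_samples)                              # (S,)
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]       # (R, S, 3)
    occ = torch.sigmoid(field(pts))                                       # occupancy in [0, 1]
    # Probability the ray is still free before sample i, then terminates there.
    free = torch.cumprod(1.0 - occ + 1e-6, dim=-1)
    free = torch.cat([torch.ones_like(free[:, :1]), free[:, :-1]], dim=-1)
    w = occ * free                                                        # (R, S)
    return (w * t[None, :]).sum(dim=-1)                                   # (R,)

# Usage: rendered depths can be compared with measured depth to optimise both
# the field and, in a full system, the camera poses.
field = SceneField()
origins = torch.zeros(8, 3)
dirs = torch.nn.functional.normalize(torch.randn(8, 3), dim=-1)
print(render_depth(field, origins, dirs))
```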