    From scene flow to visual odometry through local and global regularisation in Markov random fields

    We revisit pairwise Markov Random Field (MRF) formulations for RGB-D scene flow and leverage recent advances in processor design for real-time implementations. We consider scene flow approaches which consist of data terms enforcing intensity consistency between consecutive images, together with regularisation terms which impose smoothness over the flow field. To achieve real-time operation, previous systems leveraged GPUs and implemented regularisation only between variables corresponding to neighbouring pixels. Such systems could estimate continuously deforming flow fields, but the lack of global regularisation over the whole field made them ineffective for visual odometry. We leverage the Graphcore Intelligence Processing Unit (IPU), a graph processor chip which consists of 1216 independent cores called tiles, each with 256 kB of local memory. The tiles are connected by an ultrafast all-to-all communication fabric which enables efficient data transmission between tiles in arbitrary communication patterns. We propose a distributed formulation for dense RGB-D scene flow based on Gaussian Belief Propagation which leverages the architecture of this processor to implement both local and global regularisation. Local regularisation is enforced for pairs of flow estimates whose corresponding pixels are neighbours, while global regularisation is defined for flow estimate pairs whose corresponding pixels are far from each other on the image plane. Using both types of regularisation allows our algorithm to handle a variety of in-scene motions and makes it suitable for estimating deforming scene flow, piece-wise rigid scene flow and visual odometry within the same system.
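
    The core machinery above, a pairwise MRF solved by distributed Gaussian Belief Propagation with both neighbour and long-range smoothness edges, can be sketched compactly. The toy below is not the authors' IPU implementation: it uses scalar variables, synthetic data and assumed weights, and simply runs synchronous GBP message passing over a small grid with a handful of extra long-range edges standing in for global regularisation.

    import numpy as np

    H, W = 8, 8
    N = H * W
    w_local, w_global = 1.0, 0.05             # smoothness precisions (assumed values)

    rng = np.random.default_rng(0)
    z = rng.normal(size=N)                     # stand-in for a per-pixel data term
    lam_prior = np.full(N, 2.0)                # data-term precision (assumed)
    eta_prior = lam_prior * z

    # Undirected edges: local 4-neighbour pairs plus a sparse set of long-range
    # "global" edges between pixels that are far apart on the image plane.
    edges = []
    for r in range(H):
        for c in range(W):
            i = r * W + c
            if c + 1 < W: edges.append((i, i + 1, w_local))
            if r + 1 < H: edges.append((i, i + W, w_local))
    for _ in range(32):
        a, b = rng.choice(N, size=2, replace=False)
        edges.append((int(a), int(b), w_global))

    # Directed messages in information form (eta, lambda), one per edge direction.
    E = len(edges)
    src = np.array([e[0] for e in edges] + [e[1] for e in edges])
    dst = np.array([e[1] for e in edges] + [e[0] for e in edges])
    wgt = np.array([e[2] for e in edges] * 2)
    rev = np.concatenate([np.arange(E, 2 * E), np.arange(E)])  # reverse-message index
    m_eta = np.zeros(2 * E)
    m_lam = np.zeros(2 * E)

    for _ in range(50):                        # synchronous GBP sweeps
        # Total incoming message mass at every variable node.
        in_eta = np.bincount(dst, weights=m_eta, minlength=N)
        in_lam = np.bincount(dst, weights=m_lam, minlength=N)
        # Sender belief excluding the reverse message, then marginalise the
        # pairwise smoothness factor w * (x_s - x_t)^2 / 2 onto the receiver.
        lam_s = lam_prior[src] + in_lam[src] - m_lam[rev]
        eta_s = eta_prior[src] + in_eta[src] - m_eta[rev]
        m_lam = wgt - wgt ** 2 / (wgt + lam_s)
        m_eta = wgt * eta_s / (wgt + lam_s)

    # Posterior marginals: the regularised field estimate.
    lam_post = lam_prior + np.bincount(dst, weights=m_lam, minlength=N)
    eta_post = eta_prior + np.bincount(dst, weights=m_eta, minlength=N)
    x_map = (eta_post / lam_post).reshape(H, W)

    In the paper the variables are flow vectors rather than scalars and the data terms come from intensity consistency between consecutive RGB-D frames; the message-passing structure is the same, distributed across the IPU tiles.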

    Overlap-based ICP Tuning for Robust Localization of a Humanoid Robot


    CodeMapping: real-time dense mapping for sparse SLAM using compact scene representations

    We propose a novel dense mapping framework for sparse visual SLAM systems which leverages a compact scene representation. State-of-the-art sparse visual SLAM systems provide accurate and reliable estimates of the camera trajectory and the locations of landmarks. While these sparse maps are useful for localization, they cannot be used for other tasks such as obstacle avoidance or scene understanding. In this letter, we propose a dense mapping framework to complement sparse visual SLAM systems: it takes as input the camera poses, keyframes and sparse points produced by the SLAM system and predicts a dense depth image for every keyframe. We build on CodeSLAM [1] and use a variational autoencoder (VAE), conditioned on intensity, sparse depth and reprojection error images from sparse SLAM, to predict an uncertainty-aware dense depth map. The use of a VAE then enables us to refine the dense depth images through multi-view optimization, which improves the consistency of overlapping frames. Our mapper runs in a separate thread, in parallel to the SLAM system, in a loosely coupled manner. This flexible design allows for integration with arbitrary metric sparse SLAM systems without delaying the main SLAM process. Our dense mapper can be used not only for local mapping but also for globally consistent dense 3D reconstruction through TSDF fusion. We demonstrate our system running with ORB-SLAM3 and show accurate dense depth estimation which could enable applications in robotics and augmented reality.
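
    As a rough illustration of the code-based refinement idea, the sketch below (not the CodeSLAM/CodeMapping networks; the architecture, image sizes and loss are assumptions) decodes a low-dimensional per-keyframe code into a dense depth map conditioned on the intensity image, then optimises only that code so the decoded depth agrees with the sparse SLAM depth, standing in for the multi-view photometric and geometric terms used in the paper.

    import torch
    import torch.nn as nn

    class CondDepthDecoder(nn.Module):
        """Decode a compact code into dense depth, conditioned on intensity (hypothetical architecture)."""
        def __init__(self, code_dim=32, hw=(48, 64)):
            super().__init__()
            self.hw = hw
            self.img_enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                         nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
            self.code_fc = nn.Linear(code_dim, hw[0] * hw[1])
            self.head = nn.Sequential(nn.Conv2d(9, 16, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(16, 1, 3, padding=1), nn.Softplus())

        def forward(self, code, intensity):
            feat = self.img_enc(intensity)                    # (B, 8, H, W) image features
            c = self.code_fc(code).view(-1, 1, *self.hw)      # broadcast the code spatially
            return self.head(torch.cat([feat, c], dim=1))     # positive dense depth

    H, W = 48, 64
    decoder = CondDepthDecoder()
    decoder.requires_grad_(False)                             # decoder fixed; only the code moves
    intensity = torch.rand(1, 1, H, W)                        # keyframe intensity image
    sparse_mask = torch.rand(1, 1, H, W) < 0.02               # ~2% of pixels carry SLAM depth
    sparse_depth = torch.rand(1, 1, H, W) * 3.0

    # Optimise the per-keyframe code so the decoded dense depth matches the
    # sparse SLAM points (a stand-in for the paper's multi-view optimisation).
    code = torch.zeros(1, 32, requires_grad=True)
    opt = torch.optim.Adam([code], lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        depth = decoder(code, intensity)
        loss = (depth - sparse_depth)[sparse_mask].abs().mean() + 1e-3 * code.pow(2).mean()
        loss.backward()
        opt.step()

    dense_depth = decoder(code, intensity).detach()           # dense map for this keyframe

    In CodeMapping the decoder is additionally conditioned on sparse depth and reprojection error images, and the codes of overlapping keyframes are optimised jointly, which is what makes the refined depth maps mutually consistent.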

    StaticFusion: background reconstruction for dense RGB-D SLAM in dynamic environments

    Dynamic environments are challenging for visual SLAM, as moving objects can impair camera pose tracking and cause corruptions to be integrated into the map. In this paper, we propose a method for robust dense RGB-D SLAM in dynamic environments which detects moving objects and simultaneously reconstructs the background structure. While most methods employ implicit robust penalisers or outlier filtering techniques to handle moving objects, our approach is to simultaneously estimate the camera motion and a probabilistic static/dynamic segmentation of the current RGB-D image pair. This segmentation is then used for weighted dense RGB-D fusion to estimate a 3D model of only the static parts of the environment. By leveraging the 3D model for frame-to-model alignment, as well as the static/dynamic segmentation, camera motion estimation exhibits reduced overall drift and is more robust to the presence of dynamics in the scene. We present demonstrations which compare the proposed method to related state-of-the-art approaches on both static and dynamic sequences. The proposed method achieves similar performance in static environments and improved accuracy and robustness in dynamic scenes.
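
    The joint estimation idea can be illustrated with a small alternation, shown below: a weighted rigid alignment in which each point's weight is its current probability of being static, followed by re-estimating those probabilities from the alignment residuals. This is a toy, not the StaticFusion implementation: it aligns synthetic point sets and assumes a simple Gaussian residual model, but it captures the feedback loop between camera motion estimation and static/dynamic segmentation.

    import numpy as np

    rng = np.random.default_rng(1)

    def weighted_rigid_align(src, dst, w):
        """Weighted Kabsch: R, t minimising sum_i w_i ||R src_i + t - dst_i||^2."""
        w = w / w.sum()
        mu_s, mu_d = w @ src, w @ dst
        H = (src - mu_s).T @ ((dst - mu_d) * w[:, None])
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ S @ U.T
        return R, mu_d - R @ mu_s

    # Synthetic frame pair: mostly static points moved by the camera, plus a
    # cluster of dynamic points that moves independently.
    n_static, n_dyn = 300, 60
    P = rng.uniform(-1, 1, size=(n_static + n_dyn, 3))
    angle = 0.1
    R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.05, 0.0, 0.02])
    Q = P @ R_true.T + t_true + 0.005 * rng.normal(size=P.shape)
    Q[n_static:] += np.array([0.4, 0.0, 0.0])      # the dynamic object moves on its own

    w = np.ones(len(P))                            # per-point static probability
    sigma = 0.03                                   # assumed residual scale
    for _ in range(10):
        R, t = weighted_rigid_align(P, Q, w)       # camera motion from (soft) static points
        r = np.linalg.norm(P @ R.T + t - Q, axis=1)
        w = np.exp(-0.5 * (r / sigma) ** 2)        # re-estimate static probabilities

    static_mask = w > 0.5                          # points that would be fused into the map

    In StaticFusion the alignment is dense frame-to-model RGB-D alignment rather than point-set registration, and the resulting segmentation weights the fusion of new data into the background model.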

    SIMstack: a generative shape and instance model for unordered object stacks

    By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are ambiguous not only in shape but also in instance segmentation, and multiple decompositions could be valid. We observe that physics constrains decomposition as well as shape in occluded regions, and hypothesise that a latent space learned from scenes built under physics simulation can serve as a prior to better predict shape and instances in occluded regions. To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE) trained on a dataset of objects stacked under physics simulation. We formulate instance segmentation as a centre-voting task, which allows for class-agnostic detection and does not require setting a maximum number of objects in the scene. At test time, our model can generate 3D shape and instance segmentation from a single depth view, probabilistically sampling proposals for the occluded region from the learned latent space. Our method has practical applications in giving robots some of the ability humans have to make rapid, intuitive inferences about partially observed scenes. We demonstrate an application to precise (non-disruptive) grasping of unknown objects from a single depth view.
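
    The centre-voting formulation can be shown in miniature, as below: every point predicts an offset to its instance centre, the votes are clustered, and the number of clusters is the number of instances, so no maximum object count has to be fixed in advance. In this sketch the offsets are noisy ground truth rather than network predictions, and the greedy clustering radius is an assumed parameter.

    import numpy as np

    rng = np.random.default_rng(2)

    # Synthetic "stack": three objects, each a blob of points around its centre.
    centres = np.array([[0.0, 0.0, 0.1], [0.3, 0.1, 0.1], [0.1, 0.4, 0.3]])
    points, gt_label = [], []
    for k, c in enumerate(centres):
        points.append(c + 0.05 * rng.normal(size=(200, 3)))
        gt_label += [k] * 200
    points = np.concatenate(points)

    # Stand-in for the network's per-point centre offsets (noisy ground truth here).
    offsets = centres[np.array(gt_label)] - points + 0.01 * rng.normal(size=points.shape)
    votes = points + offsets                       # every point votes for an instance centre

    # Greedy clustering of the votes: a new instance is spawned whenever a vote
    # falls further than `radius` from every existing cluster centre.
    radius = 0.08                                  # assumed clustering bandwidth
    cluster_centres, labels = [], np.empty(len(votes), dtype=int)
    for i, v in enumerate(votes):
        d = [np.linalg.norm(v - c) for c in cluster_centres]
        if d and min(d) < radius:
            k = int(np.argmin(d))
            n = np.sum(labels[:i] == k)            # running mean of the cluster centre
            cluster_centres[k] = (cluster_centres[k] * n + v) / (n + 1)
        else:
            k = len(cluster_centres)
            cluster_centres.append(v.copy())
        labels[i] = k

    print(f"recovered {len(cluster_centres)} instances")   # expected: 3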

    Robust Underwater Visual SLAM Fusing Acoustic Sensing
