
    Egomotion estimation using binocular spatiotemporal oriented energy

    Camera egomotion estimation is concerned with the recovery of a camera's motion (e.g., instantaneous translation and rotation) as it moves through its environment. It has been demonstrated to be of both theoretical and practical interest. This thesis documents a novel algorithm for egomotion estimation based on binocularly matched spatiotemporal oriented energy distributions. Basing the estimation on oriented energy measurements makes it possible to recover egomotion without the need to establish temporal correspondences or convert disparity into 3D world coordinates. The resulting algorithm has been realized in software and evaluated quantitatively on a novel laboratory dataset with ground truth as well as qualitatively on both indoor and outdoor real-world datasets. Performance is evaluated relative to comparable alternative algorithms, and the proposed algorithm is shown to exhibit the best overall performance.

    Point-to-hyperplane RGB-D Pose Estimation: Fusing Photometric and Geometric Measurements

    The objective of this paper is to investigate how best to combine and fuse color and depth measurements for incremental pose estimation or 3D tracking. A framework is proposed that formulates the problem with a unique measurement vector rather than combining the measurements in an ad-hoc manner. In particular, the full color and depth measurement is defined as a 4-vector (by combining 3D Euclidean points + image intensities), and an optimal error for pose estimation is derived from this. As will be shown, this leads to an iterative closest point approach in 4-dimensional space. A kd-tree is used to find the closest point in 4D space, thereby simultaneously accounting for color and depth. Based on this unified framework, a novel Point-to-hyperplane approach is introduced which has the advantages of classic Point-to-plane ICP but in 4D space. It is shown that, with this approach, there is no longer any need to provide or estimate a scale factor between different measurement types. Consequently, the convergence domain is increased and the alignment is sped up, while robustness and accuracy are maintained. Results on both simulated and real environments are provided along with benchmark comparisons.
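The 4-vector idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, a brute-force search stands in for the kd-tree, and the 4D hyperplane normal is assumed to be given rather than estimated.

```python
import math

def to_4d(point_xyz, intensity):
    """Stack a 3D point and its image intensity into one 4-vector (hypothetical helper)."""
    x, y, z = point_xyz
    return (x, y, z, intensity)

def closest_point_4d(query, reference):
    """Brute-force nearest neighbour in 4D (stand-in for the kd-tree lookup)."""
    best_index, best_dist = -1, math.inf
    for i, ref in enumerate(reference):
        d = math.dist(query, ref)  # Euclidean distance over all four components
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

def point_to_hyperplane_residual(query, ref, normal4):
    """Signed distance from query to the hyperplane through ref with 4D normal normal4."""
    return sum(n * (q - r) for n, q, r in zip(normal4, query, ref))

# Two reference measurements: one dark, one bright.
reference = [to_4d((0.0, 0.0, 0.00), 0.2), to_4d((0.0, 0.0, 0.05), 0.9)]
# The query is geometrically nearer to reference[0] but photometrically matches reference[1].
query = to_4d((0.0, 0.0, 0.02), 0.85)

index, _ = closest_point_4d(query, reference)
residual = point_to_hyperplane_residual(query, reference[index], (0.0, 0.0, 1.0, 0.0))
```

In this toy case the 4D search selects the bright reference point even though the dark one is closer in 3D, which is the effect of accounting for color and depth simultaneously.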

    Efficient Online Surface Correction for Real-time Large-Scale 3D Reconstruction

    State-of-the-art methods for large-scale 3D reconstruction from RGB-D sensors usually reduce drift in camera tracking by globally optimizing the estimated camera poses in real-time without simultaneously updating the reconstructed surface on pose changes. We propose an efficient on-the-fly surface correction method for globally consistent dense 3D reconstruction of large-scale scenes. Our approach uses a dense Visual RGB-D SLAM system that estimates the camera motion in real-time on a CPU and refines it in a global pose graph optimization. Consecutive RGB-D frames are locally fused into keyframes, which are incorporated into a sparse voxel hashed Signed Distance Field (SDF) on the GPU. On pose graph updates, the SDF volume is corrected on-the-fly using a novel keyframe re-integration strategy with reduced GPU-host streaming. We demonstrate in an extensive quantitative evaluation that our method is up to 93% more runtime efficient compared to the state-of-the-art and requires significantly less memory, with only negligible loss of surface quality. Overall, our system requires only a single GPU and allows for real-time surface correction of large environments.
    Comment: British Machine Vision Conference (BMVC), London, September 201
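To make the idea of re-integration concrete, here is a minimal sketch of the commonly used weighted running-average update for a single SDF voxel and its inverse. It is an assumption that the paper's keyframe re-integration follows this general pattern; the `Voxel` class and its method names are hypothetical, and the voxel hashing and GPU streaming are not modelled.

```python
class Voxel:
    """One SDF voxel holding a weighted running average of distance samples."""

    def __init__(self):
        self.sdf = 0.0
        self.weight = 0.0

    def integrate(self, distance, w=1.0):
        # Fold a new distance sample into the running average.
        new_weight = self.weight + w
        self.sdf = (self.sdf * self.weight + distance * w) / new_weight
        self.weight = new_weight

    def deintegrate(self, distance, w=1.0):
        # Remove a previously integrated sample (inverse of integrate).
        new_weight = self.weight - w
        if new_weight <= 1e-9:
            self.sdf, self.weight = 0.0, 0.0  # nothing left in this voxel
        else:
            self.sdf = (self.sdf * self.weight - distance * w) / new_weight
            self.weight = new_weight

# A keyframe contributed 0.10 under its old pose; another keyframe contributed 0.30.
voxel = Voxel()
voxel.integrate(0.10)
voxel.integrate(0.30)

# After a pose graph update the first keyframe would see 0.20 at this voxel instead:
voxel.deintegrate(0.10)  # undo the stale contribution
voxel.integrate(0.20)    # re-integrate with the corrected pose
```

Because the update is a weighted average, removing and re-adding one keyframe gives the same result as rebuilding the voxel from scratch with the corrected samples, without touching the other keyframes.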

    Novel Camera Architectures for Localization and Mapping on Intelligent Mobile Platforms

    Self-localization and environment mapping play a very important role in many robotics applications such as autonomous driving and mixed reality consumer products. Although the most powerful solutions rely on a multitude of sensors including lidars and cameras, the community maintains a high interest in developing cost-effective, purely vision-based localization and mapping approaches. The core problem of standard vision-only solutions is accuracy and robustness, especially in challenging visual conditions. The thesis aims to introduce new solutions to localization and mapping problems on intelligent mobile devices by taking advantage of novel camera architectures. The thesis investigates the use of surround-view multi-camera systems, which combine the benefits of omni-directional measurements with a sufficient baseline for producing measurements in metric scale, and event cameras, which perform well under challenging illumination conditions and have high temporal resolution. The thesis starts by looking into the motion estimation framework with multi-perspective camera systems. The framework can be divided into two sub-parts: a front-end module that initializes motion and estimates absolute pose after bootstrapping, and a back-end module that refines the estimate over a larger-scale sequence. First, the thesis proposes a complete real-time pipeline for visual odometry with non-overlapping, multi-perspective camera systems, and in particular presents a solution to the scale initialization problem, in order to solve the unobservability of metric scale under degenerate cases with such systems. Second, the thesis focuses on the further improvement of front-end relative pose estimation for vehicle-mounted surround-view multi-camera systems. It presents a new, reliable solution able to handle all kinds of relative displacements in the plane despite the possibly non-holonomic characteristics, and furthermore introduces a novel two-view optimization scheme which minimizes a geometrically relevant error without relying on optimization variables tied to 3D points. Third, the thesis explores continuous-time parametrization for exact modelling of non-holonomic ground vehicle trajectories in the back-end optimization of a visual SLAM pipeline. It demonstrates the use of B-splines for an exact imposition of smooth, non-holonomic trajectories inside the 6 DoF bundle adjustment, and shows that a significant improvement in robustness and accuracy in degrading visual conditions can be achieved. In order to deal with challenges in scenarios with high dynamics, low texture distinctiveness, or challenging illumination conditions, the thesis focuses on the solution to the localization and mapping problem on Autonomous Ground Vehicles (AGVs) using event cameras. Inspired by the time-continuous parametrizations of image warping functions introduced by previous works, the thesis proposes two new algorithms that tackle several motion estimation problems by contrast maximization. It first looks at the fronto-parallel motion estimation of an event camera. In stark contrast to the prior art, a globally optimal solution to this motion estimation problem is derived by using a branch-and-bound optimization scheme. Then, the thesis introduces a new solution to the localization and mapping problem of a single event camera by continuous ray warping and volumetric contrast maximization, which can perform joint optimization over motion and structure for cameras undergoing both translational and rotational displacements in an arbitrarily structured environment. The present thesis thus makes important contributions to both the front-end and the back-end of SLAM pipelines based on novel, promising camera architectures.
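As a rough illustration of how a B-spline can impose a smooth, non-holonomic trajectory, the sketch below evaluates one segment of a uniform cubic B-spline in the plane and takes the vehicle heading from the curve tangent, so that the vehicle never moves sideways. This is a simplified planar example under stated assumptions, not the thesis's 6 DoF formulation; `bspline_pose` and its parameters are hypothetical names.

```python
import math

def bspline_pose(ctrl, i, u):
    """Evaluate a uniform cubic B-spline segment built from ctrl[i:i+4] at u in [0, 1].

    Returns (x, y, heading), where heading follows the tangent direction
    (the non-holonomic assumption: velocity is aligned with the vehicle axis).
    """
    # Uniform cubic B-spline basis functions.
    b = (
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    )
    # Their derivatives with respect to u.
    db = (
        -((1 - u) ** 2) / 2,
        (3 * u**2 - 4 * u) / 2,
        (-3 * u**2 + 2 * u + 1) / 2,
        u**2 / 2,
    )
    pts = ctrl[i:i + 4]
    x = sum(w * p[0] for w, p in zip(b, pts))
    y = sum(w * p[1] for w, p in zip(b, pts))
    dx = sum(w * p[0] for w, p in zip(db, pts))
    dy = sum(w * p[1] for w, p in zip(db, pts))
    return x, y, math.atan2(dy, dx)

# Control points on a straight line: the vehicle should drive along the x-axis.
ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
x, y, heading = bspline_pose(ctrl, 0, 0.5)
```

Because the heading is derived from the spline rather than optimized as a free variable, every pose sampled from the curve satisfies the motion constraint by construction.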


    Geometric reconstruction of dynamic objects is a fundamental task in computer vision and graphics, and high-fidelity modeling of the human body is considered to be at the core of this problem. Traditional human shape and motion capture techniques require an array of surrounding cameras or require subjects to wear reflective markers, which limits working space and portability. In this dissertation, a complete process is designed, from geometric modeling of a detailed 3D human full body and capturing shape dynamics over time using a flexible setup, to guiding clothes/person re-targeting with such data-driven models. The mechanical movement of the human body can be considered an articulated motion, which makes it easy to drive the skin animation but makes the reverse process, finding parameters from images without manual intervention, difficult. We therefore present a novel parametric model, GMM-BlendSCAPE, which jointly takes both the linear skinning model and the prior art of BlendSCAPE (Blend Shape Completion and Animation for PEople) into consideration, and develop a Gaussian Mixture Model (GMM) to infer both body shape and pose from incomplete observations. We show the increased accuracy of joint and skin surface estimation using our model compared to skeleton-based motion tracking. To model the detailed body, we start by capturing high-quality partial 3D scans using a single-view commercial depth camera. Based on GMM-BlendSCAPE, we can then reconstruct multiple complete static models with large pose differences via our novel non-rigid registration algorithm. With vertex correspondences established, these models can be further converted into a personalized drivable template and used for robust pose tracking in a similar GMM framework. Moreover, we design a general-purpose real-time non-rigid deformation algorithm to accelerate this registration. Last but not least, we demonstrate a novel virtual clothes try-on application based on our personalized model, utilizing both image and depth cues to synthesize and re-target clothes for single-view videos of different people.

    The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data

    Event cameras, inspired by biological vision systems, provide a natural and data-efficient representation of visual information. Visual information is acquired in the form of events that are triggered by local brightness changes. Each pixel location of the camera's sensor records events asynchronously and independently with very high temporal resolution. However, because most brightness changes are triggered by relative motion of the camera and the scene, the events recorded at a single sensor location seldom correspond to the same world point. To extract meaningful information from event cameras, it is helpful to register events that were triggered by the same underlying world point. In this work we propose a new model of event data that captures its natural spatio-temporal structure. We start by developing a model for aligned event data. That is, we develop a model for the data as though it has been perfectly registered already. In particular, we model the aligned data as a spatio-temporal Poisson point process. Based on this model, we develop a maximum likelihood approach to registering events that are not yet aligned. That is, we find transformations of the observed events that make them as likely as possible under our model. In particular, we extract the camera rotation that leads to the best event alignment. We show new state-of-the-art accuracy for rotational velocity estimation on the DAVIS 240C dataset. In addition, our method is faster and has lower computational complexity than several competing methods.
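The maximum likelihood idea can be sketched in a deliberately simplified setting. The example below assumes a 1D sensor, a constant translational velocity instead of a camera rotation, and a Poisson model with one constant rate per pixel bin; under that assumption, profiling out the rates leaves a score of the form sum over bins of k log k, where k is the event count in a bin. This is a stand-in to show the principle, not the model or optimizer from the paper, and all names are hypothetical.

```python
import math
from collections import Counter

def alignment_score(events, velocity):
    """Profile Poisson log-likelihood (up to constants) of events warped by velocity."""
    counts = Counter()
    for x, t in events:
        warped = x - velocity * t                 # undo the assumed motion
        counts[math.floor(warped + 0.5)] += 1     # bin to the nearest pixel
    return sum(k * math.log(k) for k in counts.values())

def estimate_velocity(events, candidates):
    """Grid search for the candidate velocity that makes the events most likely."""
    return max(candidates, key=lambda v: alignment_score(events, v))

# Synthetic events: three scene points moving at 5 pixels per second for 1 second.
true_velocity = 5.0
events = [
    (x0 + true_velocity * (i / 10), i / 10)
    for x0 in (10.0, 20.0, 30.0)
    for i in range(10)
]

candidates = [float(v) for v in range(11)]
best = estimate_velocity(events, candidates)
```

The score is largest when events from the same scene point pile up in the same bin, so the grid search picks the velocity that best registers the events.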