9 research outputs found

    Enhanced 3D Capture for Room-sized Dynamic Scenes with Commodity Depth Cameras

    3D reconstruction of dynamic scenes has many applications in areas such as virtual/augmented reality, 3D telepresence, and 3D animation, yet achieving a complete, high-quality reconstruction is challenging due to sensor noise and occlusions in the scene. This dissertation demonstrates our efforts toward building a 3D capture system for room-sized dynamic environments. A key observation is that reconstruction insufficiency (e.g., incompleteness and noise) can be mitigated by accumulating data from multiple frames: in dynamic environments, dropouts in 3D reconstruction generally do not appear consistently in the same locations, so accumulating the captured 3D data over time can fill in the missing fragments while also reducing reconstruction noise. The first piece of the system builds 3D models of room-scale static scenes with a single hand-held depth sensor, using plane features, in addition to salient image points, for robust pairwise matching and bundle adjustment over the whole data sequence. In the second piece of the system, we design a robust non-rigid matching algorithm that considers both dense point alignment and color similarity, so that the data sequence of a continuously deforming object captured by multiple depth sensors can be aligned and fused into a high-quality 3D model. We further extend this work to deformable-object scanning with a single depth sensor. To deal with the drift problem, we design a dense non-rigid bundle adjustment algorithm that simultaneously optimizes the final mesh and the deformation parameters of every frame. Finally, we integrate static scanning and non-rigid matching into a reconstruction system for room-sized dynamic environments, where we pre-scan the static parts of the scene and accumulate data for the dynamic parts. Both rigid and non-rigid object motions are tracked in a unified framework, and close contacts between objects are also handled.
The dissertation demonstrates significant improvements in dense reconstruction over the state of the art. Our plane-based scanning system for indoor environments delivers reliable reconstruction in challenging situations, such as scenes lacking both visual and geometric salient features. Our non-rigid alignment algorithm enables data fusion for deforming objects and thus achieves dramatically enhanced reconstruction. Our novel bundle adjustment algorithm handles dense input partial scans with non-rigid motion and outputs dense reconstructions of quality comparable to static scanning algorithms (e.g., KinectFusion). Finally, we demonstrate enhanced reconstruction results for room-sized dynamic environments by integrating the above techniques, which significantly advances the state of the art.
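The accumulation idea above (dropouts rarely repeat at the same location, so averaging valid samples over frames both fills holes and suppresses noise) can be sketched in a few lines. This is a hypothetical illustration, not the dissertation's actual fusion pipeline; the function name and the zero-means-invalid depth convention are our assumptions:

```python
import numpy as np

def accumulate_depth(frames):
    """Fuse noisy depth frames with dropouts (encoded as 0) by a
    weighted running average over all frames in which a pixel is valid."""
    acc = np.zeros_like(frames[0], dtype=np.float64)     # sum of valid samples
    weight = np.zeros_like(acc)                          # number of valid samples
    for d in frames:
        valid = d > 0                                    # dropout pixels stay untouched
        acc[valid] += d[valid]
        weight[valid] += 1.0
    # average where at least one frame was valid; leave remaining holes at 0
    return np.where(weight > 0, acc / np.maximum(weight, 1), 0.0)
```

Pixels missing in one frame are filled from the other frames, and pixels seen several times are denoised by averaging, which is the core of the data-accumulation argument.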

    ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems

    In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1/30th of a pixel; it does not suffer from the common over-smoothing issues; it preserves edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using window-based cost aggregation with an adaptive support-weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allowing the network to reach compelling results. Finally, we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive quantitative and qualitative evaluations on real and synthetic data demonstrate state-of-the-art results in many challenging scenes.
    Comment: Accepted by ECCV 2018, oral presentation; main paper + supplementary material
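Window-based cost aggregation with adaptive support weights can be sketched roughly as a bilateral average: each pixel's cost is averaged over its window, down-weighting neighbors whose guide-image intensity differs from the center, which is what makes the aggregation edge-preserving. A minimal numpy sketch, assuming Gaussian weights and a single-channel guide image (the paper's exact scheme may differ):

```python
import numpy as np

def aggregate_cost(cost, guide, radius=2, sigma_c=0.05, sigma_s=2.0):
    """Bilateral (adaptive support-weight) aggregation of a per-pixel
    cost map, guided by intensity similarity and spatial proximity."""
    h, w = cost.shape
    out = np.zeros_like(cost)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = cost[y0:y1, x0:x1]
            gpatch = guide[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w_c = np.exp(-((gpatch - guide[y, x]) ** 2) / (2 * sigma_c ** 2))
            w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            wgt = w_c * w_s                       # adaptive support weights
            out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out
```

Near an intensity edge in the guide, neighbors on the other side get near-zero weight, so the aggregated cost stays sharp across the edge while being smoothed within each region.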

    Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images

    We propose a novel transformer-based framework that reconstructs two high-fidelity hands from multi-view RGB images. Unlike existing hand pose estimation methods, which typically train a deep network to regress hand model parameters from a single RGB image, we consider a more challenging problem setting where we directly regress the absolute root poses of two hands with extended forearms at high resolution from an egocentric view. As existing datasets are either infeasible for egocentric viewpoints or lack background variations, we create a large-scale synthetic dataset with diverse scenarios and collect a real dataset from a calibrated multi-camera setup to verify our proposed multi-view image feature fusion strategy. To make the reconstruction physically plausible, we propose two strategies: (i) a coarse-to-fine spectral graph convolution decoder to smooth the meshes during upsampling and (ii) an optimisation-based refinement stage at inference to prevent self-penetrations. Through extensive quantitative and qualitative evaluations, we show that our framework produces realistic two-hand reconstructions and demonstrate the generalisation of synthetic-trained models to real data, as well as real-time AR/VR applications.
    Comment: Accepted to ICCV 202
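The "spectral" ingredient of such a decoder can be illustrated with the classic graph-Laplacian low-pass filter: projecting vertex coordinates onto the lowest-frequency eigenvectors of the mesh graph's Laplacian keeps the coarse shape and discards high-frequency wrinkles. This is a toy sketch of that idea only, not the paper's learned graph-convolution decoder:

```python
import numpy as np

def spectral_smooth(vertices, edges, k):
    """Low-pass filter vertex coordinates by projecting onto the k
    lowest-frequency eigenvectors of the combinatorial graph Laplacian."""
    n = len(vertices)
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0                 # symmetric adjacency
    L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian L = D - A
    _, evecs = np.linalg.eigh(L)                # eigenvectors, ascending frequency
    basis = evecs[:, :k]                        # low-frequency spectral basis
    return basis @ (basis.T @ vertices)         # project and reconstruct
```

With small k the output is a heavily smoothed version of the shape; increasing k toward the vertex count reintroduces finer detail, which mirrors the coarse-to-fine progression described above.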

    Fusion4D: Real-time Performance Capture of Challenging Scenes

    We contribute a new pipeline for live multi-view performance capture, generating temporally coherent, high-quality reconstructions in real time. Our algorithm supports both incremental reconstruction, improving the surface estimation over time, and parameterization of the non-rigid scene motion. Our approach is highly robust to both large frame-to-frame motion and topology changes, allowing us to reconstruct extremely challenging scenes. We demonstrate advantages over related real-time techniques that either deform an online-generated template or continually fuse depth data non-rigidly into a single reference model. Finally, we show geometric reconstruction results on par with offline methods that require orders of magnitude more processing time and many more RGBD cameras.
