
    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges with a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network in a batch-based pose estimation strategy. Jointly recovering the motion of each batch makes it possible to resolve the ambiguities of the monocular reconstruction problem using a low-dimensional trajectory subspace. In addition, we propose a refinement of the surface geometry based on fully automatically extracted silhouettes, which enables medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free-viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and the scene complexity that can be handled. Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 2018.
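
    The low-dimensional trajectory subspace mentioned above can be illustrated with a small sketch (not the authors' implementation): each joint's per-batch 3D trajectory is projected onto the leading DCT basis vectors, which regularizes the otherwise ambiguous per-frame estimates. The batch length, number of coefficients, and function names are assumptions.

```python
# Illustrative sketch of batch-wise trajectory-subspace regularization (assumed names/sizes).
import numpy as np

def dct_basis(num_frames: int, num_coeffs: int) -> np.ndarray:
    """Rows are orthonormal DCT-II basis trajectories of length num_frames."""
    t = np.arange(num_frames)
    basis = np.array([np.cos(np.pi * (t + 0.5) * k / num_frames)
                      for k in range(num_coeffs)])
    basis[0] *= 1.0 / np.sqrt(num_frames)
    basis[1:] *= np.sqrt(2.0 / num_frames)
    return basis  # shape (num_coeffs, num_frames)

def project_to_trajectory_subspace(joint_traj: np.ndarray, num_coeffs: int = 8) -> np.ndarray:
    """joint_traj: (num_frames, 3) noisy per-frame 3D positions of one joint.
    Returns the trajectory rebuilt from its first num_coeffs DCT coefficients,
    suppressing per-frame jitter within the batch."""
    B = dct_basis(joint_traj.shape[0], num_coeffs)   # (K, F)
    coeffs = B @ joint_traj                          # (K, 3) subspace coordinates
    return B.T @ coeffs                              # (F, 3) smoothed trajectory

if __name__ == "__main__":
    frames = 50
    truth = np.stack([np.sin(np.linspace(0, 2, frames)),
                      np.zeros(frames),
                      np.linspace(0, 1, frames)], axis=1)
    noisy = truth + 0.05 * np.random.randn(frames, 3)   # stand-in for ambiguous estimates
    smoothed = project_to_trajectory_subspace(noisy)
    print("mean error before:", np.abs(noisy - truth).mean(),
          "after:", np.abs(smoothed - truth).mean())
```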

    MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

    In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as the decoder. The core innovation is our new differentiable parametric decoder, which encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with precisely defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance, and scene illumination. Thanks to this new way of combining CNN-based and model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which makes training on very large (unlabeled) real-world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation. Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 pages.
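
    To make the notion of a differentiable parametric decoder concrete, here is a hedged toy sketch (not the model used in the paper): a code vector with fixed semantic slots (shape, expression, albedo, rotation, translation) is mapped analytically to 3D vertices and per-vertex albedo, so gradients can flow back to a CNN encoder. The slot sizes, random basis matrices, and the name ToyParametricDecoder are illustrative assumptions; a real system would use a learned or scanned morphable model plus a differentiable image formation step.

```python
# Toy analytic decoder with semantically structured code vector (illustrative only).
import torch

def axis_angle_to_matrix(r, eps=1e-8):
    """Rodrigues' formula, kept differentiable w.r.t. the axis-angle vector r."""
    theta = r.norm() + eps
    k = r / theta
    zero = torch.zeros((), dtype=r.dtype)
    K = torch.stack([torch.stack([zero, -k[2], k[1]]),
                     torch.stack([k[2], zero, -k[0]]),
                     torch.stack([-k[1], k[0], zero])])
    return torch.eye(3, dtype=r.dtype) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

class ToyParametricDecoder(torch.nn.Module):
    """Maps a code vector with fixed semantic slots to 3D vertices and per-vertex albedo."""
    def __init__(self, num_vertices=1000, n_shape=80, n_expr=64, n_albedo=80):
        super().__init__()
        self.slots = dict(shape=n_shape, expr=n_expr, albedo=n_albedo, rot=3, trans=3)
        # Fixed (non-trainable) statistical model; random stand-ins for a real 3DMM.
        self.register_buffer("mean_shape", torch.randn(num_vertices, 3))
        self.register_buffer("shape_basis", 0.01 * torch.randn(num_vertices * 3, n_shape))
        self.register_buffer("expr_basis", 0.01 * torch.randn(num_vertices * 3, n_expr))
        self.register_buffer("mean_albedo", torch.rand(num_vertices, 3))
        self.register_buffer("albedo_basis", 0.01 * torch.randn(num_vertices * 3, n_albedo))

    def forward(self, code):
        # Split the code vector into its semantic slots: shape, expression, albedo, pose.
        alpha, delta, beta, rot, trans = torch.split(code, list(self.slots.values()), dim=-1)
        num_v = self.mean_shape.shape[0]
        verts = (self.mean_shape.reshape(-1)
                 + self.shape_basis @ alpha
                 + self.expr_basis @ delta).reshape(num_v, 3)
        albedo = (self.mean_albedo.reshape(-1) + self.albedo_basis @ beta).reshape(num_v, 3)
        verts = verts @ axis_angle_to_matrix(rot).T + trans   # rigid head pose
        return verts, albedo.clamp(0.0, 1.0)

if __name__ == "__main__":
    decoder = ToyParametricDecoder()
    code = torch.randn(sum(decoder.slots.values()), requires_grad=True)
    vertices, albedo = decoder(code)
    vertices.sum().backward()   # gradients flow back through the analytic decoder
    print(vertices.shape, albedo.shape, code.grad is not None)
```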

    Triggering BTeV

    BTeV is a collider experiment at Fermilab designed for precision studies of CP violation and mixing. Unlike most collider experiments, the BTeV detector has a forward geometry that is optimized for the measurement of B and charm decays in a high-rate environment. While the rate of B production gives BTeV an advantage of almost four orders of magnitude over e+e- B factories, the BTeV Level 1 trigger must be able to accept data at a rate of 100 Gigabytes per second, reconstruct tracks and vertices, trigger on B events with high efficiency, and reject minimum bias events by a factor of 100:1. An overview of the Level 1 trigger will be presented. Comment: 6 pages, 3 figures. Contribution to the Proceedings, APS Division of Particles and Fields Conference, DPF99, UCLA, Los Angeles, CA, Jan. 5-9, 1999.
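
    As a back-of-the-envelope check of the quoted numbers (not a model of the actual trigger hardware): a 100 GB/s Level 1 input rate combined with a 100:1 rejection of minimum bias events leaves roughly 1 GB/s for downstream trigger levels.

```python
# Back-of-the-envelope arithmetic for the quoted Level 1 figures (illustrative only).
input_rate_gb_s = 100.0     # quoted Level 1 input bandwidth
rejection_factor = 100.0    # quoted minimum bias rejection (100:1)
print(f"approx. Level 1 output: {input_rate_gb_s / rejection_factor:.1f} GB/s")
```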

    LiveCap: Real-time Human Performance Capture from Monocular Video

    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting per-frame non-linear optimization problems are solved with specially tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields accuracy comparable to offline performance capture techniques while being orders of magnitude faster.
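
    The per-frame non-linear optimization is solved with Gauss-Newton; the following minimal sketch (plain NumPy, not the paper's data-parallel GPU solver) shows the damped normal-equation step on a toy curve-fitting problem. The residual and Jacobian callbacks stand in for the photometric and silhouette energy terms.

```python
# Minimal damped Gauss-Newton step on the normal equations (illustrative only).
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, iters=20, damping=1e-6):
    """Minimize 0.5 * ||r(x)||^2 by repeated linearization of r."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        r = residual_fn(x)                       # (M,) stacked residuals
        J = jacobian_fn(x)                       # (M, N) Jacobian
        H = J.T @ J + damping * np.eye(x.size)   # damped normal equations
        step = np.linalg.solve(H, -J.T @ r)
        x += step
        if np.linalg.norm(step) < 1e-10:
            break
    return x

if __name__ == "__main__":
    # Toy problem: fit parameters of y = a * exp(b * t) to noisy samples.
    t = np.linspace(0, 1, 50)
    y = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.randn(t.size)
    res = lambda p: p[0] * np.exp(p[1] * t) - y
    jac = lambda p: np.stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)], axis=1)
    print(gauss_newton(res, jac, np.array([1.0, 0.0])))   # converges near (2, -1.5)
```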

    Non-rigid Reconstruction with a Single Moving RGB-D Camera

    We present a novel non-rigid reconstruction method using a moving RGB-D camera. Current approaches use only the non-rigid parts of the scene and completely ignore the rigid background. Non-rigid parts often lack sufficient geometric and photometric information for tracking large frame-to-frame motion. Our approach uses the camera pose estimated from the rigid background for foreground tracking. This enables robust tracking even when large frame-to-frame motion occurs. Moreover, we propose a multi-scale deformation graph which improves non-rigid tracking without compromising the quality of the reconstruction. We also contribute a synthetic dataset, made publicly available, for evaluating non-rigid reconstruction methods. The dataset provides frame-by-frame ground-truth geometry of the scene, the camera trajectory, and foreground/background masks. Experimental results show that our approach is more robust in handling larger frame-to-frame motions and provides better reconstructions than state-of-the-art approaches. Comment: Accepted at the International Conference on Pattern Recognition (ICPR 2018).
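
    A minimal sketch of the central idea (illustrative only, not the authors' code): the rigid camera motion estimated from the background is used to predict where the previous frame's foreground surface should appear in the current frame, leaving only the residual non-rigid deformation for the tracker to explain. Function names, frame conventions, and the toy transforms below are assumptions.

```python
# Using a background-estimated camera pose to initialize foreground tracking (sketch).
import numpy as np

def to_homogeneous(points):
    return np.hstack([points, np.ones((points.shape[0], 1))])

def predict_foreground(fg_points_prev_cam, T_prev_cam_to_world, T_world_to_curr_cam):
    """Map last frame's foreground points (previous camera frame) into the current
    camera frame using only the rigid camera motion recovered from the background."""
    relative = T_world_to_curr_cam @ T_prev_cam_to_world   # 4x4 rigid transform
    pts_h = to_homogeneous(fg_points_prev_cam)             # (N, 4)
    return (relative @ pts_h.T).T[:, :3]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fg_prev = rng.uniform(-0.2, 0.2, size=(100, 3)) + np.array([0.0, 0.0, 1.0])
    T_prev_cam_to_world = np.eye(4)                 # world frame == previous camera frame
    T_world_to_curr_cam = np.eye(4)
    T_world_to_curr_cam[0, 3] = -0.1                # hypothetical camera motion of +0.1 m in x
    predicted = predict_foreground(fg_prev, T_prev_cam_to_world, T_world_to_curr_cam)
    print(predicted[:2])                            # points appear shifted by -0.1 m in x
```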

    VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control; thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from approaches based on monocular RGB-D cameras such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions: it works for outdoor scenes, community videos, and low-quality commodity RGB cameras. Comment: Accepted to SIGGRAPH 2017.
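
    The kinematic skeleton fitting step can be illustrated with a small, hedged sketch: per-frame CNN joint predictions are temporally smoothed and then re-scaled so that each bone keeps a fixed length, which is one simple way to obtain a temporally stable, kinematically consistent pose. The toy skeleton topology, bone lengths, and smoothing factor below are assumptions, not values from the paper.

```python
# Temporal smoothing plus fixed-bone-length enforcement on per-frame joint predictions (sketch).
import numpy as np

PARENTS = {1: 0, 2: 1, 3: 2}              # toy kinematic chain: root -> 1 -> 2 -> 3
BONE_LENGTHS = {1: 0.5, 2: 0.4, 3: 0.3}   # metres, assumed known

def constrain_bones(joints):
    """Rescale each bone to its known length while keeping its direction."""
    out = joints.copy()
    for child, parent in PARENTS.items():
        d = out[child] - out[parent]
        out[child] = out[parent] + d / (np.linalg.norm(d) + 1e-9) * BONE_LENGTHS[child]
    return out

def smooth(prev, current, alpha=0.8):
    """Exponential smoothing: trade a little per-frame accuracy for temporal stability."""
    return alpha * current + (1 - alpha) * prev

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pose = np.zeros((4, 3))
    pose[:, 1] = [0.0, 0.5, 0.9, 1.2]                     # previous filtered pose
    noisy = pose + 0.03 * rng.standard_normal(pose.shape)  # stand-in CNN prediction
    fitted = constrain_bones(smooth(pose, noisy))
    print(np.linalg.norm(fitted[1] - fitted[0]))            # ~0.5: bone length restored
```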

    SurfelWarp: Efficient Non-Volumetric Single View Dynamic Reconstruction

    We contribute a dense SLAM system that takes a live stream of depth images as input and reconstructs non-rigidly deforming scenes in real time, without templates or prior models. In contrast to existing approaches, we do not maintain any volumetric data structures, such as truncated signed distance function (TSDF) fields or deformation fields, which are performance- and memory-intensive. Our system works with a flat, point (surfel) based representation of geometry, which can be directly acquired from commodity depth sensors. Standard graphics pipelines and general-purpose GPU (GPGPU) computing are leveraged for all central operations: nearest-neighbor maintenance, non-rigid deformation field estimation, and fusion of depth measurements. Our pipeline inherently avoids expensive volumetric operations such as marching cubes, volumetric fusion, and dense deformation field updates, leading to significantly improved performance. Furthermore, the explicit and flexible surfel-based geometry representation enables efficient handling of topology changes and tracking failures, keeping our reconstructions consistent with updated depth observations. Our system allows robots to maintain a scene description containing non-rigidly deforming objects, potentially enabling interaction with dynamic working environments. Comment: RSS 2018. The video and source code are available at https://sites.google.com/view/surfelwarp/hom
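
    As a rough illustration of what a flat, surfel-based representation looks like (a sketch, not the paper's CUDA implementation), the snippet below stores position, normal, radius, and confidence per surfel and fuses a new depth measurement by confidence-weighted averaging; the field names and fusion rule are assumptions.

```python
# Toy surfel record and confidence-weighted fusion of a depth measurement (illustrative only).
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray   # (3,) point in world coordinates
    normal: np.ndarray     # (3,) unit normal
    radius: float          # footprint of the surfel
    confidence: float      # accumulated measurement weight
    last_seen: int         # frame index of the last supporting observation

def fuse(surfel: Surfel, meas_pos, meas_normal, meas_radius, frame_idx, weight=1.0):
    """Confidence-weighted running average of a new measurement into an existing surfel."""
    w0, w1 = surfel.confidence, weight
    total = w0 + w1
    surfel.position = (w0 * surfel.position + w1 * np.asarray(meas_pos)) / total
    n = w0 * surfel.normal + w1 * np.asarray(meas_normal)
    surfel.normal = n / (np.linalg.norm(n) + 1e-9)
    surfel.radius = min(surfel.radius, meas_radius)   # keep the finest observed footprint
    surfel.confidence = total
    surfel.last_seen = frame_idx
    return surfel

if __name__ == "__main__":
    s = Surfel(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]), 0.01, 1.0, 0)
    fuse(s, [0.0, 0.0, 1.02], [0.0, 0.0, 1.0], 0.008, frame_idx=1)
    print(s.position, s.confidence)
```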