843 research outputs found

    HeadOn: Real-time Reenactment of Human Portrait Videos

    We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos. Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'1

    Keyframe-based monocular SLAM: design, survey, and future directions

    Extensive research in the field of monocular SLAM for the past fifteen years has yielded workable systems that found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were once common, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation, and critically assessing the specific strategies made in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM, namely illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.

    Temporally coherent 4D reconstruction of complex dynamic scenes

    This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration, allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities, resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved nonrigid object segmentation and shape reconstruction. Comment: To appear in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016. Video available at: https://www.youtube.com/watch?v=bm_P13_-Ds

    IST Austria Thesis

    Computer graphics is an extremely exciting field for two reasons. On the one hand, there is a healthy injection of pragmatism coming from the visual effects industry, which wants robust algorithms that work so it can produce results at an increasingly frantic pace. On the other hand, the industry must always try to push the envelope and achieve the impossible to wow its audiences in the next blockbuster, which means that it has not succumbed to conservatism, and there is plenty of room to try out new and crazy ideas if there is a chance that they will pan out into something useful. Water simulation has been used in visual effects for decades; however, it remains extremely challenging because of its high computational cost and difficult art-directability. The work in this thesis tries to address some of these difficulties. Specifically, we make the following three novel contributions to the state of the art in water simulation for visual effects. First, we develop the first algorithm that can convert any sequence of closed surfaces in time into a moving triangle mesh. State-of-the-art methods at the time could only handle surfaces with fixed connectivity, but we are the first to be able to handle surfaces that merge and split apart. This is important for water simulation practitioners, because it allows them to convert splashy water surfaces extracted from particles or simulated using grid-based level sets into triangle meshes that can be textured and enhanced with extra surface dynamics as a post-process. We also apply our algorithm to other phenomena that merge and split apart, such as morphs and noisy reconstructions of human performances. Second, we formulate a surface-based energy that measures the deviation of a water surface from a physically valid state. Such discrepancies arise when there is a mismatch in the degrees of freedom between the water surface and the underlying physics solver. This commonly happens when practitioners use a moving triangle mesh with a grid-based physics solver, or when high-resolution grid-based surfaces are combined with low-resolution physics. Following the direction of steepest descent on our surface-based energy, we can either smooth these artifacts or turn them into high-resolution waves by interpreting the energy as a physical potential. Third, we extend state-of-the-art techniques in non-reflecting boundaries to handle spatially and time-varying background flows. This allows a novel workflow where practitioners can re-simulate part of an existing simulation, such as removing a solid obstacle, adding a new splash or locally changing the resolution. Such changes can easily lead to new waves in the re-simulated region that would reflect off the new simulation boundary, effectively ruining the illusion of a seamless simulation boundary between the existing and new simulations. Our non-reflecting boundaries make sure that such waves are absorbed.

    MONOCULAR POSE ESTIMATION AND SHAPE RECONSTRUCTION OF QUASI-ARTICULATED OBJECTS WITH CONSUMER DEPTH CAMERA

    Quasi-articulated objects, such as human beings, are among the most commonly seen objects in our daily lives. Extensive research has been dedicated to 3D shape reconstruction and motion analysis for this type of object for decades. A major motivation is their wide range of applications, such as entertainment, surveillance and health care. Most existing studies relied on one or more regular video cameras. In recent years, commodity depth sensors have become more and more widely available. The geometric measurements delivered by depth sensors provide valuable information for these tasks. In this dissertation, we propose three algorithms for monocular pose estimation and shape reconstruction of quasi-articulated objects using a single commodity depth sensor. These three algorithms achieve shape reconstruction with increasing levels of granularity and personalization. We then further develop a method for highly detailed shape reconstruction based on our pose estimation techniques. Our first algorithm takes advantage of a motion database acquired with an active marker-based motion capture system. This method combines pose detection through nearest neighbor search with pose refinement via non-rigid point cloud registration. It is capable of accommodating different body sizes and achieves more than twice the accuracy of a previous state of the art on a publicly available dataset. The above algorithm performs frame-by-frame estimation and is therefore less prone to tracking failure. Nonetheless, it does not guarantee temporal consistency of both the skeletal structure and the shape, which could be problematic for some applications. To address this problem, we develop a real-time model-based approach for quasi-articulated pose and 3D shape estimation based on the Iterative Closest Point (ICP) principle, with several novel constraints that are critical for the monocular scenario. In this algorithm, we further propose a novel method for automatic body size estimation that enables it to accommodate different subjects. Due to its local search nature, the ICP-based method could be trapped in local minima in the case of some complex and fast motions. To address this issue, we explore the potential of using a statistical model for soft point correspondence association. Towards this end, we propose a unified framework based on a Gaussian Mixture Model for joint pose and shape estimation of quasi-articulated objects. This method achieves state-of-the-art performance on various publicly available datasets. Based on our pose estimation techniques, we then develop a novel framework that achieves highly detailed shape reconstruction by only requiring the user to move naturally in front of a single depth sensor. Our experiments demonstrate reconstructed shapes with rich geometric details for various subjects with different apparel. Last but not least, we explore the applicability of our method to two real-world applications. First, we combine our ICP-based method with cloth simulation techniques for Virtual Try-on. Our system delivers the first promising 3D-based virtual clothing system. Second, we explore the possibility of extending our pose estimation algorithms to assist physical therapists in identifying their patients’ movement dysfunctions that are related to injuries. Our preliminary experiments have demonstrated promising results by comparison with the gold standard active marker-based commercial system. Throughout the dissertation, we develop various state-of-the-art algorithms for pose estimation and shape reconstruction of quasi-articulated objects by leveraging the geometric information from depth sensors. We also demonstrate their great potential for different real-world applications.
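The Iterative Closest Point principle mentioned in the abstract above can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it is a plain rigid 2D ICP with brute-force nearest neighbor matching, and it omits the articulated model, the body size estimation and the monocular constraints the abstract describes. All function names (`best_rigid_2d`, `apply_rigid`, `icp_2d`, `mean_sq_error`) are hypothetical and chosen for illustration only.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    s_dot = 0.0
    s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s  # centered source point
        bx, by = xd - cx_d, yd - cy_d  # centered destination point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    # The optimal angle maximizes sum(b . R(theta) a) over all pairs.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty


def apply_rigid(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(source, target, iterations=20):
    """Alternate closest-point matching and rigid re-alignment (illustrative only)."""
    current = list(source)
    for _ in range(iterations):
        # Brute-force closest target point for every current source point.
        matched = [
            min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in current
        ]
        theta, tx, ty = best_rigid_2d(current, matched)
        current = apply_rigid(current, theta, tx, ty)
    return current


def mean_sq_error(a, b):
    """Mean squared distance between two equally long, index-paired point lists."""
    return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
    source = apply_rigid(target, 0.1, 0.2, -0.1)  # slightly misaligned copy
    aligned = icp_2d(source, target)
    print(mean_sq_error(aligned, target))
```

Because each iteration only matches to the currently closest point, a sketch like this converges to a nearby alignment and can stall in a local minimum when the initial misalignment is large, which is the limitation the abstract attributes to ICP-based tracking.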

    A Bayesian Framework for Human Body Pose Tracking from Depth Image Sequences

    This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging due to the need to resolve the depth ambiguity caused by self-occlusions and the difficulty of recovering from tracking failure. Human body poses could be estimated through model fitting using dense correspondences between depth data and an articulated human model (local optimization method). Although it usually achieves a high accuracy due to dense correspondences, it may fail to recover from tracking failure. Alternatively, human pose may be reconstructed by detecting and tracking human body anatomical landmarks (key-points) based on low-level depth image analysis. While this method (key-point based method) is robust and recovers from tracking failure, its pose estimation accuracy depends solely on image-based localization accuracy of key-points. To address these limitations, we present a flexible Bayesian framework for integrating pose estimation results obtained by methods based on key-points and local optimization. Experimental results are shown and performance comparison is presented to demonstrate the effectiveness of the proposed approach.
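The idea of combining two pose estimates in a Bayesian way can be illustrated with a very small sketch. It is not the framework proposed in the paper above: it assumes each method reports a single scalar (for example one joint coordinate) with an independent Gaussian uncertainty, and fuses the two by precision weighting. The function name `fuse_gaussian` and the example numbers are hypothetical.

```python
def fuse_gaussian(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of the same scalar quantity.

    Assumption (not from the paper): both estimates are unbiased and their
    errors are independent, so the fused estimate is the normalized product
    of the two Gaussians (precision-weighted average).
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mean = var * (mean_a / var_a + mean_b / var_b)
    return mean, var


if __name__ == "__main__":
    # Hypothetical numbers: a coarse key-point estimate and a sharper
    # local-optimization estimate of the same joint coordinate.
    keypoint_mean, keypoint_var = 1.0, 4.0
    local_mean, local_var = 3.0, 1.0
    print(fuse_gaussian(keypoint_mean, keypoint_var, local_mean, local_var))
```

The fused mean is pulled toward whichever estimate has the smaller variance, and the fused variance is never larger than either input variance. Under the stated assumptions this captures the general appeal of weighting a robust but coarse estimate against an accurate but failure-prone one.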

    Object Tracking

    Object tracking consists of estimating the trajectories of moving objects in a sequence of images. Automating computer object tracking is a difficult task. The dynamics of the many changing parameters that represent object features and motion, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art in object tracking methods and new research trends are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics it contains, this monograph constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to follow the very rapid progress in the development of methods as well as the extension of their application

    Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

    In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Specifically, we introduce a statistical 3D morphable model that flexibly describes the distribution of points on the surface of the face model, with an efficient switchable online adaptation that gradually captures the identity of the tracked subject and rapidly constructs a suitable face model when the subject changes. Moreover, unlike prior art that employed ICP-based facial pose estimation, to improve robustness to occlusions, we propose a ray visibility constraint that regularizes the pose based on the face model's visibility with respect to the input point cloud. Ablation studies and experimental results on Biwi and ICT-3DHP datasets demonstrate that the proposed framework is effective and outperforms competing state-of-the-art depth-based methods.