955 research outputs found

    Realistic facial expression reconstruction for VR HMD users

    Get PDF

    MONOCULAR POSE ESTIMATION AND SHAPE RECONSTRUCTION OF QUASI-ARTICULATED OBJECTS WITH CONSUMER DEPTH CAMERA

    Get PDF
    Quasi-articulated objects, such as human beings, are among the most commonly seen objects in our daily lives. Extensive research have been dedicated to 3D shape reconstruction and motion analysis for this type of objects for decades. A major motivation is their wide applications, such as in entertainment, surveillance and health care. Most of existing studies relied on one or more regular video cameras. In recent years, commodity depth sensors have become more and more widely available. The geometric measurements delivered by the depth sensors provide significantly valuable information for these tasks. In this dissertation, we propose three algorithms for monocular pose estimation and shape reconstruction of quasi-articulated objects using a single commodity depth sensor. These three algorithms achieve shape reconstruction with increasing levels of granularity and personalization. We then further develop a method for highly detailed shape reconstruction based on our pose estimation techniques. Our first algorithm takes advantage of a motion database acquired with an active marker-based motion capture system. This method combines pose detection through nearest neighbor search with pose refinement via non-rigid point cloud registration. It is capable of accommodating different body sizes and achieves more than twice higher accuracy compared to a previous state of the art on a publicly available dataset. The above algorithm performs frame by frame estimation and therefore is less prone to tracking failure. Nonetheless, it does not guarantee temporal consistent of the both the skeletal structure and the shape and could be problematic for some applications. To address this problem, we develop a real-time model-based approach for quasi-articulated pose and 3D shape estimation based on Iterative Closest Point (ICP) principal with several novel constraints that are critical for monocular scenario. In this algorithm, we further propose a novel method for automatic body size estimation that enables its capability to accommodate different subjects. Due to the local search nature, the ICP-based method could be trapped to local minima in the case of some complex and fast motions. To address this issue, we explore the potential of using statistical model for soft point correspondences association. Towards this end, we propose a unified framework based on Gaussian Mixture Model for joint pose and shape estimation of quasi-articulated objects. This method achieves state-of-the-art performance on various publicly available datasets. Based on our pose estimation techniques, we then develop a novel framework that achieves highly detailed shape reconstruction by only requiring the user to move naturally in front of a single depth sensor. Our experiments demonstrate reconstructed shapes with rich geometric details for various subjects with different apparels. Last but not the least, we explore the applicability of our method on two real-world applications. First of all, we combine our ICP-base method with cloth simulation techniques for Virtual Try-on. Our system delivers the first promising 3D-based virtual clothing system. Secondly, we explore the possibility to extend our pose estimation algorithms to assist physical therapist to identify their patients’ movement dysfunctions that are related to injuries. Our preliminary experiments have demonstrated promising results by comparison with the gold standard active marker-based commercial system. Throughout the dissertation, we develop various state-of-the-art algorithms for pose estimation and shape reconstruction of quasi-articulated objects by leveraging the geometric information from depth sensors. We also demonstrate their great potentials for different real-world applications

    NASA: Neural Articulated Shape Approximation

    Full text link
    Efficient representation of articulated objects such as human bodies is an important problem in computer vision and graphics. To efficiently simulate deformation, existing approaches represent 3D objects using polygonal meshes and deform them using skinning techniques. This paper introduces neural articulated shape approximation (NASA), an alternative framework that enables efficient representation of articulated deformable objects using neural indicator functions that are conditioned on pose. Occupancy testing using NASA is straightforward, circumventing the complexity of meshes and the issue of water-tightness. We demonstrate the effectiveness of NASA for 3D tracking applications, and discuss other potential extensions.Comment: ECCV 202

    TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain

    Full text link
    This paper addresses the following research question: ``can one compress a detailed 3D representation and use it directly for point cloud registration?''. Map compression of the scene can be achieved by the tensor train (TT) decomposition of the signed distance function (SDF) representation. It regulates the amount of data reduced by the so-called TT-ranks. Using this representation we have proposed an algorithm, the TT-SDF2PC, that is capable of directly registering a PC to the compressed SDF by making use of efficient calculations of its derivatives in the TT domain, saving computations and memory. We compare TT-SDF2PC with SOTA local and global registration methods in a synthetic dataset and a real dataset and show on par performance while requiring significantly less resources

    Articulation estimation and real-time tracking of human hand motions

    Get PDF
    Schröder M. Articulation estimation and real-time tracking of human hand motions. Bielefeld: Universität Bielefeld; 2015.This thesis deals with the problem of estimating and tracking the full articulation of human hands. Algorithmically recovering hand articulations is a challenging problem due to the hand’s high number of degrees of freedom and the complexity of its motions. Besides the accuracy and efficiency of the hand posture estimation, hand tracking methods are faced with issues such as invasiveness, ease of deployment and sensor artifacts. In this thesis several different hand tracking approaches are examined, including marker-based optical motion capture, data-driven discriminative visual tracking and generative tracking based on articulated registration, and various contributions to these areas are presented. The problem of optimally placing reduced marker sets on a performer’s hand for optical hand motion capture is explored. A method is proposed that automatically generates functional reduced marker layouts by optimizing for their numerical stability and geometric feasibility. A data-driven discriminative tracking approach based on matching the hand’s appearance in the sensor data with an image database is investigated. In addition to an efficient nearest neighbor search for images, a combination of discriminative initialization and generative refinement is employed. The method’s applicability is demonstrated in interactive robot teleoperation. Various real human hand motions are captured and statistically analyzed to derive low-dimensional representations of hand articulations. An adaptive hand posture subspace concept is developed and integrated into a generative real-time hand tracking approach that aligns a virtual hand model with sensor point clouds based on constrained inverse kinematics. Generative hand tracking is formulated as a regularized articulated registration process, in which geometrical model fitting is combined with statistical, kinematic and temporal regularization priors. A registration concept that combines 2D and 3D alignment and explicitly accounts for occlusions and visibility constraints is devised. High-quality, non-invasive, real-time hand tracking is achieved based on this regularized articulated registration formulation

    Long-Term Simultaneous Localization and Mapping in Dynamic Environments.

    Full text link
    One of the core competencies required for autonomous mobile robotics is the ability to use sensors to perceive the environment. From this noisy sensor data, the robot must build a representation of the environment and localize itself within this representation. This process, known as simultaneous localization and mapping (SLAM), is a prerequisite for almost all higher-level autonomous behavior in mobile robotics. By associating the robot's sensory observations as it moves through the environment, and by observing the robot's ego-motion through proprioceptive sensors, constraints are placed on the trajectory of the robot and the configuration of the environment. This results in a probabilistic optimization problem to find the most likely robot trajectory and environment configuration given all of the robot's previous sensory experience. SLAM has been well studied under the assumptions that the robot operates for a relatively short time period and that the environment is essentially static during operation. However, performing SLAM over long time periods while modeling the dynamic changes in the environment remains a challenge. The goal of this thesis is to extend the capabilities of SLAM to enable long-term autonomous operation in dynamic environments. The contribution of this thesis has three main components: First, we propose a framework for controlling the computational complexity of the SLAM optimization problem so that it does not grow unbounded with exploration time. Second, we present a method to learn visual feature descriptors that are more robust to changes in lighting, allowing for improved data association in dynamic environments. Finally, we use the proposed tools in SLAM systems that explicitly models the dynamics of the environment in the map by representing each location as a set of example views that capture how the location changes with time. We experimentally demonstrate that the proposed methods enable long-term SLAM in dynamic environments using a large, real-world vision and LIDAR dataset collected over the course of more than a year. This dataset captures a wide variety of dynamics: from short-term scene changes including moving people, cars, changing lighting, and weather conditions; to long-term dynamics including seasonal conditions and structural changes caused by construction.PhDElectrical Engineering: SystemsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/111538/1/carlevar_1.pd
    • …
    corecore