
    Generation of Virtual Humans for Virtual Reality, Medicine, and Domestic Assistance

    Achenbach J. Generation of Virtual Humans for Virtual Reality, Medicine, and Domestic Assistance. Bielefeld: Universität Bielefeld; 2019. Virtual humans are employed in various applications including computer games, special effects in movies, virtual try-ons, medical surgery planning, and virtual assistance. This thesis deals with virtual humans and their computer-aided generation for different purposes. As a first step, we derive a technique to digitally clone the face of a scanned person. Fitting a facial template model to 3D-scanner data is a powerful technique for generating face avatars, in particular in the presence of noisy and incomplete measurements. Consequently, there are many approaches for the underlying non-rigid registration task, and these are typically composed of very similar algorithmic building blocks. By providing a thorough analysis of the different design choices, we derive a face matching technique tailored to high-quality reconstructions from high-resolution scanner data. We then extend this approach in two ways: an anisotropic bending model allows us to reconstruct facial details more accurately, and a simultaneous constrained fitting of eyes and eyelids considerably improves the reconstruction of the eye region. Next, we extend this work to full bodies and present a complete pipeline to create animatable virtual humans by fitting a holistic template character. Due to the careful selection of techniques and technology, our reconstructed humans are quite realistic in terms of both geometry and texture. Since we represent our models as single-layer triangle meshes and animate them through standard skeleton-based skinning and facial blendshapes, our characters can be used in standard VR engines out of the box. By optimizing computation time and minimizing manual intervention, our reconstruction pipeline is capable of processing entire characters in less than ten minutes. In the following part of this thesis, we build on our template fitting method and deal with the problem of inferring the skin surface of a head from a given skull and vice versa. Starting with a method for the automated estimation of a human face from given skull remains, we extend this approach to bidirectional facial reconstruction in order to also estimate the skull from a given scan of the skin surface. This is based on a multilinear model that describes the correlation between the skull and the facial soft tissue thickness on the one hand and the head/face surface geometry on the other hand. We demonstrate the versatility of our novel multilinear model by estimating faces from given skulls as well as skulls from given faces within just a couple of seconds. To foster further research in this direction, we have made our multilinear model publicly available. In the last part, we generate assistive virtual humans that are employed as stimuli for an interdisciplinary study. In the study, we shed light on user preferences for the visual attributes of virtual assistants in a variety of smart home contexts.
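
    As a rough illustration of the multilinear skull/face model mentioned above (not the thesis's actual implementation), the following numpy sketch evaluates such a model by contracting a learned core tensor with one coefficient vector per factor; all dimensions, names, and the random core are made up for the example.

```python
import numpy as np

# Illustrative sizes only: a core tensor learned offline, with one mode for the
# stacked vertex coordinates and one mode per factor (skull shape, tissue thickness).
n_verts = 5023 * 3          # stacked xyz coordinates of a head/skull template
n_skull, n_tissue = 30, 10  # hypothetical numbers of factor coefficients

rng = np.random.default_rng(0)
core = rng.standard_normal((n_verts, n_skull, n_tissue))  # stands in for a learned core
mean = rng.standard_normal(n_verts)                       # stands in for the mean geometry

def evaluate_multilinear(core, mean, w_skull, w_tissue):
    """Contract the core tensor with the factor coefficients (mode products)."""
    geometry = np.einsum('vst,s,t->v', core, w_skull, w_tissue) + mean
    return geometry.reshape(-1, 3)

# Zero coefficients reproduce the mean geometry; non-zero coefficients deform it.
verts = evaluate_multilinear(core, mean, np.zeros(n_skull), np.zeros(n_tissue))
print(verts.shape)  # (5023, 3)
```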

    3D Morphable Face Models – Past, Present and Future

    In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state of the art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research, and highlighting the broad range of current and future applications.
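
    For readers new to the topic, a minimal sketch of the classic linear 3DMM formulation surveyed here may help: shape (and, analogously, texture) is expressed as the model mean plus a linear combination of principal components. Basis sizes and names below are illustrative and not taken from any particular model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_verts = 1000                                        # illustrative vertex count
mean_shape = rng.standard_normal(n_verts * 3)         # stands in for the learned mean
id_basis = rng.standard_normal((n_verts * 3, 80))     # identity principal components
exp_basis = rng.standard_normal((n_verts * 3, 29))    # expression components

def synthesize(alpha, beta):
    """Linear 3DMM: mean + identity offsets + expression offsets."""
    shape = mean_shape + id_basis @ alpha + exp_basis @ beta
    return shape.reshape(-1, 3)

face = synthesize(0.1 * rng.standard_normal(80), np.zeros(29))
print(face.shape)  # (1000, 3)
```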

    Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes

    Computationally weak systems and demanding graphical applications are still mostly dependent on linear blendshapes for facial animation. The accompanying artifacts, such as self-intersections, loss of volume, or missing soft-tissue elasticity, can be avoided by using physics-based animation models. However, these are cumbersome to implement and require immense computational effort. We propose neural volumetric blendshapes, an approach that combines the advantages of physics-based simulations with real-time runtimes even on consumer-grade CPUs. To this end, we present a neural network that efficiently approximates the involved volumetric simulations and generalizes across human identities as well as facial expressions. Our approach can be used on top of any linear blendshape system and, hence, can be deployed straightforwardly. Furthermore, it only requires a single neutral face mesh as input in the minimal setting. Along with the design of the network, we introduce a pipeline for the challenging creation of anatomically and physically plausible training data. Part of the pipeline is a novel hybrid regressor that densely positions a skull within a skin surface while avoiding intersections. The fidelity of all parts of the data generation pipeline, as well as the accuracy and efficiency of the network, are evaluated in this work. Upon publication, the trained models and associated code will be released.
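
    Since the approach sits on top of an arbitrary linear blendshape system, a minimal numpy sketch of such a rig is given below, with a placeholder hook where a learned volumetric correction like the one proposed here could be applied; the sizes, names, and the zero-valued correction are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_verts, n_blendshapes = 500, 52                       # illustrative rig dimensions
neutral = rng.standard_normal((n_verts, 3))            # neutral face mesh vertices
deltas = 0.01 * rng.standard_normal((n_blendshapes, n_verts, 3))  # per-blendshape offsets

def linear_blendshapes(weights):
    """Standard linear blendshape rig: neutral + weighted sum of expression deltas."""
    return neutral + np.tensordot(weights, deltas, axes=1)

def neural_correction(posed_verts, weights):
    """Placeholder for a learned corrective displacement; a real implementation
    would evaluate a trained network here instead of returning zeros."""
    return np.zeros_like(posed_verts)

weights = np.zeros(n_blendshapes)
weights[10] = 0.8                                      # activate one expression channel
verts = linear_blendshapes(weights)
verts = verts + neural_correction(verts, weights)
print(verts.shape)  # (500, 3)
```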

    PhoMoH: Implicit Photorealistic 3D Models of Human Heads

    We present PhoMoH, a neural network methodology to construct generative models of photo-realistic 3D geometry and appearance of human heads, including hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive head model with new features. Concretely, we learn a highly detailed geometry network layered on top of a mid-resolution head model, together with a detailed, local geometry-aware, and disentangled color field. Our proposed architecture allows us to learn photo-realistic human head models from relatively little data. The learned generative geometry and appearance networks can be sampled individually and enable the creation of diverse and realistic human heads. Extensive experiments validate our method qualitatively and across different metrics. To be published at the International Conference on 3D Vision.
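
    To give a flavour of the neural-field representation used here, the following PyTorch sketch shows a toy coordinate MLP that maps a 3D query point plus a latent code to a signed distance and a colour; layer sizes, the latent dimension, and the architecture are illustrative and do not reproduce PhoMoH.

```python
import torch
import torch.nn as nn

class HeadField(nn.Module):
    """Toy neural field: 3D point + head latent -> (signed distance, RGB colour)."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1 + 3),             # signed distance + RGB
        )

    def forward(self, points, latent):
        latent = latent.expand(points.shape[0], -1)
        out = self.net(torch.cat([points, latent], dim=-1))
        return out[:, :1], torch.sigmoid(out[:, 1:])

field = HeadField()
points = torch.rand(1024, 3) * 2 - 1              # query points in [-1, 1]^3
z = torch.randn(1, 64)                             # a sampled head latent
sdf, rgb = field(points, z)
print(sdf.shape, rgb.shape)                        # (1024, 1) and (1024, 3)
```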

    Synthesization and reconstruction of 3D faces by deep neural networks

    The past few decades have witnessed substantial progress towards 3D facial modelling and reconstruction, as it is of high importance for many computer vision and graphics applications, including Augmented/Virtual Reality (AR/VR), computer games, movie post-production, image/video editing, medical applications, etc. In the traditional approaches, facial texture and shape are represented as a triangle mesh that can cover identity and expression variation through non-rigid deformation. A dataset of 3D face scans is then densely registered into a common topology in order to construct a linear statistical model. Such models are called 3D Morphable Models (3DMMs) and can be used for 3D face synthesization or reconstruction from a single or a few 2D face images. The works presented in this thesis focus on the modernization of these traditional techniques in the light of recent advances in deep learning and the availability of large-scale datasets. Since the introduction of 3DMMs over two decades ago, there has been a lot of progress on them, and they are still considered one of the best methodologies to model 3D faces. Nevertheless, several aspects of them still need to be upgraded to the "deep era". Firstly, conventional 3DMMs are built by linear statistical approaches such as Principal Component Analysis (PCA), which by its nature omits high-frequency information. While this does not curtail shape much, as shape is often smooth in the original data, texture models are heavily afflicted, losing high-frequency details and photorealism. Secondly, the existing 3DMM fitting approaches rely on very primitive (e.g. RGB values, sparse landmarks) or hand-crafted (e.g. HOG, SIFT) features as supervision, which are sensitive to "in-the-wild" conditions (e.g. lighting, pose, occlusion) or fail to capture identity/expression resemblance with the target image. Finally, shape, texture, and expression modalities are modelled separately, ignoring the correlation among them, which places a fundamental limit on the synthesization of semantically meaningful 3D faces. Moreover, photorealistic 3D face synthesis has not been studied thoroughly in the literature. This thesis attempts to address the above-mentioned issues by harnessing the power of deep neural networks and generative adversarial networks, as explained below. Due to the linear texture models, many of the state-of-the-art methods are still not capable of reconstructing facial textures with high-frequency details. For this, we take a radically different approach and build a high-quality texture model with Generative Adversarial Networks (GANs) that preserves details. That is, we utilize GANs to train a very powerful generator of facial texture in UV space, and then show that it is possible to employ this generator network as a statistical texture prior for 3DMM fitting. The resulting texture reconstructions are plausible and photorealistic, as GANs are faithful to the real-data distribution in both the low- and high-frequency domains. Then, we revisit the conventional 3DMM fitting approaches that use non-linear optimization to find the latent parameters that best reconstruct the test image, but under a new perspective. We propose to optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. In order to be robust to initialization and expedite the fitting process, we also propose a novel self-supervised regression-based approach. We demonstrate excellent 3D face reconstructions that are photorealistic and identity-preserving, and achieve, for the first time to the best of our knowledge, facial texture reconstruction with high-frequency details. In order to extend the non-linear texture model to photo-realistic 3D face synthesis, we present a methodology that jointly generates high-quality texture, shape, and normals. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how we can condition the generation on expression and create faces with various facial expressions. Additionally, we study another approach for photo-realistic face synthesis by 3D guidance. This study proposes to generate 3D faces with a linear 3DMM and then augment their 2D renderings towards the photorealistic face domain with an image-to-image translation network. Both works demonstrate excellent photorealistic face synthesis and show that the generated faces improve face recognition benchmarks when used as synthetic training data. Finally, we study expression reconstruction for personalized 3D face models, where we improve the generalization and robustness of expression encoding. First, we propose a 3D augmentation approach on 2D head-mounted camera images to increase robustness to perspective changes. Second, we propose to train a generic expression encoder network by scaling up the number of identities with a novel multi-identity personalized model training architecture in a self-supervised manner. Both approaches show promising results in qualitative and quantitative experiments.
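
    To make the analysis-by-synthesis idea behind the GAN texture prior and the identity-feature supervision concrete, here is a heavily simplified, hypothetical PyTorch sketch. Every network is a frozen random stand-in (the thesis uses a trained UV texture GAN, a differentiable renderer, and a pretrained face recognition network), so only the structure of the optimization loop is meant to carry over.

```python
import torch
import torch.nn as nn

# Stand-in modules: random, frozen, and far smaller than the real networks.
texture_generator = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32))
render = nn.Sequential(nn.Linear(3 * 32 * 32 + 80, 3 * 32 * 32))   # stand-in "renderer"
identity_net = nn.Sequential(nn.Linear(3 * 32 * 32, 128))          # stand-in identity features
for module in (texture_generator, render, identity_net):
    module.requires_grad_(False)

target_image = torch.rand(1, 3 * 32 * 32)              # image to be fitted (flattened)
target_feat = identity_net(target_image)               # identity features of the target

latent = torch.zeros(1, 512, requires_grad=True)        # latent code of the texture generator
shape_coeffs = torch.zeros(1, 80, requires_grad=True)   # 3DMM shape coefficients
optim = torch.optim.Adam([latent, shape_coeffs], lr=1e-2)

for step in range(200):
    texture = texture_generator(latent)                         # texture from the GAN prior
    image = render(torch.cat([texture, shape_coeffs], dim=-1))  # differentiable "rendering"
    # Identity-feature loss: match the target in the embedding space of the frozen network.
    loss = 1.0 - torch.cosine_similarity(identity_net(image), target_feat).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
```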

    Analysis of 2D and 3D images of the human head for shape, expression and gaze

    Analysis of the full human head in the context of computer vision has been an ongoing research area for years. While the deep learning community has witnessed the trend of constructing end-to-end models that solve the problem in one pass, it is challenging to apply such a procedure to full human heads. This is because human heads are complicated and have numerous relatively small components with high-frequency details. For example, in a high-quality 3D scan of a full human head from the Headspace dataset, each ear only occupies 1.5% of the total vertices. A method that aims to reconstruct full 3D heads in an end-to-end manner is prone to ignoring the detail of the ears. Therefore, this thesis focuses on analysing small components of the full human head individually, while approaching each in an end-to-end training manner. The three main contributions are presented in three separate chapters. The first contribution aims at reconstructing the underlying 3D ear geometry and colour details from a monocular RGB image and uses the geometry information to initialise a model-fitting process that finds 55 3D ear landmarks on raw 3D head scans. The second contribution employs a similar pipeline but applies it to an eye-region and eyeball model. The work focuses on building a method that has the advantages of both the model-based approach and the appearance-based approach, resulting in an explicit model with state-of-the-art gaze prediction precision. The final work focuses on the separation of facial identity and facial expression by learning a disentangled representation. We design an autoencoder that extracts facial identity and facial expression representations separately. Finally, we give an overview of our contributions and the prospects of future applications enabled by them.
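
    As a minimal illustration of the identity/expression disentanglement described in the final contribution, the PyTorch sketch below uses two separate encoders and a shared decoder; the architecture, sizes, and training objective are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class DisentangledFaceAE(nn.Module):
    """Toy autoencoder with separate identity and expression encoders and a shared decoder."""
    def __init__(self, n_feats=3 * 5023, id_dim=128, exp_dim=32):
        super().__init__()
        self.enc_id = nn.Sequential(nn.Linear(n_feats, 256), nn.ReLU(), nn.Linear(256, id_dim))
        self.enc_exp = nn.Sequential(nn.Linear(n_feats, 256), nn.ReLU(), nn.Linear(256, exp_dim))
        self.dec = nn.Sequential(nn.Linear(id_dim + exp_dim, 256), nn.ReLU(), nn.Linear(256, n_feats))

    def forward(self, verts):
        z_id, z_exp = self.enc_id(verts), self.enc_exp(verts)
        return self.dec(torch.cat([z_id, z_exp], dim=-1)), z_id, z_exp

model = DisentangledFaceAE()
batch = torch.randn(4, 3 * 5023)                   # flattened vertex coordinates
recon, z_id, z_exp = model(batch)
# A disentangling objective would add, e.g., a swap or consistency loss so that
# exchanging z_exp between two scans of the same person preserves identity.
print(recon.shape, z_id.shape, z_exp.shape)
```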