9 research outputs found

    Fast Nonlinear Least Squares Optimization of Large-Scale Semi-Sparse Problems

    Many problems in computer graphics and vision can be formulated as a nonlinear least squares optimization problem, for which numerous off-the-shelf solvers are readily available. Depending on the structure of the problem, however, existing solvers may be more or less suitable, and in some cases the solution comes at the cost of lengthy convergence times. One such case is semi-sparse optimization problems, emerging for example in localized facial performance reconstruction, where the nonlinear least squares problem can be composed of hundreds of thousands of cost functions, each one involving many of the optimization parameters. While such problems can be solved with existing solvers, the computation time can severely hinder the applicability of these methods. We introduce a novel iterative solver for nonlinear least squares optimization of large-scale semi-sparse problems. We use the Levenberg-Marquardt method to locally linearize the problem in parallel, based on its first-order approximation. Then, we decompose the linear problem into small blocks, using the local Schur complement, leading to a more compact linear system without loss of information. The resulting system is dense, but its size is small enough to be solved using a parallel direct method in a short amount of time. The main benefit of this approach is that the overall optimization process is entirely parallel and scalable, making it suitable for mapping onto graphics hardware (GPU). Using our minimizer, results are obtained up to one order of magnitude faster than with other existing solvers, without sacrificing the generality or the accuracy of the model. We provide a detailed analysis of our approach and validate our results with the application of performance-based facial capture using a recently proposed anatomical local face deformation model.
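    To make the block-elimination idea concrete, here is a minimal NumPy sketch of one damped Gauss-Newton (Levenberg-Marquardt) step in which the 'local' parameters are eliminated via a Schur complement, leaving a smaller dense system. The global/local split and the dense matrices are illustrative assumptions, not the paper's implementation; in the actual solver the eliminated block is block-diagonal, so each small block can be factorized independently and in parallel.

```python
import numpy as np

def lm_step_schur(J_g, J_l, r, lam):
    """One damped Gauss-Newton step with Schur-complement elimination.

    J_g: (m, n_g) Jacobian w.r.t. the 'global' parameters (assumed split)
    J_l: (m, n_l) Jacobian w.r.t. the 'local' parameters
    r:   (m,) residual vector
    lam: Levenberg-Marquardt damping factor
    """
    # Damped normal equations: [H_gg H_gl; H_gl^T H_ll] [dg; dl] = [b_g; b_l]
    H_gg = J_g.T @ J_g + lam * np.eye(J_g.shape[1])
    H_ll = J_l.T @ J_l + lam * np.eye(J_l.shape[1])
    H_gl = J_g.T @ J_l
    b_g, b_l = -J_g.T @ r, -J_l.T @ r

    # Schur complement: a compact dense system for dg only,
    # with no loss of information about the eliminated block.
    H_ll_inv = np.linalg.inv(H_ll)  # block-diagonal (and parallel) in practice
    S = H_gg - H_gl @ H_ll_inv @ H_gl.T
    dg = np.linalg.solve(S, b_g - H_gl @ (H_ll_inv @ b_l))
    dl = H_ll_inv @ (b_l - H_gl.T @ dg)  # back-substitute the local update
    return dg, dl
```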

    A Perceptual Shape Loss for Monocular 3D Face Reconstruction

    Monocular 3D face reconstruction is a widespread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case, carefully designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.
    Accepted to PG 2023. Project page: https://studios.disneyresearch.com/2023/10/09/a-perceptual-shape-loss-for-monocular-3d-face-reconstruction/ Video: https://www.youtube.com/watch?v=RYdyoIZEuU
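    As a rough sketch of how such a critic might be built, the PyTorch module below scores an RGB image together with a shaded render of the geometry estimate. The layer layout, channel counts, and the single-channel shading input are our assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ShapeCritic(nn.Module):
    """Discriminator-style critic: (RGB image, shaded render) -> score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 64, 4, stride=2, padding=1),  # RGB + shading channel
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, 1),  # perceptual score for the render/image pair
        )

    def forward(self, image, shaded_render):
        # Purely image-space: no albedo or illumination estimate is
        # needed, and nothing depends on the underlying mesh topology.
        return self.net(torch.cat([image, shaded_render], dim=1))
```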

    Interactive Sculpting of Digital Faces Using an Anatomical Modeling Paradigm

    Digitally sculpting 3D human faces is a very challenging task. It typically requires either 1) highly skilled artists using complex software packages to obtain high-quality results, or 2) highly constrained, simple interfaces for consumer-level avatar creation, such as in game engines. We propose a novel interactive method for the creation of digital faces that is simple and intuitive to use, even for novice users, while consistently producing plausible 3D face geometry and allowing editing freedom beyond traditional video game avatar creation. At the core of our system lies a specialized anatomical local face model (ALM), which is constructed from a dataset of several hundred 3D face scans. User edits are propagated to constraints for an optimization of our data-driven ALM model, ensuring the resulting face remains plausible even for simple edits like clicking and dragging surface points. We show how several natural interaction methods can be implemented in our framework, including direct control of the surface, indirect control of semantic features like age, ethnicity, gender, and BMI, as well as indirect control through manipulating the underlying bony structures. The result is a simple new method for creating digital human faces, for artists and novice users alike. Our method is attractive for low-budget VFX and animation productions, and our anatomical modeling paradigm can complement traditional game engine avatar design packages.
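    A minimal sketch of the edit-propagation idea, assuming a single global linear basis in place of the paper's anatomical local face model (all names and shapes below are hypothetical): a click-and-drag edit becomes a regularized least-squares fit of the model coefficients.

```python
import numpy as np

def propagate_edit(mean, basis, vtx_ids, targets, reg=1e-2):
    # mean:    (3V,) flattened mean face
    # basis:   (3V, k) deformation basis learned from face scans
    # vtx_ids: (n,) integer array of dragged vertex indices
    # targets: (n, 3) desired positions for those vertices
    rows = np.stack([3 * vtx_ids, 3 * vtx_ids + 1, 3 * vtx_ids + 2], axis=1).ravel()
    A = basis[rows]                   # basis rows constrained by the edit
    b = targets.ravel() - mean[rows]  # desired displacement
    # Regularization keeps the coefficients small, so the face stays
    # plausible even when only a single surface point is dragged.
    c = np.linalg.solve(A.T @ A + reg * np.eye(basis.shape[1]), A.T @ b)
    return mean + basis @ c           # full edited face
```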

    Data-Driven Face Analysis for Performance Retargeting

    The democratization of digital humans in entertainment was made possible by recent advances in performance capture, rendering, and animation techniques. The human face, which is key to realism, is very complex to animate by hand, and facial performance capture is nowadays often used to acquire a starting point for the animation. Most of the time, however, captured actors are not re-rendered directly on screen; instead, their performance is retargeted to other characters or fantasy creatures. The task of retargeting facial performances brings forth multiple challenging questions, such as how to map the performance of one actor onto another, how to represent the data to do so optimally, and how to maintain artistic control throughout, to cite only a few. These challenges make facial performance retargeting an active and exciting area of research. In this dissertation, we present several contributions towards solving the retargeting problem. We first introduce a novel jaw rig, designed using ground truth jaw motion data acquired with a novel capture method specifically designed for this task. Our jaw rig allows for direct and indirect controls while restricting the motion of the mandible to only physiologically possible poses. We use a well-known concept from dentistry, the Posselt envelope of motion, to parameterize its controls. Finally, we show how this jaw rig can be retargeted to unseen actors or creatures. Our second contribution is a novel markerless method to accurately track the underlying jaw bone. We use our jaw motion capture method to acquire a dataset of ground truth jaw motion and geometry, and learn a non-linear mapping between the facial skin deformation and the motion of the underlying bone. We also demonstrate how this method can be used on actors for whom no ground truth jaw motion is acquired, outperforming currently used techniques. In most modern performance capture methods, the facial geometry will inevitably contain parasitic dynamic motion that is, most of the time, undesired. This is especially true in the context of performance retargeting. Our third contribution aims to compute and characterize the difference between the captured dynamic facial performance and a speculative quasistatic variant of the same motion, had the inertial effects been absent. We show how our method can be used to remove secondary dynamics from a captured performance and to synthesize novel dynamics, given novel head motion. Our last contribution tackles a different kind of retargeting problem: the problem of re-aging of facial performances in image space. In contrast to existing methods, we specifically tackle the problem of high-resolution, temporally stable re-aging. We show how a synthetic dataset can be computed using a state-of-the-art generative adversarial network and used to train our re-aging network. Our method allows fine-grained continuous age control and intuitive artistic effects such as localized control. We believe the methods presented in this thesis will solve or alleviate some of the problems in modern performance retargeting and will inspire exciting future work.
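    Of the contributions above, the markerless jaw tracker lends itself to a compact illustration: a small network regressing the rigid pose of the jaw bone from skin deformation, trained on the captured ground-truth data. The architecture and dimensions below are assumptions for illustration, not the model used in the thesis.

```python
import torch
import torch.nn as nn

class JawPoseRegressor(nn.Module):
    """Non-linear map from skin deformation to rigid jaw pose (sketch)."""
    def __init__(self, n_skin_features, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_skin_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 6),  # axis-angle rotation (3) + translation (3)
        )

    def forward(self, skin_deltas):
        # skin_deltas: (B, n_skin_features) flattened per-frame vertex
        # offsets of the facial skin relative to a neutral pose
        return self.mlp(skin_deltas)
```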

    Adaptive Convolutions for Structure-Aware Style Transfer

    Style transfer between images is an artistic application of CNNs, where the 'style' of one image is transferred onto another image while preserving the latter's content. The state of the art in neural style transfer is based on Adaptive Instance Normalization (AdaIN), a technique that transfers the statistical properties of style features to a content image and can transfer a large number of styles in real time. However, AdaIN is a global operation; thus, local geometric structures in the style image are often ignored during the transfer. We propose Adaptive Convolutions (AdaConv), a generic extension of AdaIN, to allow for the simultaneous transfer of both statistical and structural styles in real time. Apart from style transfer, our method can also be readily extended to style-based image generation and other tasks where AdaIN has already been adopted.
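    For reference, AdaIN itself, the global operation that AdaConv generalizes, fits in a few lines of PyTorch (shown below); AdaConv additionally predicts convolution kernels from the style features to capture local structure, which this sketch omits.

```python
import torch

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: per-channel mean/std of the
    style features are transferred onto the content features. Being a
    purely statistical (global) operation, it ignores local geometric
    structure in the style image."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu
```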

    Implicit neural representation for physics-driven actuated soft bodies

    Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to control active soft bodies by defining a function that enables a continuous mapping from a spatial point in the material space to the actuation value. This property allows us to capture the signal's dominant frequencies, making the method discretization-agnostic and widely applicable. We extend our implicit model to mandible kinematics for the particular case of facial animation and show that we can reliably reproduce facial expressions captured with high-quality capture systems. We apply the method to volumetric soft bodies, human poses, and facial expressions, demonstrating artist-friendly properties, such as simple control over the latent space and resolution invariance at test time.
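    The continuous actuation mapping can be sketched as a coordinate-based network: a material-space point, together with a latent code, is mapped to an actuation value, making the representation independent of any particular mesh discretization. Layer sizes, activations, and the latent-code conditioning below are our assumptions.

```python
import torch
import torch.nn as nn

class ActuationField(nn.Module):
    """Implicit actuation field: (material-space point, latent) -> actuation."""
    def __init__(self, latent_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),  # scalar actuation value at the query point
        )

    def forward(self, points, latent):
        # points: (N, 3) samples in material space; latent: (latent_dim,)
        # The same field can be evaluated at any resolution at test time.
        z = latent.expand(points.shape[0], -1)
        return self.mlp(torch.cat([points, z], dim=-1))
```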

    Improved Lighting Models for Facial Appearance Capture

    Facial appearance capture techniques estimate the geometry and reflectance properties of facial skin by performing a computationally intensive inverse rendering optimization, in which one or more images are re-rendered a large number of times and compared to real images coming from multiple cameras. Due to the high computational burden, these techniques often make several simplifying assumptions to tame complexity and make the problem more tractable. For example, it is common to assume that the scene consists of only distant light sources and to ignore indirect bounces of light (on the surface and within the surface). Also, methods based on polarized lighting often simplify the light interaction with the surface and assume perfect separation of diffuse and specular reflectance. In this paper, we move in the opposite direction and demonstrate the impact on facial appearance capture quality when departing from these idealized conditions towards models that seek to more accurately represent the lighting, while at the same time minimally increasing the computational burden. We compare the results obtained with a state-of-the-art appearance capture method [RGB*20], with and without our proposed improvements to the lighting model.
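    The inverse-rendering loop at the heart of such methods can be sketched as a photometric objective summed over cameras; `render_fn` below is a placeholder for a differentiable renderer, and the lighting model inside it (distant vs. near-field lights, with or without indirect bounces or idealized diffuse/specular separation) is precisely what the paper varies.

```python
import torch

def photometric_loss(render_fn, params, images):
    """Sum of per-camera photometric errors for the current estimate.

    render_fn: differentiable renderer, (params, cam_id) -> (H, W, 3) image
    params:    current geometry/reflectance estimate (tensor or dict)
    images:    {cam_id: (H, W, 3) captured image} from the camera rig
    """
    loss = 0.0
    for cam_id, captured in images.items():
        rendered = render_fn(params, cam_id)
        loss = loss + torch.mean((rendered - captured) ** 2)
    return loss / len(images)
```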