
    Neural Face Editing with Intrinsic Image Disentangling

    Traditional face editing methods often require a number of sophisticated, task-specific algorithms to be applied one after the other --- a process that is tedious, fragile, and computationally intensive. In this paper, we propose an end-to-end generative adversarial network that infers a face-specific disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, together with an alpha matte. We show that this network can be trained on "in-the-wild" images by incorporating an in-network physically-based image formation module and appropriate loss functions. Our disentangled latent representation allows for semantically relevant edits, where one aspect of facial appearance can be manipulated while keeping orthogonal properties fixed, and we demonstrate its use for a number of facial editing applications. Comment: CVPR 2017 oral
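
    As a rough sketch of the kind of in-network, physically-based image formation described above, the snippet below shades an albedo map with second-order spherical-harmonics (SH) lighting computed from the normals and then alpha-composites the shaded face over a background. The function names, the 9-coefficient lighting vector, and the assumption that SH attenuation constants are folded into the learned lighting are illustrative simplifications, not the paper's exact module.

```python
import numpy as np

def sh_shading(normals, light):
    """Per-pixel Lambertian shading from surface normals and a 9-dim
    second-order spherical-harmonics lighting vector.  Attenuation
    constants are assumed to be folded into `light`."""
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    basis = np.stack([np.ones_like(nx), ny, nz, nx,
                      nx * ny, ny * nz, 3.0 * nz ** 2 - 1.0,
                      nx * nz, nx ** 2 - ny ** 2], axis=-1)        # (H, W, 9)
    return basis @ light                                            # (H, W)

def compose_face(albedo, normals, light, matte, background):
    """Image formation: shade the albedo with SH lighting, then
    alpha-composite the shaded face over the background."""
    shading = sh_shading(normals, light)[..., None]                 # (H, W, 1)
    face = albedo * shading                                         # (H, W, 3)
    return matte[..., None] * face + (1.0 - matte[..., None]) * background

# Toy usage with flat, frontal-facing normals.
H, W = 64, 64
normals = np.dstack([np.zeros((H, W)), np.zeros((H, W)), np.ones((H, W))])
albedo = np.full((H, W, 3), 0.6)
light = np.array([0.8, 0.0, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
img = compose_face(albedo, normals, light, np.ones((H, W)), np.zeros((H, W, 3)))
```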

    CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images

    With the power of convolutional neural networks (CNNs), CNN-based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images. The success of CNN-based methods relies on a large amount of labeled data. The state of the art synthesizes such data using a coarse morphable face model, which, however, has difficulty generating detailed, photo-realistic images of faces (with wrinkles). This paper presents a novel face data generation method. Specifically, we render a large number of photo-realistic face images with different attributes based on inverse rendering. Furthermore, we construct a fine-detailed face image dataset by transferring different scales of detail from one image to another. We also construct a large number of video-type adjacent-frame pairs by simulating the distribution of real video data. With these carefully constructed datasets, we propose a coarse-to-fine learning framework consisting of three convolutional networks. The networks are trained for real-time detailed 3D face reconstruction from monocular video as well as from a single image. Extensive experimental results demonstrate that our framework can produce high-quality reconstructions with much less computation time than the state of the art. Moreover, our method is robust to pose, expression, and lighting thanks to the diversity of the data. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence, 201
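
    The coarse morphable face model mentioned above is, in its common formulation, a linear model over identity and expression bases. The snippet below is a minimal sketch of that formulation with made-up basis sizes; it is not the specific model or the rendering pipeline used in the paper.

```python
import numpy as np

def morphable_face(mean_shape, id_basis, exp_basis, id_coeffs, exp_coeffs):
    """Coarse 3D morphable model: a face mesh is the mean shape plus linear
    combinations of identity and expression basis vectors.  Shapes are
    flattened to (3 * n_vertices,) vectors."""
    return mean_shape + id_basis @ id_coeffs + exp_basis @ exp_coeffs

# Illustrative dimensions only (not the paper's actual model sizes).
n_vert = 5000
rng = np.random.default_rng(0)
mean_shape = np.zeros(3 * n_vert)
id_basis = rng.standard_normal((3 * n_vert, 80)) * 1e-3
exp_basis = rng.standard_normal((3 * n_vert, 29)) * 1e-3
coarse = morphable_face(mean_shape, id_basis, exp_basis,
                        rng.standard_normal(80), rng.standard_normal(29))
vertices = coarse.reshape(n_vert, 3)   # coarse mesh to be rendered / refined
```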

    DiffusionRig: Learning Personalized Priors for Facial Appearance Editing

    We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or "rigged by," crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. Please see the project website: https://diffusionrig.github.io for the supplemental material, video, code, and data.Comment: CVPR 2023. Project website: https://diffusionrig.github.i
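
    One way to picture a diffusion model being "rigged by" crude 3D face models is that buffers rendered from the coarse model (for example normals, albedo, and a Lambertian shading map) are concatenated with the noisy image and fed to the denoiser. The toy module below illustrates only that conditioning pattern; the class name, channel counts, and the omission of timestep embeddings and the UNet architecture are simplifications, not DiffusionRig's actual design.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy denoiser conditioned on rendered buffers from a coarse 3D face
    model, concatenated with the noisy image along the channel dimension.
    A real diffusion denoiser would be a UNet with timestep embeddings;
    a small conv stack stands in for it here."""
    def __init__(self, img_ch=3, cond_ch=9, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + cond_ch, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_ch, 3, padding=1),
        )

    def forward(self, noisy_img, cond_buffers):
        # "Rig" the generation by feeding the physical buffers as conditions.
        return self.net(torch.cat([noisy_img, cond_buffers], dim=1))

denoiser = ConditionedDenoiser()
noisy = torch.randn(1, 3, 128, 128)
buffers = torch.randn(1, 9, 128, 128)   # e.g. normals + albedo + shading maps
pred_noise = denoiser(noisy, buffers)
```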

    FaceLit: Neural 3D Relightable Faces

    We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered under various user-defined lighting conditions and views, learned purely from 2D images in the wild without any manual annotation. Unlike existing works that require careful capture setups or human labor, we rely on off-the-shelf pose and illumination estimators. With these estimates, we incorporate the Phong reflectance model into the neural volume rendering framework. Our model learns to generate shape and material properties of a face such that, when rendered according to the natural statistics of pose and illumination, they produce photorealistic face images with multi-view 3D and illumination consistency. Our method enables photorealistic generation of faces with explicit illumination and view controls on multiple datasets: FFHQ, MetFaces, and CelebA-HQ. We show state-of-the-art photorealism among 3D-aware GANs on the FFHQ dataset, achieving an FID score of 3.5. Comment: CVPR 202
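
    For reference, the Phong reflectance model that FaceLit incorporates into neural volume rendering combines ambient, diffuse, and specular terms per surface point. The sketch below evaluates that classic formula; the coefficient values are illustrative placeholders, not the material properties the network learns.

```python
import numpy as np

def phong_shading(normal, light_dir, view_dir,
                  k_ambient=0.1, k_diffuse=0.7, k_specular=0.2, shininess=16.0):
    """Classic Phong reflectance at a single surface point:
    ambient + diffuse * max(N.L, 0) + specular * max(R.V, 0)^shininess."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    n_dot_l = max(float(n @ l), 0.0)
    r = 2.0 * n_dot_l * n - l                 # mirror reflection of the light
    spec = max(float(r @ v), 0.0) ** shininess
    return k_ambient + k_diffuse * n_dot_l + k_specular * spec

# Toy usage: frontal normal, slightly tilted light, frontal view.
value = phong_shading(np.array([0.0, 0.0, 1.0]),
                      np.array([0.3, 0.3, 0.9]),
                      np.array([0.0, 0.0, 1.0]))
```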

    Retrieval and Clustering from a 3D Human Database based on Body and Head Shape

    In this paper, we describe a framework for similarity-based retrieval and clustering from a 3D human database. Our technique represents both body and head shape, and retrieval is based on the similarity of both. The 3D human database used in our study is the CAESAR anthropometric database, which contains approximately 5000 bodies. We have developed a web-based interface for specifying queries and interacting with the retrieval system. Our approach performs similarity-based retrieval in a reasonable amount of time and is practical. Comment: Published in Proceedings of the 2006 Digital Human Modeling for Design and Engineering Conference, July 2006, Lyon, France. Session: Advanced Size/Shape Analysis. Paper Number: 2006-01-2355. http://papers.sae.org/2006-01-2355
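
    A minimal sketch of similarity-based retrieval over such a database: rank subjects by a weighted combination of body-shape and head-shape descriptor distances. The descriptor representation, the Euclidean distance, and the weighting below are assumptions for illustration, not the exact measures used in the paper.

```python
import numpy as np

def retrieve_similar(query_body, query_head, body_descs, head_descs,
                     body_weight=0.5, top_k=10):
    """Rank database subjects by a weighted sum of body-shape and
    head-shape descriptor distances and return the top-k indices."""
    body_dist = np.linalg.norm(body_descs - query_body, axis=1)
    head_dist = np.linalg.norm(head_descs - query_head, axis=1)
    score = body_weight * body_dist + (1.0 - body_weight) * head_dist
    return np.argsort(score)[:top_k]

# Toy usage with random descriptors for ~5000 subjects.
rng = np.random.default_rng(0)
body_descs, head_descs = rng.random((5000, 32)), rng.random((5000, 16))
hits = retrieve_similar(body_descs[0], head_descs[0], body_descs, head_descs)
```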

    Multilinear methods for disentangling variations with applications to facial analysis

    Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, among others. Each factor accounts for a source of variability in the data. It is assumed that the multiplicative interactions of these factors emulate the entangled variability, giving rise to the rich structure of visual object appearance. Disentangling such unobserved factors from visual data is a challenging task, especially when the data have been captured in uncontrolled recording conditions (also referred to as "in-the-wild") and label information is not available. The work presented in this thesis focuses on disentangling the variations contained in visual data, in particular applied to 2D and 3D faces. The motivation behind this work lies in recent developments in the field, such as (i) the creation of large visual databases for face analysis, (ii) the need to extract information without the use of labels, and (iii) the need to deploy systems under demanding, real-world conditions.

    In the first part of this thesis, we present a method to synthesise plausible 3D expressions that preserve the identity of a target subject. This method is supervised, as the model learns from labels, in this case 3D facial meshes of people performing a defined set of facial expressions. The ability to synthesise an entire facial rig from a single neutral expression has a large range of applications in both computer graphics and computer vision, ranging from the efficient and cost-effective creation of CG characters to scalable data generation for machine learning purposes. Unlike previous methods based on multilinear models, the proposed approach is capable of extrapolating well outside the sample pool, which allows it to accurately reproduce the identity of the target subject and create artefact-free expression shapes while requiring only a small input dataset. We introduce global-local multilinear models that leverage the strengths of expression-specific and identity-specific local models combined with coarse motion estimations from a global model. The expression-specific and identity-specific local models are built from different slices of the patch-wise local multilinear model. Experimental results show that we achieve high-quality, identity-preserving facial expression synthesis results that outperform existing methods both quantitatively and qualitatively.

    In the second part of this thesis, we investigate how the modes of variation in visual data can be extracted. Our assumption is that visual data has an underlying structure consisting of factors of variation and their interactions. Finding this structure and the factors is important: it would not only help us better understand visual data, but once the factors are obtained they can be edited for use in various applications; Shape from Shading and expression transfer are just two of the potential applications. To extract the factors of variation, several supervised methods have been proposed, but they require both labels regarding the modes of variation and the same number of samples under all modes of variation. Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. We propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in an unsupervised setting. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild.

    Finally, leveraging the proposed unsupervised multilinear method as well as recent advances in deep learning, we propose a weakly supervised deep learning method for disentangling multiple latent factors of variation in face images captured in-the-wild. To this end, we propose a deep latent variable model in which we explicitly model the multiplicative interactions of multiple latent factors of variation as a multilinear structure. We demonstrate that the proposed approach indeed learns disentangled representations of facial expression and pose, which can be used in various applications, including face editing, as well as 3D face reconstruction and classification of facial expression, identity, and pose. Open Access
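
    As a worked example of the multilinear structure the thesis builds on, the snippet below generates a face sample from a Tucker-style core tensor contracted with identity and expression coefficient vectors, so the two factors interact multiplicatively. The dimensions and the use of a single dense core are illustrative, not the thesis's global-local or incomplete-data formulations.

```python
import numpy as np

def multilinear_face(core, u_identity, u_expression):
    """Multilinear (Tucker-style) face model: a data vector is the core
    tensor contracted with per-factor coefficient vectors, so identity and
    expression combine multiplicatively rather than additively."""
    # core: (n_id, n_expr, n_features); u_identity: (n_id,); u_expression: (n_expr,)
    return np.einsum('ief,i,e->f', core, u_identity, u_expression)

# Illustrative sizes: 30 identity modes, 20 expression modes, 5000-vertex meshes.
rng = np.random.default_rng(0)
core = rng.standard_normal((30, 20, 3 * 5000))
face = multilinear_face(core, rng.standard_normal(30), rng.standard_normal(20))
```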

    Drivable 3D Gaussian Avatars

    We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. Those based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform these primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions yield higher-quality results than state-of-the-art methods when using the same training and test data. Comment: Website: https://zielon.github.io/d3ga
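
    A minimal sketch of the cage-deformation idea referenced above: each embedded point (here, a Gaussian primitive's center) is expressed as a fixed weighted combination of cage vertices, so deforming the cage deforms the points. The generalized-barycentric-style weights and sizes below are illustrative assumptions, not D3GA's actual cages or driving network; unlike LBS, the points follow a coarse enclosing mesh rather than bone transforms.

```python
import numpy as np

def cage_deform(point_weights, deformed_cage_vertices):
    """Cage-based deformation: each point is a fixed weighted combination
    (e.g. generalized barycentric coordinates) of cage vertices, so moving
    the cage moves the embedded points."""
    # point_weights: (n_points, n_cage), rows sum to 1
    # deformed_cage_vertices: (n_cage, 3)
    return point_weights @ deformed_cage_vertices

# Toy usage: 1000 Gaussian centers embedded in an 8-vertex cage.
rng = np.random.default_rng(0)
weights = rng.random((1000, 8))
weights /= weights.sum(axis=1, keepdims=True)   # barycentric-style weights
deformed_cage = rng.random((8, 3))              # cage after driving signals move it
moved_centers = cage_deform(weights, deformed_cage)
```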