
    VAE/WGAN-Based Image Representation Learning For Pose-Preserving Seamless Identity Replacement In Facial Images

    We present a novel variational generative adversarial network (VGAN) based on the Wasserstein loss that learns a latent representation of a face image that is invariant to identity but preserves head-pose information. This facilitates synthesis of a realistic face image with the same head pose as a given input image, but with a different identity. One application of this network is in privacy-sensitive scenarios: after identity replacement in an image, utility, such as head pose, can still be recovered. Extensive experimental validation on synthetic and real human-face image datasets, performed under three threat scenarios, confirms the ability of the proposed network to preserve the head pose of the input image, mask the input identity, and synthesize a good-quality, realistic face image of a desired identity. We also show that our network can be used to perform pose-preserving identity morphing and identity-preserving pose morphing. The proposed method improves over a recent state-of-the-art method in terms of both quantitative metrics and synthesized image quality. Comment: 6 pages, 5 figures, 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP).
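As a rough, illustrative sketch (not the authors' implementation), the Wasserstein objective that drives such a network can be written as two simple loss terms: the critic is trained to score real images above fakes, and the generator is trained to raise the critic's score on its outputs.

```python
import numpy as np

def critic_wasserstein_loss(critic_real, critic_fake):
    """WGAN critic loss: maximize E[D(real)] - E[D(fake)],
    written here as a quantity to minimize."""
    return float(np.mean(critic_fake) - np.mean(critic_real))

def generator_wasserstein_loss(critic_fake):
    """WGAN generator loss: push the critic's scores on fakes upward."""
    return float(-np.mean(critic_fake))
```

In the full method these terms would sit alongside a VAE reconstruction/KL term and the pose-preservation constraints; the arrays here stand in for critic outputs on a batch.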

    Photo-realistic face synthesis and reenactment with deep generative models

    The advent of Deep Learning has led to numerous breakthroughs in the field of Computer Vision. Over the last decade, a significant amount of research has been undertaken towards designing neural networks for visual data analysis. At the same time, rapid advancements have been made in deep generative modelling, especially after the introduction of Generative Adversarial Networks (GANs), which have shown particularly promising results when it comes to synthesising visual data. Since then, considerable attention has been devoted to the problem of photo-realistic human face animation due to its wide range of applications, including image and video editing, virtual assistance, social media, teleconferencing, and augmented reality. The objective of this thesis is to make progress towards generating photo-realistic videos of human faces. To that end, we propose novel generative algorithms that provide explicit control over the facial expression and head pose of synthesised subjects. Despite the major advances in face reenactment and motion transfer, current methods struggle to generate video portraits that are indistinguishable from real data. In this work, we aim to overcome the limitations of existing approaches by combining concepts from deep generative networks and video-to-video translation with 3D face modelling, and more specifically by capitalising on prior knowledge of faces that is encoded in statistical models such as 3D Morphable Models (3DMMs). In the first part of this thesis, we introduce a person-specific system that performs full head reenactment using ideas from video-to-video translation. Subsequently, we propose a novel approach to controllable video portrait synthesis, inspired by Implicit Neural Representations (INRs).
In the second part of the thesis, we focus on person-agnostic methods and present a GAN-based framework that performs video portrait reconstruction, full head reenactment, expression editing, novel pose synthesis, and face frontalisation.

    Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

    Talking face generation has historically struggled to produce head movements and natural facial expressions without guidance from an additional reference video. Recent developments in diffusion-based generative models allow for more realistic and stable data synthesis, and their performance on image and video generation has surpassed that of other generative models. In this work, we present an autoregressive diffusion model that requires only a single identity image and an audio sequence to generate a video of a realistic talking human head. Our solution is capable of hallucinating head movements, generating facial expressions such as blinks, and preserving a given background. We evaluate our model on two different datasets, achieving state-of-the-art results on both.
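A minimal sketch of the autoregressive sampling idea, under stated assumptions: `predict_noise` is a placeholder for the trained conditional network (the identity image, audio window, and previously generated frames are assumed to be baked into it), and each frame is produced by running a plain DDPM-style reverse process from pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x_t, t, predict_noise, alphas, alpha_bars):
    """One DDPM reverse step: estimate x_{t-1} from x_t using the
    predicted noise; add fresh noise on every step except the last."""
    eps = predict_noise(x_t, t)
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(1.0 - alphas[t]) * rng.standard_normal(x_t.shape)
    return mean

def generate_video(num_frames, shape, T, predict_noise, alphas, alpha_bars):
    """Autoregressive talking-head sketch: each frame is denoised from
    scratch, conditioned (inside predict_noise) on identity, audio,
    and the frames generated so far."""
    frames = []
    for _ in range(num_frames):
        x = rng.standard_normal(shape)
        for t in reversed(range(T)):
            x = denoise_step(x, t, predict_noise, alphas, alpha_bars)
        frames.append(x)
    return frames
```

The frame loop is the autoregressive part; in practice the conditioning on past frames is what lets head motion evolve smoothly across the video.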

    Unsupervised face synthesis based on human traits

    This paper presents a strategy to synthesize face images based on human traits. Specifically, the strategy allows synthesizing face images with similar age, gender, and ethnicity after discovering groups of people with similar facial features. Our synthesizer is based on unsupervised learning and is capable of generating realistic faces. Our experiments reveal that grouping the training samples according to their similarity can lead to more realistic face images while providing semantic control over the synthesis. The proposed strategy achieves competitive performance compared to the state of the art and outperforms the baseline in terms of the Fréchet Inception Distance.
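The grouping step can be illustrated with any unsupervised clustering of face embeddings; the tiny k-means below is a hypothetical stand-in for whatever method the authors actually use, with the resulting cluster id serving as the conditioning signal for synthesis.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Tiny k-means sketch: group face embeddings so each cluster
    collects people with similar traits; the cluster id can then
    condition the synthesizer."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        dists = np.linalg.norm(features[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members.
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers
```

On well-separated trait groups the labels recover the groups, which is all the synthesis stage needs from this step.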

    Pose Guided Human Image Synthesis with Partially Decoupled GAN

    Pose Guided Human Image Synthesis (PGHIS) is the challenging task of transforming a human image from a reference pose to a target pose while preserving its style. Most existing methods encode the texture of the whole reference human image into a latent space and then utilize a decoder to synthesize the image texture of the target pose. However, it is difficult to recover the detailed texture of the whole human image this way. To alleviate this problem, we propose a method that decouples the human body into several parts (e.g., hair, face, hands, feet) and then uses each of these parts to guide the synthesis of a realistic image of the person, which preserves the detailed information of the generated images. In addition, we design a multi-head attention-based module for PGHIS. Because most convolutional neural network-based methods have difficulty modeling long-range dependencies due to the locality of the convolution operation, the long-range modeling capability of the attention mechanism is better suited than convolutional neural networks to the pose-transfer task, especially for sharp pose deformations. Extensive experiments on the Market-1501 and DeepFashion datasets reveal that our method outperforms most existing state-of-the-art methods in terms of both qualitative and quantitative metrics. Comment: 16 pages, 14th Asian Conference on Machine Learning conference.
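The long-range-dependency argument can be seen in a bare multi-head attention sketch: every query position attends to all key positions at once, regardless of spatial distance, which a fixed-size convolution kernel cannot do. This is a generic illustration, not the paper's module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention split into heads over (n, d)
    feature maps flattened to sequences; each query row mixes
    information from every key row (long-range coupling)."""
    n, d = q.shape
    dh = d // num_heads
    out = np.empty_like(q)
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)  # (n, n) affinities
        out[:, s] = softmax(scores) @ v[:, s]       # weighted mix of values
    return out
```

In a part-guided pipeline, the queries would come from the target-pose features and the keys/values from the per-part reference textures.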

    Face Image Modality Recognition and Photo-Sketch Matching

    The face is an important physical characteristic of the human body and is widely used in many crucial applications, such as video surveillance, criminal investigation, and security access systems. Driven by practical demands, such as usable face images in dark environments and criminal profiling, different modalities of face images have appeared, e.g. three-dimensional (3D), near-infrared (NIR), and thermal-infrared (TIR) face images. Thus, research involving various face image modalities has become an active area. Most of this work assumes that the modality of the face images is known in advance, which imposes some limitations. In this thesis, we present approaches for face image modality recognition to extend the possibilities of cross-modality research as well as to handle new modality-mixed face images. Furthermore, a large facial image database is assembled covering five commonly used modalities: 3D, NIR, TIR, sketch, and visible light spectrum (VIS). Based on the analysis of the results, a feature descriptor based on a convolutional neural network with a linear-kernel SVM achieved the best performance.
As mentioned above, face images are widely used in crucial applications, and one of them is using a sketch of a suspect's face, based on a witness's description, to assist law enforcement. Since it is difficult to capture face photos of a suspect during a criminal activity, automatically retrieving photos based on the suspect's facial sketch is used for locating potential suspects. In this thesis, we perform photo-sketch matching by synthesizing the corresponding pseudo-sketch from a given photo. Three methods are applied, based respectively on style transfer, DualGAN, and cycle-consistent adversarial networks (CycleGAN).
Among the results of these methods, the style-transfer-based method performed poorly in photo-sketch matching, since it is unsupervised and not tailored to the photo-to-sketch synthesis problem, whereas the other two train dedicated models in the synthesis stage.
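The cycle-consistency idea behind the DualGAN/CycleGAN approaches can be sketched in one function: translating a photo to a sketch and back (and vice versa) should reconstruct the input. The two translator arguments are placeholders for the trained generators.

```python
import numpy as np

def cycle_consistency_loss(photo, sketch, photo2sketch, sketch2photo):
    """CycleGAN-style L1 reconstruction penalty: a round trip through
    both translators should return each input unchanged."""
    recon_photo = sketch2photo(photo2sketch(photo))
    recon_sketch = photo2sketch(sketch2photo(sketch))
    return float(np.mean(np.abs(recon_photo - photo)) +
                 np.mean(np.abs(recon_sketch - sketch)))
```

This penalty, added to the adversarial losses, is what makes the photo-to-sketch mapping "purposeful" in the sense the thesis contrasts with plain style transfer.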