86 research outputs found

    Linear Facial Expression Transfer With Active Appearance Models

    Get PDF
    The issue of transferring facial expressions from one person's face to another's has been an area of interest for the movie industry and the computer graphics community for quite some time. In recent years, with the proliferation of online image and video collections and web applications, such as Google Street View, the question of preserving privacy through face de-identification has gained interest in the computer vision community. In this paper, we focus on the problem of real-time dynamic facial expression transfer using an Active Appearance Model framework. We provide a theoretical foundation for a generalisation of two well-known expression transfer methods and demonstrate the improved visual quality of the proposed linear extrapolation transfer method on examples of face swapping and expression transfer using the AVOZES data corpus. Realistic talking faces can be generated in real-time at low computational cost

    Learning weakly supervised multimodal phoneme embeddings

    Full text link
    Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing the lips movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. The mono-task learning consists in applying a Siamese network on the concatenation of the two modalities, while the multi-task learning receives several different combinations of modalities at train time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonological features which are visually discernable (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only

    Automatic emotional state detection using facial expression dynamic in videos

    Get PDF
    In this paper, an automatic emotion detection system is built for a computer or machine to detect the emotional state from facial expressions in human computer communication. Firstly, dynamic motion features are extracted from facial expression videos and then advanced machine learning methods for classification and regression are used to predict the emotional states. The system is evaluated on two publicly available datasets, i.e. GEMEP_FERA and AVEC2013, and satisfied performances are achieved in comparison with the baseline results provided. With this emotional state detection capability, a machine can read the facial expression of its user automatically. This technique can be integrated into applications such as smart robots, interactive games and smart surveillance systems

    Learning based automatic face annotation for arbitrary poses and expressions from frontal images only

    Get PDF
    Statistical approaches for building non-rigid deformable models, such as the active appearance model (AAM), have enjoyed great popularity in recent years, but typically require tedious manual annotation of training images. In this paper, a learning based approach for the automatic annotation of visually deformable objects from a single annotated frontal image is presented and demonstrated on the example of automatically annotating face images that can be used for building AAMs for fitting and tracking. This approach employs the idea of initially learning the correspondences between landmarks in a frontal image and a set of training images with a face in arbitrary poses. Using this learner, virtual images of unseen faces at any arbitrary pose for which the learner was trained can be reconstructed by predicting the new landmark locations and warping the texture from the frontal image. View-based AAMs are then built from the virtual images and used for automatically annotating unseen images, including images of different facial expressions, at any random pose within the maximum range spanned by the virtually reconstructed images. The approach is experimentally validated by automatically annotating face images from three different databases

    GAGAN: Geometry-Aware Generative Adversarial Networks

    Full text link
    Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures. However, aside from their texture, the visual appearance of objects is significantly influenced by their shape geometry; information which is not taken into account by existing generative models. This paper introduces the Geometry-Aware Generative Adversarial Networks (GAGAN) for incorporating geometric information into the image generation process. Specifically, in GAGAN the generator samples latent variables from the probability space of a statistical shape model. By mapping the output of the generator to a canonical coordinate frame through a differentiable geometric transformation, we enforce the geometry of the objects and add an implicit connection from the prior to the generated object. Experimental results on face generation indicate that the GAGAN can generate realistic images of faces with arbitrary facial attributes such as facial expression, pose, and morphology, that are of better quality than current GAN-based methods. Our method can be used to augment any existing GAN architecture and improve the quality of the images generated

    Automated Analysis of Corpora Callosa

    Get PDF
    Abstract. This report describes and evaluates the steps needed to perform modern model-based interpretation of the corpus callosum in MRI. The process is discussed from the initial landmark-free contours to fullfledged statistical models based on the Active Appearance Models framework. Topics treated include landmark placement, background modelling and multi-resolution analysis. Preliminary quantitative and qualitative validation in a cross-sectional study show that fully automated analysis and segmentation of the corpus callosum are feasible.

    3D facial geometric features for constrained local model

    Get PDF
    We propose a 3D Constrained Local Model framework for deformable face alignment in depth image. Our framework exploits the intrinsic 3D geometric information in depth data by utilizing robust histogram-based 3D geometric features that are based on normal vectors. In addition, we demonstrate the fusion of intensity data and 3D features that further improves the facial landmark localization accuracy. The experiments are conducted on publicly available FRGC database. The results show that our 3D features based CLM completely outperforms the raw depth features based CLM in term of fitting accuracy and robustness, and the fusion of intensity and 3D depth feature further improves the performance. Another benefit is that the proposed 3D features in our framework do not require any pre-processing procedure on the data
    • …
    corecore