
    Synthesis and Control of High Resolution Facial Expressions for Visual Interactions

    The synthesis of facial expression with control of intensity and personal styles is important in intelligent and affective human-computer interaction, especially in face-to-face interaction between human and intelligent agent. We present a facial expression animation system that facilitates control of expressiveness and style. We learn a decomposable generative model for the nonlinear deformation of facial expressions by analyzing the mapping space between low dimensional embedded representation and high resolution tracking data. Bilinear analysis of the mapping space provides a compact representation of the nonlinear generative model for facial expressions. The decomposition allows synthesis of new facial expressions by control of geometry and expression style. The generative model provides control of expressiveness preserving nonlinear deformation in the expressions with simple parameters and allows synthesis of stylized facial geometry. In addition, we can directly extract the MPEG-4 Facial Animation Parameters (FAPs) from the synthesized data, which allows using any animation engine that supports FAPs to animate new synthesized expressions.
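    The style/content decomposition described above can be sketched with a toy bilinear model: an output deformation is a style vector and an expression vector combined through a learned interaction tensor. The tensor values below are illustrative placeholders, not the paper's learned weights.

    ```python
    # Toy bilinear generative model: d_k = sum_i sum_j s_i * c_j * W[i][j][k],
    # where s is a "style" vector, c an "expression content" vector, and W a
    # 3-way interaction tensor. Values here are illustrative, not learned.

    def bilinear_synthesize(style, content, W):
        """Combine a style vector and a content vector through tensor W
        to produce an output deformation vector."""
        n_out = len(W[0][0])
        out = [0.0] * n_out
        for i, s in enumerate(style):
            for j, c in enumerate(content):
                for k in range(n_out):
                    out[k] += s * c * W[i][j][k]
        return out

    # Example: 2 style dims, 2 content dims, 3 output dims.
    W = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
         [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
    style = [1.0, 0.5]    # e.g. coordinates interpolating between subjects
    content = [0.2, 0.8]  # e.g. expression-intensity coordinates

    deformation = bilinear_synthesize(style, content, W)  # -> [0.6, 1.2, 0.5]
    ```

    Varying `style` while holding `content` fixed changes who appears to make the expression; varying `content` changes which expression is made, which is the control the decomposition buys.
    
    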

    VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition

    Reliable facial expression recognition plays a critical role in human-machine interactions. However, most of the facial expression analysis methodologies proposed to date pay little or no attention to the protection of a user's privacy. In this paper, we propose a Privacy-Preserving Representation-Learning Variational Generative Adversarial Network (PPRL-VGAN) to learn an image representation that is explicitly disentangled from the identity information. At the same time, this representation is discriminative from the standpoint of facial expression recognition and generative as it allows expression-equivalent face image synthesis. We evaluate the proposed model on two public datasets under various threat scenarios. Quantitative and qualitative results demonstrate that our approach strikes a balance between the preservation of privacy and data utility. We further demonstrate that our model can be effectively applied to other tasks such as expression morphing and image completion.
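    The trade-off described above can be made concrete with a hypothetical multi-task objective (the names and weights are assumptions for illustration, not the paper's actual formulation): reward expression recognition on the learned code while penalizing any identity classifier that succeeds on it.

    ```python
    # Hypothetical sketch of a disentangling objective: lower is better.
    # Good reconstruction and good expression recognition reduce the loss;
    # an identity classifier performing WELL on the code increases privacy
    # leakage, so its error term enters with a negative sign.

    def disentanglement_loss(recon_err, expr_clf_err, ident_clf_err,
                             w_recon=1.0, w_expr=1.0, w_ident=1.0):
        return (w_recon * recon_err
                + w_expr * expr_clf_err
                - w_ident * ident_clf_err)

    loss = disentanglement_loss(recon_err=0.5, expr_clf_err=0.2,
                                ident_clf_err=0.3)  # -> 0.4
    ```

    The weights let a practitioner trade utility (reconstruction, expression accuracy) against privacy (identity leakage), which is the balance the abstract reports.
    
    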

    Towards responsive Sensitive Artificial Listeners

    This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners – conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real-time, both when the agent is speaking and listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness.

    Capture, Learning, and Synthesis of 3D Speaking Styles

    Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation) takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de. Comment: To appear in CVPR 201
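    The subject-label conditioning mentioned above can be sketched as follows: the per-frame speech feature is concatenated with a one-hot speaker label before entering the decoder, so swapping the label at inference time switches the synthesized speaking style. The dimensions and the downstream decoder here are illustrative assumptions, not VOCA's actual architecture.

    ```python
    # Sketch of one-hot subject conditioning: the audio feature vector is
    # concatenated with a one-hot speaker label. Changing speaker_id at
    # inference selects a different learned speaking style.

    def condition_on_speaker(audio_feat, speaker_id, n_speakers):
        one_hot = [0.0] * n_speakers
        one_hot[speaker_id] = 1.0
        return audio_feat + one_hot  # concatenated input for the decoder

    x = condition_on_speaker([0.1, 0.7], speaker_id=2, n_speakers=4)
    # x == [0.1, 0.7, 0.0, 0.0, 1.0, 0.0]
    ```
    
    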

    Sensing and mapping for interactive performance

    This paper describes a trans-domain mapping (TDM) framework for translating meaningful activities from one creative domain onto another. The multi-disciplinary framework is designed to facilitate an intuitive and non-intrusive interactive multimedia performance interface that offers the users or performers real-time control of multimedia events using their physical movements. It is intended to be a highly dynamic real-time performance tool, sensing and tracking activities and changes, in order to provide interactive multimedia performances. From a straightforward definition of the TDM framework, this paper reports several implementations and multi-disciplinary collaborative projects using the proposed framework, including a motion and colour-sensitive system, a sensor-based system for triggering musical events, and a distributed multimedia server for audio mapping of a real-time face tracker, and discusses different aspects of mapping strategies in their context. Plausible future directions, developments and exploration with the proposed framework, including stage augmentation, virtual and augmented reality, which involve sensing and mapping of physical and non-physical changes onto multimedia control events, are discussed.
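    At its simplest, the trans-domain mapping idea reduces to remapping a value sensed in one domain into a control range in another. A minimal sketch, with illustrative ranges (a 640-pixel tracked face position driving a 0-127 audio parameter, a common controller range; neither is specified by the paper):

    ```python
    # Minimal linear trans-domain mapping: normalize a sensed value into
    # [0, 1], clamp out-of-range input, then scale into the target domain.

    def map_range(value, src_lo, src_hi, dst_lo, dst_hi):
        t = (value - src_lo) / (src_hi - src_lo)  # normalize to [0, 1]
        t = min(max(t, 0.0), 1.0)                 # clamp out-of-range input
        return dst_lo + t * (dst_hi - dst_lo)

    # Face tracked at the center of a 640-px frame drives a mid-range pan.
    pan = map_range(320, 0, 640, 0, 127)  # -> 63.5
    ```

    Real mappings in such systems are often nonlinear or event-triggering rather than continuous, but they compose from primitives like this one.
    
    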