
    Photorealistic Audio-driven Video Portraits

    Video portraits are common in a variety of applications, such as videoconferencing, news broadcasting, and virtual education and training. We present a novel method to synthesize photorealistic video portraits for an input portrait video, automatically driven by a person’s voice. The main challenge in this task is the hallucination of plausible, photorealistic facial expressions from input speech audio. To address this challenge, we employ a parametric 3D face model represented by geometry, facial expression, illumination, etc., and learn a mapping from audio features to model parameters. The input source audio is first represented as a high-dimensional feature, which is used to predict facial expression parameters of the 3D face model. We then replace the expression parameters computed from the original target video with the predicted ones, and rerender the reenacted face. Finally, we generate a photorealistic video portrait from the reenacted synthetic face sequence via a neural face renderer. One appealing feature of our approach is its generalization capability across various input speech audio, including synthetic speech audio from text-to-speech software. Extensive experimental results show that our approach outperforms previous general-purpose audio-driven video portrait methods, including a user study in which our results are rated as more realistic than those of previous methods.
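    The reenactment step of this abstract's pipeline can be sketched minimally as follows. All names and dimensions (N_AUDIO, N_EXPR, the linear AudioToExpression map) are illustrative assumptions, far simpler than the learned mapping the paper describes; only the structure — predict expression parameters from an audio feature, then swap them into the target frame's parameter set — follows the text.

```python
import numpy as np

N_AUDIO = 29  # per-frame audio feature size (assumed, not from the paper)
N_EXPR = 64   # number of expression coefficients (assumed)

class AudioToExpression:
    """Toy linear map from an audio feature to expression parameters.
    The paper learns this mapping; a linear stand-in keeps the sketch short."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(N_EXPR, N_AUDIO))
        self.b = np.zeros(N_EXPR)

    def predict(self, audio_feat):
        return self.W @ audio_feat + self.b

def reenact(target_params, predicted_expr):
    """Replace only the expression block of the target frame's 3D face
    model parameters; geometry and illumination stay untouched."""
    out = dict(target_params)
    out["expression"] = predicted_expr
    return out

# Usage: reenact a single target frame with audio-predicted expressions.
model = AudioToExpression()
frame_params = {"geometry": np.zeros(80),
                "expression": np.zeros(N_EXPR),
                "illumination": np.zeros(27)}
expr = model.predict(np.ones(N_AUDIO))
reenacted = reenact(frame_params, expr)
```

    The reenacted parameter set would then be rendered and passed to a neural face renderer, which this sketch does not attempt to model.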

    4D Human Body Capture from Egocentric Video via 3D Scene Grounding

    We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos. The unique viewpoint and rapid embodied camera motion of egocentric videos raise additional technical barriers for human body capture. To address those challenges, we propose a simple yet effective optimization-based approach that leverages 2D observations of the entire video sequence and human-scene interaction constraints to estimate second-person human poses, shapes, and global motion grounded in the 3D environment captured from the egocentric view. We conduct detailed ablation studies to validate our design choices. Moreover, we compare our method with the previous state-of-the-art method for human motion capture from monocular video, and show that our method estimates more accurate human body poses and shapes under the challenging egocentric setting. In addition, we demonstrate that our approach produces more realistic human-scene interaction.
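    The flavor of objective described here — fit 3D quantities to 2D image evidence while respecting a scene constraint — can be sketched on a toy problem. Everything below is an assumption for illustration: a single 3D point stands in for the body, a pinhole camera stands in for the egocentric view, and a half-space y ≥ 0 stands in for the reconstructed scene; the paper's actual objective operates on full body meshes over whole sequences.

```python
import numpy as np

def project(p, f=1.0):
    """Pinhole projection of a 3D point onto the image plane."""
    return f * p[:2] / p[2]

def loss(p, obs_2d, w_scene=10.0):
    """2D reprojection term plus a penalty for violating the scene
    constraint (here: penetrating the half-space y >= 0)."""
    reproj = np.sum((project(p) - obs_2d) ** 2)
    scene = max(0.0, -p[1]) ** 2
    return reproj + w_scene * scene

def optimize(obs_2d, p0, lr=0.05, steps=500):
    """Naive finite-difference gradient descent on the joint objective."""
    p = p0.astype(float).copy()
    eps = 1e-5
    for _ in range(steps):
        g = np.zeros(3)
        for i in range(3):
            d = np.zeros(3)
            d[i] = eps
            g[i] = (loss(p + d, obs_2d) - loss(p - d, obs_2d)) / (2 * eps)
        p -= lr * g
    return p

# Start below the constraint surface; the optimizer must both match the
# 2D observation and lift the point back into the valid half-space.
p = optimize(np.array([0.1, 0.2]), np.array([0.0, -0.5, 2.0]))
```

    The same pattern — image evidence plus scene-grounding penalties in one objective — is what the abstract's optimization scales up to whole mesh sequences.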

    Vision-Based Production of Personalized Video

    In this paper we present a novel vision-based system for the automated production of personalized video souvenirs for visitors to leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network. At the end of a visitor’s stay, the system produces a personalized DVD souvenir that allows visitors to relive their experiences. We analyze how we identify visitors by fusing facial and body features, how we track visitors, how the tracker recovers from failures due to occlusions, and how we annotate and compile the final product. Our experiments demonstrate the feasibility of the proposed approach.

    Instant messaging on handhelds: an affective gesture approach

    Text communication can be perceived as plastic and lacking in spontaneity, due to the limitations of the medium during interaction. A form of text messaging, Instant Messaging (IM), is now on the uptake, even on mobile handhelds. This paper presents results of using affective gesture to rubberise IM chat in order to improve the spontaneity of synchronous communication. The experimental design makes use of a text-only IM tool, running on handhelds, built with the Session Initiation Protocol (SIP) and the SIP Instant Messaging and Presence Leveraging Extensions (SIMPLE). The tool was developed with a novel user-defined hotkey – a one-click context menu that fast-tracks the creation and transmission of text-gestures and emoticons. A hybrid quantitative and qualitative approach was taken in order to enable data triangulation. Data collected from user trials affirms that the affective gesture hotkey facility improves chat responsiveness, thus enhancing chat spontaneity.
    Telkom, Cisco, THRI
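    In SIMPLE, page-mode instant messages travel as SIP MESSAGE requests (RFC 3428), so what the hotkey ultimately transmits is a request of roughly the shape below. This is a hedged sketch, not the paper's implementation: the addresses, tags, branch, and Call-ID values are placeholder assumptions, and a real stack would generate them per-dialog.

```python
def build_sip_message(sender, recipient, body):
    """Assemble a minimal SIP MESSAGE request (RFC 3428) as a string.
    Header values below are illustrative placeholders."""
    payload = body.encode("utf-8")
    lines = [
        f"MESSAGE sip:{recipient} SIP/2.0",
        "Via: SIP/2.0/UDP handheld.example.org;branch=z9hG4bK776sgdkse",
        f"From: sip:{sender};tag=49583",
        f"To: sip:{recipient}",
        "Call-ID: asd88asd77a@handheld.example.org",
        "CSeq: 1 MESSAGE",
        "Max-Forwards: 70",
        "Content-Type: text/plain",
        f"Content-Length: {len(payload)}",
        "",  # blank line separates headers from the body
        body,
    ]
    return "\r\n".join(lines)

# Usage: one hotkey-selected text-gesture plus emoticon as a message body.
msg = build_sip_message("alice@example.org", "bob@example.org", "*grins* :-)")
```

    A one-click hotkey only has to fill in the body; the rest of the request is boilerplate the tool's SIP stack supplies.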

    Call Me Caitlyn: Making and making over the 'authentic' transgender body in Anglo-American popular culture

    A conception of transgender identity as an ‘authentic’ gendered core ‘trapped’ within a mismatched corporeality, and made tangible through corporeal transformations, has attained unprecedented legibility in contemporary Anglo-American media. Whilst pop-cultural articulations of this discourse have received some scholarly attention, the question of why this ‘wrong body’ paradigm has solidified as the normative explanation for gender transition within the popular media remains underexplored. This paper argues that this discourse has attained cultural pre-eminence through its convergence with a broader media and commercial zeitgeist, in which corporeal alteration and maintenance are perceived as means of accessing one’s ‘authentic’ self. I analyse the media representations of two transgender celebrities, Caitlyn Jenner and Nadia Almada, alongside the reality TV show TRANSform Me, exploring how these women’s gender transitions have been discursively aligned with a cultural imperative for all women, cisgender or trans, to display their authentic femininity through bodily work. This demonstrates how established tropes of authenticity via bodily transformation have enabled transgender identity to become culturally legible through the ‘wrong body’ trope. Problematically, I argue, this process has worked to demarcate ideals of ‘acceptable’ transgender subjectivity: self-sufficient, normatively feminine, and eager to embrace the possibilities for happiness and social integration provided by the commercial domain.