1,332 research outputs found
Photorealistic Audio-driven Video Portraits
Video portraits are common in a variety of applications, such as videoconferencing, news broadcasting, and virtual education and training. We present a novel method to synthesize photorealistic video portraits for an input portrait video, automatically driven by a person's voice. The main challenge in this task is hallucinating plausible, photorealistic facial expressions from input speech audio. To address this challenge, we employ a parametric 3D face model represented by geometry, facial expression, illumination, etc., and learn a mapping from audio features to model parameters. The input source audio is first represented as a high-dimensional feature, which is used to predict facial expression parameters of the 3D face model. We then replace the expression parameters computed from the original target video with the predicted ones and rerender the reenacted face. Finally, we generate a photorealistic video portrait from the reenacted synthetic face sequence via a neural face renderer. One appealing feature of our approach is its generalization capability across various input speech audio, including synthetic speech from text-to-speech software. Extensive experimental results show that our approach outperforms previous general-purpose audio-driven video portrait methods. This includes a user study demonstrating that our results are rated as more realistic than those of previous methods.
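The parameter-swap step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the linear map standing in for the learned audio-to-expression network, and all names (`predict_expression`, `reenact`) are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 29-D per-frame audio feature (e.g. a speech
# embedding) mapped to 64 expression coefficients of a parametric 3D face
# model. The random linear map stands in for a learned regression network.
AUDIO_DIM, EXPR_DIM = 29, 64
W = rng.standard_normal((EXPR_DIM, AUDIO_DIM)) * 0.1
b = np.zeros(EXPR_DIM)

def predict_expression(audio_feat: np.ndarray) -> np.ndarray:
    """Map one frame's audio feature to expression parameters."""
    return W @ audio_feat + b

def reenact(target_params: dict, audio_feats: np.ndarray) -> dict:
    """Keep the target video's geometry and illumination parameters;
    replace only its expression parameters with the predicted ones."""
    new_expr = np.stack([predict_expression(a) for a in audio_feats])
    return {**target_params, "expression": new_expr}

# Toy usage: a 10-frame target video driven by 10 frames of audio features.
target = {
    "geometry": rng.standard_normal(80),      # identity shape coefficients
    "illumination": rng.standard_normal(27),  # e.g. SH lighting coefficients
    "expression": np.zeros((10, EXPR_DIM)),   # original expressions, replaced below
}
audio = rng.standard_normal((10, AUDIO_DIM))
reenacted = reenact(target, audio)
```

The reenacted parameter set would then be rendered and passed to the neural face renderer; only the expression track changes, which is what lets the target's identity and lighting stay fixed.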
4D Human Body Capture from Egocentric Video via 3D Scene Grounding
We introduce a novel task of reconstructing a time series of second-person 3D
human body meshes from monocular egocentric videos. The unique viewpoint and
rapid embodied camera motion of egocentric videos raise additional technical
barriers for human body capture. To address those challenges, we propose a
simple yet effective optimization-based approach that leverages 2D observations
of the entire video sequence and human-scene interaction constraints to estimate
second-person human poses, shapes, and global motion that are grounded on the
3D environment captured from the egocentric view. We conduct detailed ablation
studies to validate our design choices. Moreover, we compare our method with the
previous state-of-the-art method on human motion capture from monocular video,
and show that our method estimates more accurate human-body poses and shapes
under the challenging egocentric setting. In addition, we demonstrate that our
approach produces more realistic human-scene interactions.
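The optimization idea above, fitting 2D observations while grounding the body on the reconstructed scene, can be sketched in one dimension. Everything here is a toy assumption: the "pose" is reduced to a single height offset, the weights and the name `fit_global_height` are invented, and plain gradient descent stands in for the paper's optimizer.

```python
def fit_global_height(obs_z: float, ground_z: float = 0.0,
                      w_scene: float = 10.0, lr: float = 0.05,
                      steps: int = 200) -> float:
    """Minimize (z - obs_z)^2 + w_scene * (z - ground_z)^2 by gradient descent.

    The first term is a stand-in for the 2D observation (reprojection) error;
    the second is a stand-in for the human-scene interaction constraint that
    keeps the feet on the floor plane recovered from the egocentric video.
    """
    z = 0.0
    for _ in range(steps):
        grad_data = 2.0 * (z - obs_z)                # pull toward the noisy estimate
        grad_scene = 2.0 * w_scene * (z - ground_z)  # pull toward ground contact
        z -= lr * (grad_data + grad_scene)
    return z

# A noisy monocular estimate says the feet float at z = 0.5, but the scene
# term pulls the solution down toward the reconstructed floor at z = 0.
z_hat = fit_global_height(obs_z=0.5)
```

The closed-form minimizer is `obs_z / (1 + w_scene)` ≈ 0.045 here, illustrating how a strong scene-contact weight keeps the estimated body grounded even when per-frame 2D evidence is unreliable.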
Vision-Based Production of Personalized Video
In this paper we present a novel vision-based system for the automated production of personalized video souvenirs for visitors to leisure and cultural heritage venues. Visitors are visually identified and tracked through a camera network. The system produces a personalized DVD souvenir at the end of a visitor's stay, allowing visitors to relive their experiences. We describe how we identify visitors by fusing facial and body features, how we track visitors, how the tracker recovers from failures due to occlusions, and how we annotate and compile the final product. Our experiments demonstrate the feasibility of the proposed approach.
Instant messaging on handhelds: an affective gesture approach
Text communication can be perceived as lacking in spontaneity, or as plastic, due to medium limitations during interaction. A form of text messaging, Instant Messaging (IM), is now on the uptake, even on mobile handhelds. This paper presents results of using affective gesture to rubberise IM chat in order to improve synchronous communication spontaneity. The experimental design makes use of a text-only IM tool, running on handhelds, built with the Session Initiation Protocol (SIP) and the SIP Instant Messaging and Presence Leveraging Extensions (SIMPLE). The tool was developed with a novel user-defined hotkey: a one-click context menu that fast-tracks the creation and transmission of text-gestures and emoticons. A hybrid quantitative and qualitative approach was taken in order to enable data triangulation. Data collected from user trials affirms that the affective gesture hotkey facility improves chat responsiveness, thus enhancing chat spontaneity.
Call Me Caitlyn: Making and making over the 'authentic' transgender body in Anglo-American popular culture
A conception of transgender identity as an 'authentic' gendered core 'trapped' within a mismatched corporeality, and made tangible through corporeal transformation, has attained unprecedented legibility in contemporary Anglo-American media. Whilst pop-cultural articulations of this discourse have received some scholarly attention, the question of why this 'wrong body' paradigm has solidified as the normative explanation for gender transition within the popular media remains underexplored. This paper argues that the discourse has attained cultural pre-eminence through its convergence with a broader media and commercial zeitgeist in which corporeal alteration and maintenance are perceived as means of accessing one's 'authentic' self. I analyse the media representations of two transgender celebrities, Caitlyn Jenner and Nadia Almada, alongside the reality TV show TRANSform Me, exploring how these women's gender transitions have been discursively aligned with a cultural imperative for all women, cisgender or trans, to display their authentic femininity through bodily work. This demonstrates how established tropes of authenticity-via-bodily-transformation have enabled transgender identity to become culturally legible through the wrong-body trope. Problematically, I argue, this process has worked to demarcate ideals of 'acceptable' transgender subjectivity: self-sufficient, normatively feminine, and eager to embrace the possibilities for happiness and social integration provided by the commercial domain.
- …