Search CORE

3,415 research outputs found

Capture, Learning, and Synthesis of 3D Speaking Styles

Author: Black Michael J.
Bolkart Timo
Cudeiro Daniel
Laidlaw Cassidy
Ranjan Anurag
Publication venue
Publication date: 01/01/2019
Field of study

Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation) takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.Comment: To appear in CVPR 201

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Text-based Editing of Talking-head Video

Author: Agrawala M.
Finkelstein A.
Fried O.
Genova K.
Goldman D.
Jin Z.
Shechtman E.
Tewari A.
Theobalt C.
Zollhöfer M.
Publication venue
Publication date: 01/01/2019
Field of study

Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis

MPG.PuRe

Speaker-following Video Subtitles

Author: Hu Yongtao
Kautz Jan
Wang Wenping
Yu Yizhou
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain

arXiv.org e-Print Archive

HKU Scholars Hub

Multimodal person recognition for human-vehicle interaction

Author: Abut Huseyin
Abut Hüseyin
Ercil Aytul
Erdogan Hakan
Erdoğan Hakan
Erzin Engin
Erçil Aytül
Tekalp A. Murat
Yemez Yucel
Yemez Yücel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2006
Field of study

Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, borne out in two case studies

Sabanci University Research Database