16,329 research outputs found

Capture, Learning, and Synthesis of 3D Speaking Styles

Author: Black Michael J.
Bolkart Timo
Cudeiro Daniel
Laidlaw Cassidy
Ranjan Anurag
Publication date: 01/01/2019

Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input, even speech in languages other than English, and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e., head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.

Comment: To appear in CVPR 201

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Articulatory features for speech-driven head motion synthesis

Author: Ben Youssef Atef
Braude David A.
Shimodaira Hiroshi
Publication date: 01/08/2013

This study investigates the use of articulatory features for speech-driven head motion synthesis, as opposed to the prosody features such as F0 and energy that have mainly been used in the literature. In the proposed approach, multi-stream HMMs are trained jointly on the synchronous streams of speech and head motion data. Articulatory features can be regarded as an intermediate parametrisation of speech that is expected to have a close link with head movement. Measured head and articulatory movements acquired by EMA were synchronously recorded with speech. Measured articulatory data were compared with those predicted from speech using an HMM-based inversion mapping system trained in a semi-supervised fashion. Canonical correlation analysis (CCA) on a data set of free speech of 12 people shows that the articulatory features are more correlated with head rotation than prosodic and/or cepstral speech features. It is also shown that head motion synthesised using articulatory features gave higher correlations with the original head motion than head motion synthesised using only prosodic features. Index Terms: head motion synthesis, articulatory features, canonical correlation analysis, acoustic-to-articulatory mapping
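The canonical correlation analysis mentioned in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `canonical_correlations`, the small regularisation term `reg`, and the random toy data standing in for articulatory and head-rotation features are all assumptions made for illustration.

```python
import numpy as np


def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between X (n, p) and Y (n, q), largest first.

    Hypothetical helper for illustration; `reg` is an assumed ridge term
    that keeps the covariance matrices positive definite.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten both sides with Cholesky factors: M = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    # The singular values of M are the canonical correlations.
    s = np.linalg.svd(M, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: 6 "articulatory" channels and 3 "head rotation" angles.
    articulatory = rng.normal(size=(1000, 6))
    mixing = rng.normal(size=(6, 3))
    head_rotation = articulatory @ mixing + 0.5 * rng.normal(size=(1000, 3))
    print(canonical_correlations(articulatory, head_rotation))
```

Comparing the values returned for two candidate feature sets against the same head-rotation data is one simple way to express "more correlated with head rotation" numerically; the study's actual protocol may differ.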

CiteSeerX

Edinburgh Research Explorer

Real Time Animation of Virtual Humans: A Trade-off Between Naturalness and Control

Author: Abe, Ahmed, Allen, Amaya, Arikan, Badler, Barzel, Bodenheimer, Boeing, Boulic, Brand, Bruderlin, Callennec, Carvalho, Chaminade, Chao, Chi, Coros, Da Silva, Egges, Faloutsos, Fang, Feldman, Fitts, Flash, Forsyth, Gibet, Glardon, Gleicher, Grassia, Gratch, Grochow, Ha, Harris, Hartmann, Heck, Hodgins, Hsu, Ik Soo, Ikemoto, Isaacs, Jang, Jansen, Kallmann, Kawato, Ko, Kokkevis, Kopp, Kovar, Lance, Lau, Lee, Li, Liu, Lo, Macmillan, Magnenat-Thalmann, Mandel, Mirtich, Mizuguchi, Muico, Mukai, Ménardais, Neff, Noot, Oore, Oshita, Perlin, Pollard, Reeves, Reitsma, Ren, Rose, Ruttkay, Safonova, Schmidt, Shapiro, Sharon, Shin, Sims, Stewart, Thiebaux, Tolani, Torresani, Treuille, Uno, Unuma, Urtasun, Van Basten, Van Welbergen, Viviani, Wiley, Winter, Witkin, Woodson, Wooten, Wrotek, Yin, Zeltzer, Zhao, Zordan
Publication venue: Blackwell Publishing
Publication date: 01/01/2010

Virtual humans are employed in many interactive applications using 3D virtual environments, including (serious) games. The motion of such virtual humans should look realistic (or ‘natural’) and allow interaction with the surroundings and other (virtual) humans. Current animation techniques differ in the trade-off they offer between motion naturalness and the control that can be exerted over the motion. We show mechanisms to parametrize, combine (on different body parts) and concatenate motions generated by different animation techniques. We discuss several aspects of motion naturalness and show how it can be evaluated. We conclude by showing the promise of combinations of different animation paradigms to enhance both naturalness and control.
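As a rough illustration of what concatenating two motions can involve, the sketch below cross-fades the end of one clip into the start of another. It is not taken from the paper: the function name `concatenate_with_crossfade`, the smoothstep weighting, and the representation of a clip as a (frames, degrees of freedom) array of joint angles are assumptions. Linear blending of angles is itself a simplification, and a production system would more likely interpolate rotations as quaternions.

```python
import numpy as np


def concatenate_with_crossfade(clip_a, clip_b, overlap):
    """Join two motion clips, blending `overlap` frames at the seam.

    Hypothetical helper: clips are arrays of shape (frames, dofs) holding
    joint angles sampled at the same frame rate.
    """
    clip_a = np.asarray(clip_a, dtype=float)
    clip_b = np.asarray(clip_b, dtype=float)
    if overlap < 1 or overlap > min(len(clip_a), len(clip_b)):
        raise ValueError("overlap must fit inside both clips")

    # Smoothstep weights rise from 0 to 1 with zero slope at both ends,
    # which avoids a visible velocity jump where the clips meet.
    t = np.linspace(0.0, 1.0, overlap)
    w = (3.0 * t**2 - 2.0 * t**3)[:, None]

    blended = (1.0 - w) * clip_a[-overlap:] + w * clip_b[:overlap]
    return np.concatenate([clip_a[:-overlap], blended, clip_b[overlap:]], axis=0)


if __name__ == "__main__":
    # Toy clips: a pose held at 0 rad followed by a pose held at 1 rad.
    walk = np.zeros((10, 2))
    reach = np.ones((8, 2))
    joined = concatenate_with_crossfade(walk, reach, overlap=4)
    print(joined.shape)
```

A longer overlap gives a smoother but less responsive transition, which is one small instance of the naturalness versus control trade-off the abstract describes.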

Crossref

Publications at Bielefeld University

University of Twente Research Information

A High Quality Text-To-Speech System Composed of Multiple Neural Networks

Author: Corrigan Gerald
Karaali Orhan
Mackie Andrew
Massey Noel
Miller Corey
Schnurr Otto
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1998

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.

Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages) and 3 WAV audio files. If your system does not support Windows WAV files, try a tool like "sox" to translate the audio into a format of your choice.

arXiv.org e-Print Archive

CiteSeerX

HeadOn: Real-time Reenactment of Human Portrait Videos

Author: Nießner Matthias
Stamminger Marc
Theobalt Christian
Thies Justus
Zollhöfer Michael
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, facial expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos.

Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'1

arXiv.org e-Print Archive

MPG.PuRe