Video face replacement
We present a method for replacing facial performances in video. Our approach accounts for differences in identity, visual appearance, speech, and timing between source and target videos. Unlike prior work, it does not require substantial manual operation or complex acquisition hardware, only single-camera video. We use a 3D multilinear model to track the facial performance in both videos. Using the corresponding 3D geometry, we warp the source to the target face and retime the source to match the target performance. We then compute an optimal seam through the video volume that maintains temporal consistency in the final composite. We showcase the use of our method on a variety of examples and present the result of a user study that suggests our results are difficult to distinguish from real video footage.
National Science Foundation (U.S.) (Grant PHY-0835713)
National Science Foundation (U.S.) (Grant DMS-0739255)
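The "optimal seam through the video volume" can be pictured with a much simpler 2D analogue: a minimum-cost seam through a per-pixel difference map, found by dynamic programming. The sketch below is illustrative only (a seam-carving-style DP, not the authors' formulation, which additionally enforces temporal consistency across frames):

```python
import numpy as np

def optimal_seam(cost):
    """Minimal-cost vertical seam through a 2D cost map via dynamic
    programming -- a simplified 2D analogue of a seam through the
    video volume (illustrative, not the paper's exact method)."""
    h, w = cost.shape
    dp = cost.astype(float).copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            dp[y, x] += dp[y - 1, lo:hi].min()
    # Backtrack from the cheapest bottom cell.
    seam = [int(np.argmin(dp[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam.append(lo + int(np.argmin(dp[y, lo:hi])))
    return seam[::-1]  # one column index per row, top to bottom

# Toy example: a cheap column at x=1 should attract the seam.
cost = np.array([[5, 0, 5],
                 [5, 0, 5],
                 [5, 0, 5]])
print(optimal_seam(cost))  # [1, 1, 1]
```

In the paper the cost would measure source/target disagreement per pixel per frame, and the seam is a surface through the full space-time volume rather than a path through one image.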
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 2019
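The key idea of "conditioning on subject labels" can be sketched with a toy decoder: a speech feature and a one-hot subject label go in, per-vertex offsets come out, and identity lives in the static template the offsets are added to. Everything below is hypothetical (random weights, toy sizes), not the released VOCA model:

```python
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, AUDIO_DIM, N_SUBJECTS, HIDDEN = 100, 16, 12, 32  # toy sizes, not VOCA's

# Hypothetical weights standing in for a trained decoder.
W1 = rng.normal(0, 0.1, (AUDIO_DIM + N_SUBJECTS, HIDDEN))
W2 = rng.normal(0, 0.01, (HIDDEN, N_VERTS * 3))

def animate(audio_feat, subject_id, template):
    """Predict per-frame vertex offsets from a speech feature, conditioned
    on a one-hot subject label, and add them to a static template mesh."""
    cond = np.zeros(N_SUBJECTS)
    cond[subject_id] = 1.0                      # speaking-style conditioning
    h = np.tanh(np.concatenate([audio_feat, cond]) @ W1)
    offsets = (h @ W2).reshape(N_VERTS, 3)      # motion, factored from identity
    return template + offsets                   # identity lives in the template

template = rng.normal(0, 1, (N_VERTS, 3))       # subject-specific face shape
frame = animate(rng.normal(0, 1, AUDIO_DIM), subject_id=3, template=template)
print(frame.shape)  # (100, 3)
```

Because identity sits entirely in the template, the same audio can drive any unseen face shape, which is what makes retargeting-free animation possible.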
Learning to Generate Posters of Scientific Papers
Researchers often summarize their work in the form of posters. Posters
provide a coherent and efficient way to convey core ideas from scientific
papers. Generating a good scientific poster, however, is a complex and
time-consuming cognitive task, since such posters need to be readable, informative,
and visually aesthetic. In this paper, for the first time, we study the
challenging problem of learning to generate posters from scientific papers. To
this end, a data-driven framework that utilizes graphical models is proposed.
Specifically, given content to display, the key elements of a good poster,
including panel layout and attributes of each panel, are learned and inferred
from data. Then, given inferred layout and attributes, composition of graphical
elements within each panel is synthesized. To learn and validate our model, we
collect and make public a Poster-Paper dataset, which consists of scientific
papers and corresponding posters with exhaustively labelled panels and
attributes. Qualitative and quantitative results indicate the effectiveness of
our approach.
Comment: in Proceedings of the 30th AAAI Conference on Artificial Intelligence
(AAAI'16), Phoenix, AZ, 2016
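The "panel layout and attributes" stage can be illustrated with a crude stand-in: assign each section a panel whose height is proportional to its content length, filling columns top to bottom. The paper infers these sizes and attributes from data with a graphical model; the heuristic below is purely a hypothetical sketch of the layout representation:

```python
# Toy stand-in for a learned layout model: panel sizes here are set
# proportional to content length; the paper infers them from data.
def layout_panels(sections, page_h=100.0, page_w=60.0, n_cols=2):
    """Assign each section a panel (x, y, w, h) by filling columns
    top-to-bottom, with height proportional to the section's text length."""
    total = sum(len(s) for s in sections)
    col_w = page_w / n_cols
    panels, col, y = [], 0, 0.0
    for text in sections:
        h = page_h * n_cols * len(text) / total    # share of total page area
        if y + h > page_h and col < n_cols - 1:    # spill into the next column
            col, y = col + 1, 0.0
        panels.append({"x": col * col_w, "y": y, "w": col_w, "h": h})
        y += h
    return panels

sections = ["Intro " * 10, "Method " * 30, "Results " * 20]
for p in layout_panels(sections):
    print(p)
```

The output of this stage, a list of panel rectangles plus per-panel attributes, is exactly the intermediate representation the paper then fills by synthesizing graphical elements within each panel.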
Video-driven Neural Physically-based Facial Asset for Production
Production-level workflows for producing convincing 3D dynamic human faces
have long relied on an assortment of labor-intensive tools for geometry and
texture generation, motion capture and rigging, and expression synthesis.
Recent neural approaches automate individual components, but the corresponding
latent representations cannot provide artists with explicit controls as in
conventional tools. In this paper, we present a new learning-based,
video-driven approach for generating dynamic facial geometries with
high-quality physically-based assets. For data collection, we construct a
hybrid multiview-photometric capture stage coupled with ultra-fast video
cameras to obtain raw 3D facial assets. We then set out to model the facial
expression, geometry and physically-based textures using separate VAEs where we
impose a global MLP based expression mapping across the latent spaces of
respective networks, to preserve characteristics across respective attributes.
We also model the delta information as wrinkle maps for the physically-based
textures, achieving high-quality 4K dynamic textures. We demonstrate our
approach in high-fidelity performer-specific facial capture and cross-identity
facial motion retargeting. In addition, our multi-VAE-based neural asset, along
with the fast adaptation schemes, can also be deployed to handle in-the-wild
videos. Furthermore, we demonstrate the utility of our explicit facial disentangling
strategy by providing various promising physically-based editing results with
high realism. Comprehensive experiments show that our technique provides higher
accuracy and visual fidelity than previous video-driven facial reconstruction
and animation methods.
Comment: For project page, see https://sites.google.com/view/npfa/
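The "global MLP based expression mapping across the latent spaces" can be sketched as a small network that maps one expression code into matching latents for the separate geometry and texture decoders, keeping the attributes in sync. The weights and sizes below are hypothetical placeholders, not the paper's trained networks:

```python
import numpy as np

rng = np.random.default_rng(1)
Z_EXPR, Z_GEOM, Z_TEX, HIDDEN = 8, 16, 16, 32  # toy latent sizes

# Hypothetical weights standing in for the trained cross-latent mapping MLP.
W1 = rng.normal(0, 0.1, (Z_EXPR, HIDDEN))
Wg = rng.normal(0, 0.1, (HIDDEN, Z_GEOM))
Wt = rng.normal(0, 0.1, (HIDDEN, Z_TEX))

def map_expression(z_expr):
    """Map one expression code to matching geometry and texture latents,
    so the separately trained VAE decoders stay consistent (a sketch of
    the global expression-mapping MLP, not the paper's implementation)."""
    h = np.tanh(z_expr @ W1)
    return h @ Wg, h @ Wt   # latents for the geometry VAE and the texture VAE

z_expr = rng.normal(0, 1, Z_EXPR)   # e.g. tracked from a driving video
z_geom, z_tex = map_expression(z_expr)
print(z_geom.shape, z_tex.shape)  # (16,) (16,)
```

Driving both latent spaces from one shared expression code is what lets a single tracked performance produce geometry and physically-based textures that agree with each other, and it is also the handle the explicit disentangling strategy exposes for editing.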