Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving
realistic, human-like performance is still unsolved. This is due to the lack of
available 3D datasets, models, and standard evaluation metrics. To address
this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans
captured at 60 fps and synchronized audio from 12 speakers. We then train a
neural network on our dataset that factors identity from facial motion. The
learned model, VOCA (Voice Operated Character Animation), takes any speech
signal as input - even speech in languages other than English - and
realistically animates a wide range of adult faces. Conditioning on subject
labels during training allows the model to learn a variety of realistic
speaking styles. VOCA also provides animator controls to alter speaking style,
identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball
rotations) during animation. To our knowledge, VOCA is the only realistic 3D
facial animation model that is readily applicable to unseen subjects without
retargeting. This makes VOCA suitable for tasks like in-game video, virtual
reality avatars, or any scenario in which the speaker, speech, or language is
not known in advance. We make the dataset and model available for research
purposes at http://voca.is.tue.mpg.de.
Comment: To appear in CVPR 2019
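The factoring of identity from facial motion described above can be sketched in a few lines. This is a hypothetical toy model, not the actual VOCA architecture: audio features and a one-hot subject label (the conditioning signal) are mapped to per-vertex displacements, which are added to a static identity template; the dimensions mirror the abstract (12 speakers) and a FLAME-style mesh.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 5023          # FLAME-style mesh resolution (assumed)
N_SUBJECTS = 12         # speakers in the dataset
AUDIO_DIM = 29          # per-frame audio feature size (assumed)

# Toy linear "decoder" standing in for the trained neural network.
W_audio = rng.normal(scale=1e-3, size=(AUDIO_DIM, N_VERTS * 3))
W_style = rng.normal(scale=1e-3, size=(N_SUBJECTS, N_VERTS * 3))

def animate_frame(audio_feat, subject_id, template):
    """Animated mesh = identity template + speech-driven displacements."""
    style = np.zeros(N_SUBJECTS)
    style[subject_id] = 1.0            # conditioning on the subject label
    disp = audio_feat @ W_audio + style @ W_style
    return template + disp.reshape(N_VERTS, 3)

template = rng.normal(size=(N_VERTS, 3))   # identity-dependent facial shape
audio_feat = rng.normal(size=AUDIO_DIM)    # one frame of audio features
mesh = animate_frame(audio_feat, subject_id=3, template=template)
print(mesh.shape)  # (5023, 3)
```

Because identity lives only in `template` and style only in the label, the same speech input can drive any face and any of the learned speaking styles independently, which is the property the abstract highlights.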
Calipso: Physics-based Image and Video Editing through CAD Model Proxies
We present Calipso, an interactive method for editing images and videos in a
physically-coherent manner. Our main idea is to realize physics-based
manipulations by running a full physics simulation on proxy geometries given by
non-rigidly aligned CAD models. Running these simulations allows us to apply
new, unseen forces to move or deform selected objects, change physical
parameters such as mass or elasticity, or even add entire new objects that
interact with the rest of the underlying scene. In Calipso, the user makes
edits directly in 3D; these edits are processed by the simulation and then
transferred to the target 2D content using shape-to-image correspondences in a
photo-realistic rendering process. To align the CAD models, we introduce an
efficient CAD-to-image alignment procedure that jointly minimizes for rigid and
non-rigid alignment while preserving the high-level structure of the input
shape. Moreover, the user can choose to exploit image flow to estimate scene
motion, producing coherent physical behavior with ambient dynamics. We
demonstrate Calipso's physics-based editing on a wide range of examples,
producing a variety of physical behaviors while preserving geometric and visual
consistency.
Comment: 11 pages
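The transfer step described above, from simulated proxy geometry back to 2D content, can be sketched as follows. The pinhole projection, the pixel-to-vertex correspondence format, and the rigid displacement are illustrative assumptions, not Calipso's actual implementation:

```python
import numpy as np

def project(v, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3D point to image coordinates (assumed model)."""
    return np.array([f * v[0] / v[2] + cx, f * v[1] / v[2] + cy])

# Proxy-mesh vertices from the non-rigidly aligned CAD model.
verts = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0]])

# Shape-to-image correspondences: source pixel -> proxy vertex index.
corr = {(240, 320): 0, (240, 345): 1}

# The physics simulation moves the proxy vertices (here: a toy translation
# standing in for forces, mass, or elasticity edits).
verts_sim = verts + np.array([0.2, 0.0, 0.0])

# Each edited pixel is re-rendered at the projection of its new 3D position.
new_pixels = {px: project(verts_sim[i]) for px, i in corr.items()}
for px, p in sorted(new_pixels.items()):
    print(px, np.round(p, 1))
```

The real system replaces the last step with photo-realistic rendering, but the core bookkeeping is the same: simulate in 3D, then follow the correspondences back into the image.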
Animating Through Warping: an Efficient Method for High-Quality Facial Expression Animation
Advances in deep neural networks have considerably improved the art of
animating a still image without operating in the 3D domain. However, prior art
can only animate small images (typically no larger than 512x512) due to memory
limitations, training difficulties, and a lack of high-resolution (HD) training
datasets, which significantly reduces its potential for applications in movie
production and interactive systems. Motivated by the idea that HD images can be
generated by adding high-frequency residuals to low-resolution results produced
by a neural network, we propose a novel framework known as Animating Through
Warping (ATW) to enable efficient animation of HD images.
Specifically, the proposed framework consists of two modules, a novel
two-stage neural-network generator and a novel post-processing module known as
Animating Through Warping (ATW). It only requires the generator to be trained
on small images and can do inference on an image of any size. During inference,
an HD input image is decomposed into a low-resolution component (128x128) and
its corresponding high-frequency residuals. The generator predicts the
low-resolution result as well as the motion field that warps the input face to
the desired state (e.g., expression categories or action units). Finally, the
ResWarp module warps the residuals based on the motion field and adds the
warped residuals to the naively up-sampled low-resolution results to generate
the final HD results. Experiments show the effectiveness and efficiency of
our method in generating high-resolution animations. Our proposed framework
successfully animates a 4K facial image, which has never been achieved by prior
neural models. In addition, our method generally preserves the temporal
coherency of the generated animations. Source code will be made publicly
available.
Comment: 18 pages, 13 figures, Accepted to ACM Multimedia 2020
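The decomposition and recomposition pipeline described above can be sketched as follows. The generator is replaced by placeholders, and the pooling, upsampling, and warping operators are naive NumPy stand-ins for whatever the paper actually uses; the point is only the residual arithmetic:

```python
import numpy as np

def downsample(img, k):
    """Naive average-pool downsampling by integer factor k."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(img, k):
    """Naive nearest-neighbour upsampling by integer factor k."""
    return np.kron(img, np.ones((k, k)))

def warp(img, flow):
    """Backward warp with nearest-neighbour sampling (toy ResWarp stand-in)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip((ys - flow[..., 1]).round().astype(int), 0, h - 1)
    src_x = np.clip((xs - flow[..., 0]).round().astype(int), 0, w - 1)
    return img[src_y, src_x]

k = 4                                   # e.g. 512 -> 128
hd = np.random.default_rng(0).random((512, 512))
low = downsample(hd, k)
residual = hd - upsample(low, k)        # high-frequency component

# The trained generator (not shown) would predict `low_out` and a motion
# field; identity placeholders here just exercise the recomposition step.
low_out = low
flow = np.zeros((512, 512, 2))          # up-sampled motion field (identity)

final = upsample(low_out, k) + warp(residual, flow)
print(np.allclose(final, hd))  # identity motion reconstructs the input
```

Because the network only ever sees the 128x128 component, the memory cost of training is fixed regardless of the final output resolution, which is what makes 4K inference feasible.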
From Fantasy to Virtual Reality: An Exploration of Modeling, Rigging and Animating Characters for Video Games
In the last few decades, video games have quickly become one of the most popular forms of entertainment around the world. This can be linked to the improvement of computer systems and graphics, which now allow for authentic and highly detailed computer-generated characters. This project examines how these characters are modeled and developed. The examination of game characters entails a brief history of video games and their aesthetics. The foundations of character design are discussed and 3D modeling of a character is explored in detail. Finally, rigging, or skeleton placement, is investigated in order to animate the characters designed for this study. The result is two animated characters, which can be incorporated into several of the current and popular game engines. By the end of this paper the reader should have a fundamental understanding of how a video game character is designed, modeled, rigged, and animated.
Creative tools for producing realistic 3D facial expressions and animation
Creative exploration of realistic 3D facial animation is a popular but very challenging task due to the high level of knowledge and skill required. This forms a barrier for creative individuals who have limited technical skills but wish to explore their creativity in this area. This paper proposes a new technique that facilitates users' creative exploration by hiding the technical complexities of producing facial expressions and animation. The proposed technique draws on research from psychology and anatomy, and employs Autodesk Maya as a use case by developing a creative tool which extends Maya's Blend Shape Editor. User testing revealed that novice users in creative media, employing the proposed tool, can produce rich and realistic facial expressions that portray new, interesting emotions. It reduced production time by 25% when compared to Maya and by 40% when compared to 3DS Max's equivalent tools.
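The blend-shape arithmetic underlying editors like Maya's Blend Shape Editor can be sketched generically; this is a standard illustration of the technique, not the paper's tool. An expression is the neutral face plus a weighted sum of per-target vertex offsets:

```python
import numpy as np

def blend(neutral, targets, weights):
    """Blend-shape evaluation.

    neutral: (V, 3) neutral-face vertices
    targets: (T, V, 3) per-target offsets from the neutral face
    weights: (T,) slider values, typically in [0, 1]
    """
    return neutral + np.tensordot(weights, targets, axes=1)

rng = np.random.default_rng(1)
neutral = rng.normal(size=(100, 3))
targets = rng.normal(size=(3, 100, 3))   # e.g. smile, brow-raise, jaw-open

# 60% smile + 30% brow-raise, jaw closed.
expr = blend(neutral, targets, np.array([0.6, 0.3, 0.0]))
print(expr.shape)  # (100, 3)
```

A tool that hides technical complexity can expose higher-level controls (say, "happy" or "surprised") that internally map to psychologically grounded combinations of such slider weights.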
A survey of real-time crowd rendering
In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first overview character animation techniques, as they are highly tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting the performance and realism of crowds, such as lighting, shadowing, clothing and variability. Finally, we provide an exhaustive comparison of the most relevant approaches in the field.
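The per-agent work a crowd renderer does each frame, frustum culling followed by runtime LoD selection, can be sketched as follows. This is a generic illustration with made-up distance thresholds and a crude view-cone test, not any specific surveyed system:

```python
import numpy as np

# Distance thresholds (metres, assumed) for high / medium / low geometry;
# beyond the last one, fall back to an image-based imposter.
LOD_THRESHOLDS = [10.0, 30.0, 80.0]

def select_lod(dist):
    """Pick the coarsest acceptable level of detail for a given distance."""
    for lod, limit in enumerate(LOD_THRESHOLDS):
        if dist < limit:
            return lod
    return len(LOD_THRESHOLDS)        # image-based imposter

def visible(pos, cam, forward, fov_cos=0.5):
    """Crude frustum test: is the agent inside the camera's view cone?"""
    d = pos - cam
    n = np.linalg.norm(d)
    return n > 0 and np.dot(d / n, forward) > fov_cos

cam, fwd = np.zeros(3), np.array([0.0, 0.0, 1.0])
agents = np.array([[0, 0, 5], [0, 0, 50], [0, 0, -5]], dtype=float)
for p in agents:
    if visible(p, cam, fwd):            # cull before any LoD work
        print(select_lod(np.linalg.norm(p - cam)))
```

Real systems replace the cone test with six-plane frustum (and occlusion) tests and feed the chosen LoD into instanced, palette-skinned draw calls, but the cull-then-select structure is the same.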
Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers
We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that utilizes acoustic features as input, and one that utilizes a phonetic transcription as input. Both synthesizers are trained using the same data and their performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than for sequences generated by our synthesizers. This observation motivates further consideration of an often ignored question: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer-perceived quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
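The proposed objective measure, the cost of a dynamic time warp between synthesized and ground-truth parameter trajectories, can be sketched with textbook DTW using a Euclidean frame distance; the paper's exact variant (step pattern, normalization) may differ:

```python
import numpy as np

def dtw_cost(a, b):
    """DTW alignment cost between parameter sequences a: (Ta, D), b: (Tb, D)."""
    ta, tb = len(a), len(b)
    D = np.full((ta + 1, tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # per-frame distance
            D[i, j] = d + min(D[i - 1, j],            # insertion
                              D[i, j - 1],            # deletion
                              D[i - 1, j - 1])        # match
    return D[ta, tb]

truth = np.array([[0.0], [1.0], [2.0], [1.0]])        # 1-D toy trajectory
synth = np.array([[0.0], [0.0], [1.0], [2.0], [1.0]]) # delayed onset
print(dtw_cost(truth, synth))        # 0.0: DTW absorbs the timing offset
print(dtw_cost(truth, truth + 0.5))  # 2.0: amplitude errors still cost
```

The toy values illustrate why this correlates better with perception than a frame-wise error: a small timing offset, which viewers barely notice, is forgiven by the warp, while genuine shape errors are not.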