59 research outputs found
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, enabling both monocular and multi-frame
reconstruction.
Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ, Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
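A minimal sketch of the multi-frame consistency idea, assuming per-frame identity codes predicted by the network (the deviation-from-mean formulation and tensor shapes below are illustrative, not the paper's exact loss, which also couples shape and appearance across frames):

```python
import torch

def multi_frame_consistency_loss(identity_codes: torch.Tensor) -> torch.Tensor:
    # identity_codes: (N, D) identity parameters predicted independently from
    # N frames of the same subject; since identity should not vary across
    # frames, penalize each frame's deviation from the per-subject mean.
    mean_code = identity_codes.mean(dim=0, keepdim=True)        # (1, D)
    return ((identity_codes - mean_code) ** 2).sum(dim=1).mean()

codes = torch.randn(4, 80, requires_grad=True)  # 4 frames, 80-D identity code
loss = multi_frame_consistency_loss(codes)
loss.backward()  # gradients pull the per-frame codes toward agreement
```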
Applications of Face Analysis and Modeling in Media Production
Facial expressions play an important role in day-to-day communication as well as media production. This article surveys automatic facial analysis and modeling methods using computer vision techniques and their applications for media production. The authors give a brief overview of the psychology of face perception and then describe some of the applications of computer vision and pattern recognition applied to face recognition in media production. This article also covers the automatic generation of face models, which are used in movie and TV productions for special effects in order to manipulate people's faces or combine real actors with computer graphics.
FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances
Visual effects production commonly requires both the creation of realistic
synthetic humans and the retargeting of actors' performances to humanoid
characters such as aliens and monsters. Achieving the expressive performances
demanded in
entertainment requires manipulating complex models with hundreds of parameters.
Full creative control requires the freedom to make edits at any stage of the
production, which prohibits the use of a fully automatic “black box” solution
with uninterpretable parameters. On the other hand, producing realistic
animation with these sophisticated models is difficult and laborious. This
paper describes FDLS (Facial Deep Learning Solver), which is Weta Digital's
solution to these challenges. FDLS adopts a coarse-to-fine and
human-in-the-loop strategy, allowing a solved performance to be verified and
edited at several stages in the solving process. To train FDLS, we first
transform the raw motion-capture data into robust graph features. Second,
based on the observation that the artists typically finalize the jaw pass
animation before proceeding to finer detail, we solve for the jaw motion first
and predict fine expressions with region-based networks conditioned on the jaw
position. Finally, artists can optionally invoke a non-linear finetuning
process on top of the FDLS solution to follow the motion-captured virtual
markers as closely as possible. FDLS supports editing if needed to improve the
results of the deep learning solution and it can handle small daily changes in
the actor's face shape. FDLS permits reliable and production-quality
performance solving with minimal training and little or no manual effort in
many cases, while also allowing the solve to be guided and edited in unusual
and difficult cases. The system has been under development for several years
and has been used in major movies.
Comment: DigiPro '22: The Digital Production Symposium
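The jaw-first, region-conditioned stages can be sketched as two cascaded networks. The PyTorch code below is a hypothetical illustration; module names, feature dimensions, and control counts are invented for the sketch and do not reflect Weta Digital's implementation:

```python
import torch
import torch.nn as nn

class JawSolver(nn.Module):
    """Stage 1: predict jaw parameters from mocap-derived graph features."""
    def __init__(self, feat_dim=256, jaw_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, jaw_dim))
    def forward(self, feats):
        return self.net(feats)

class RegionSolver(nn.Module):
    """Stage 2: predict one facial region's expression controls,
    conditioned on the already-solved jaw position."""
    def __init__(self, feat_dim=256, jaw_dim=3, ctrl_dim=40):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim + jaw_dim, 128), nn.ReLU(),
                                 nn.Linear(128, ctrl_dim))
    def forward(self, feats, jaw):
        return self.net(torch.cat([feats, jaw], dim=-1))

feats = torch.randn(1, 256)               # robust graph features from raw mocap
jaw = JawSolver()(feats)                  # jaw pass first (artists verify/edit here)
mouth_ctrls = RegionSolver()(feats, jaw)  # finer expressions follow the jaw
```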
Learning from the Artist: Theory and Practice of Example-Based Character Deformation
Movie and game production is very laborious, frequently involving hundreds of person-years for a single project. At present this work is difficult to fully automate, since it involves subjective and artistic judgments.
Broadly speaking, in this thesis we explore an approach that works with the artist, accelerating their work without attempting to replace them. More specifically, we describe an “example-based” approach, in which artists provide examples of the desired shapes of the character, and the results gradually improve as more examples are given. Since a character’s skin shape deforms as the pose or expression changes, our particular problem will be termed character deformation.
The overall goal of this thesis is to contribute a complete investigation and development of an example-based approach to character deformation. A central observation guiding this research is that character animation can be formulated as a high-dimensional problem, rather than the two- or three-dimensional viewpoint that is commonly adopted in computer graphics. A second observation guiding our inquiry is that statistical learning concepts are relevant. We show that example-based character animation algorithms can be informed, developed, and improved using these observations.
This thesis provides definitive surveys of example-based facial and body skin deformation.
This thesis analyzes the two leading families of example-based character deformation algorithms from the point of view of statistical regression. In doing so we show that a wide variety of existing tools in machine learning are applicable to our problem. We also identify several techniques that are not suitable due to the nature of the training data, and the high-dimensional nature of this regression problem. We evaluate the design decisions underlying these example-based algorithms, thus providing the groundwork for a “best practice” choice of specific algorithms.
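One classical member of these families is pose-space deformation, which fits a radial-basis regressor from pose to artist-sculpted corrective offsets. A minimal sketch, with toy dimensions and random stand-in data:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

poses = np.array([[0.0], [0.5], [1.0]])   # example joint angles (radians)
offsets = np.random.randn(3, 3000)        # artist-sculpted vertex deltas per
                                          # example, flattened (n_verts * 3)
psd = RBFInterpolator(poses, offsets, kernel='gaussian', epsilon=1.0)

new_pose = np.array([[0.25]])
corrective = psd(new_pose)                # interpolated offsets, added on top
                                          # of the skinned base mesh
```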
This thesis develops several new algorithms for accelerating example-based facial animation. The first algorithm allows unspecified degrees of freedom to be automatically determined based on the style of previous, completed animations. A second algorithm allows rapid editing and control of the process of transferring motion capture of a human actor to a computer graphics character.
The thesis identifies and develops several unpublished relations between the underlying mathematical techniques.
Lastly, the thesis provides novel tutorial derivations of several mathematical concepts, using only the linear algebra tools that are likely to be familiar to experts in computer graphics.
Portions of the research in this thesis have been published in eight papers, with two appearing in premier forums in the field.
High-quality face capture, animation and editing from monocular video
Digitization of virtual faces in movies requires complex capture setups and extensive manual work to produce superb animations and video-realistic editing. This thesis pushes the boundaries of the digitization pipeline by proposing automatic algorithms for high-quality 3D face capture and animation, as well as photo-realistic face editing. These algorithms reconstruct and modify faces in 2D videos recorded in uncontrolled scenarios and illumination. In particular, advances in three main areas offer solutions for the lack of depth and overall uncertainty in video recordings. First, contributions in capture include model-based reconstruction of detailed, dynamic 3D geometry that exploits optical and shading cues, multilayer parametric reconstruction of accurate 3D models in unconstrained setups based on inverse rendering, and regression-based 3D lip shape enhancement from high-quality data. Second, advances in animation include video-based face reenactment based on robust appearance metrics and temporal clustering, performance-driven retargeting of detailed facial models in sync with audio, and the automatic creation of personalized controllable 3D rigs. Finally, advances in plausible photo-realistic editing include dense face albedo capture and mouth interior synthesis using image warping and 3D teeth proxies. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of photo-realistic 3D faces.
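The inverse-rendering contributions rest on an analysis-by-synthesis energy: model parameters are optimized until a rendered face matches the video frame. A minimal sketch, assuming a generic parametric model and a placeholder linear "renderer" that merely keeps the example runnable:

```python
import torch

params = torch.zeros(160, requires_grad=True)  # shape+expression+albedo+lighting
frame = torch.rand(3, 64, 64)                  # input video frame
W = torch.randn(160, 3 * 64 * 64)              # stand-in for a real rasterizer

def render(p):
    # Placeholder differentiable renderer: a real system rasterizes the
    # parametric face model; a fixed linear map keeps this sketch runnable.
    return (p @ W).reshape(3, 64, 64).sigmoid()

photometric = ((render(params) - frame) ** 2).mean()  # shading/appearance cue
prior = (params ** 2).mean()                          # statistical model prior
energy = photometric + 0.01 * prior
energy.backward()                                     # gradients drive the fit
```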
3D Human Face Reconstruction and 2D Appearance Synthesis
3D human face reconstruction has been an active research area for decades due to its wide applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains especially valuable, as images are much easier to acquire and store.
In this dissertation, we first propose three image-based face reconstruction approaches, each built on different assumptions about the input.
In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; this approach assumes a calibrated camera.
As the first approach is limited to videos, our second approach focuses on a single image. It also refines the geometry with fine-scale detail using shading cues, based on a novel albedo estimation and linear optimization algorithm.
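Under a Lambertian model with low-order spherical-harmonics (SH) lighting, the shading cue leads to a linear least-squares problem. The NumPy sketch below (random stand-in data, first-order SH only) illustrates the kind of linear optimization involved, not the dissertation's exact algorithm:

```python
import numpy as np

def sh_basis(n):
    """First-order SH basis (4 terms) evaluated at unit normals n: (P, 3)."""
    x, y, z = n[:, 0], n[:, 1], n[:, 2]
    return np.stack([np.ones_like(x), x, y, z], axis=1)    # (P, 4)

normals = np.random.randn(1000, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)  # coarse mesh normals
albedo = np.full(1000, 0.7)                                # estimated albedo
intensity = np.random.rand(1000)                           # observed pixels

A = albedo[:, None] * sh_basis(normals)                    # linear system (P, 4)
light, *_ = np.linalg.lstsq(A, intensity, rcond=None)      # SH lighting coeffs
shading_residual = intensity - A @ light                   # residual that drives
                                                           # fine-detail refinement
```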
In the third approach, we further relax the constraint on the input to arbitrary in-the-wild images. The proposed approach robustly reconstructs high-quality models even under extreme expressions and large poses.
We then explore the applicability of our face reconstructions in four interesting applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization, and video face replacement. We demonstrate the great potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large facial occlusion is a major obstacle to face-to-face communication. In a further application, we explore hardware/software solutions for synthesizing face images in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we can achieve photo-realistic results.
We further propose a deep neural network that treats HMD removal as a face inpainting problem. This approach needs no special hardware and runs in real time with satisfying results.
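Framed as inpainting, HMD removal can be sketched with a masked reconstruction objective; the toy network below is illustrative only, as the abstract does not specify the architecture:

```python
import torch
import torch.nn as nn

inpaint = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),    # RGB + occlusion mask in
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid()  # inpainted RGB out
)

face = torch.rand(1, 3, 128, 128)                 # unoccluded training image
mask = torch.zeros(1, 1, 128, 128)
mask[..., 20:70, 14:114] = 1                      # simulated HMD region
masked = face * (1 - mask)
pred = inpaint(torch.cat([masked, mask], dim=1))
loss = ((pred - face) * mask).abs().mean()        # supervise occluded pixels
```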
Facial Expression Retargeting from Human to Avatar Made Easy
Facial expression retargeting from humans to virtual characters is a useful
technique in computer graphics and animation. Traditional methods use markers
or blendshapes to construct a mapping between the human and avatar faces.
However, these approaches require a tedious 3D modeling process, and the
performance relies on the modelers' experience. In this paper, we propose a
brand-new solution to this cross-domain expression transfer problem via
nonlinear expression embedding and expression domain translation. We first
build low-dimensional latent spaces for the human and avatar facial expressions
with variational autoencoders. Then we construct correspondences between the two
latent spaces guided by geometric and perceptual constraints. Specifically, we
design geometric correspondences to reflect geometric matching and utilize a
triplet data structure to express users' perceptual preference of avatar
expressions. A user-friendly method is proposed to automatically generate
triplets, allowing users to easily and efficiently annotate the
correspondences. Using both geometric and perceptual correspondences, we
train a network for expression domain translation from human to avatar.
Extensive experimental results and user studies demonstrate that even
nonprofessional users can apply our method to generate high-quality facial
expression retargeting results with less time and effort.
Comment: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appear
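The geometric and perceptual correspondences can be sketched as two loss terms on a latent-space translation network; the dimensions, loss weights, and random stand-in codes below are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

translate = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

z_human = torch.randn(8, 16)      # human expression codes (from a VAE encoder)
z_avatar_gt = torch.randn(8, 16)  # avatar codes from geometric correspondences
z_pred = translate(z_human)

geometric = F.mse_loss(z_pred, z_avatar_gt)   # geometric matching term

# Triplet term: for one human expression, users marked avatar code `pos`
# as a perceptually better match than `neg`.
anchor, pos, neg = z_pred[0], torch.randn(16), torch.randn(16)
perceptual = F.triplet_margin_loss(anchor[None], pos[None], neg[None],
                                   margin=0.2)

loss = geometric + 0.5 * perceptual           # joint training objective
```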
Applications of face analysis and modelling in media production: Overview of the state of the art
Facial expressions play an important role in day-to-day communication as well as media production. This article surveys automatic facial analysis and modeling methods using computer vision techniques and their applications for media production. The authors give a brief overview of the psychology of face perception and then describe some of the applications of computer vision and pattern recognition applied to face recognition in media production. This article also covers the automatic generation of face models, which are used in movie and TV productions for special effects in order to manipulate people's faces or combine real actors with computer graphics.
Expressive Modulation of Neutral Visual Speech
The need for animated graphical models of the human face is commonplace in
the movie, video game, and television industries, appearing in everything from
low budget advertisements and free mobile apps, to Hollywood blockbusters
costing hundreds of millions of dollars. Generative statistical models of
animation attempt to address some of the drawbacks of industry standard
practices such as labour intensity and creative inflexibility.
This work describes one such method for transforming speech animation curves
between different expressive styles. Beginning with the assumption that
expressive speech animation is a mix of two components, a high-frequency
speech component (the content) and a much lower-frequency expressive
component (the style), we use Independent Component Analysis (ICA) to
identify and manipulate these components independently of one another. Next
we learn how the energy for different speaking styles is distributed in terms of
the low-dimensional independent components model. Transforming the
speaking style involves projecting new animation curves into the low-dimensional
ICA space, redistributing the energy in the independent
components, and finally reconstructing the animation curves by inverting the
projection.
We show that a single ICA model can be used for separating multiple expressive
styles into their component parts. Subjective evaluations show that viewers can
reliably identify the expressive style generated using our approach, and that they
have difficulty distinguishing transformed animated expressive speech from the
equivalent ground truth.
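The transform pipeline maps directly onto an off-the-shelf ICA implementation; which components carry style and how much to rescale them are illustrative assumptions here:

```python
import numpy as np
from sklearn.decomposition import FastICA

curves = np.random.randn(500, 30)          # frames x animation parameters
ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(curves)        # independent components over time

style_idx = [6, 7]                         # low-frequency "style" components
sources[:, style_idx] *= 1.8               # redistribute energy toward the
                                           # target expressive style
stylized = ica.inverse_transform(sources)  # back to animation curves
```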