Optical Flow Constraints on Deformable Models With Applications to Face Tracking
Optical flow provides a constraint on the motion of a deformable model. We derive and solve a dynamic system incorporating flow as a hard constraint, producing a model-based least-squares optical flow solution. Our solution also ensures that the constraint remains satisfied when combined with edge information, which helps combat the accumulation of tracking error. Constraint enforcement can be relaxed using a Kalman filter, which permits controlled constraint violations based on the noise present in the optical flow information, and enables optical flow and edge information to be combined more robustly and efficiently. We apply this framework to the estimation of face shape and motion using a 3D deformable face model. This model uses a small number of parameters to describe a rich variety of face shapes and facial expressions. We present experiments in extracting the shape and motion of a face from image sequences which validate the accuracy of the method. They also demonstrate that our treatment of optical flow as a hard constraint, and our use of a Kalman filter to reconcile these constraints with the uncertainty in the optical flow, are vital to the performance of our system.
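As a hedged illustration of the core step, the sketch below solves a model-based least-squares flow problem: one brightness-constancy equation per pixel, with image velocities constrained to lie in the range of a model Jacobian. The Jacobian shape and function names are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: model-based least-squares optical flow, assuming a
# Jacobian L(q) that maps parameter velocities qdot to image velocities
# at each tracked pixel (shapes and names are illustrative).
import numpy as np

def model_based_flow(grad_I, I_t, L):
    """grad_I: (n, 2) spatial gradients; I_t: (n,) temporal derivatives;
    L: (n, 2, p) model Jacobian, so pixel i's image velocity is L[i] @ qdot."""
    # Brightness constancy per pixel: grad_I[i] @ (L[i] @ qdot) = -I_t[i]
    A = np.einsum('ij,ijk->ik', grad_I, L)           # (n, p) stacked system
    qdot, *_ = np.linalg.lstsq(A, -I_t, rcond=None)  # least-squares solution
    return qdot
```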
Capture, Learning, and Synthesis of 3D Speaking Styles
Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.
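The following PyTorch sketch conveys the general shape of such a system, not the actual VOCA architecture: an audio feature window and a one-hot subject label are mapped to per-vertex offsets added to an identity-specific template mesh. All dimensions and layer choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AudioToFace(nn.Module):
    """Illustrative audio-driven animation net (not the published VOCA model)."""
    def __init__(self, audio_dim=128, n_subjects=12, n_vertices=5023):
        super().__init__()
        self.n_vertices = n_vertices
        self.net = nn.Sequential(
            nn.Linear(audio_dim + n_subjects, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_vertices * 3),   # per-vertex 3D offsets
        )

    def forward(self, audio_feat, subject_onehot, template):
        # Conditioning on the subject label lets the net learn speaking styles
        x = torch.cat([audio_feat, subject_onehot], dim=-1)
        offsets = self.net(x).view(-1, self.n_vertices, 3)
        return template + offsets             # animate the identity template
```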
Subspace Representations for Robust Face and Facial Expression Recognition
Analyzing human faces and modeling their variations have always been of interest to the computer vision community. Face analysis based on 2D intensity images is a challenging problem, complicated by variations in pose, lighting, blur, and non-rigid facial deformations due to facial expressions. Among the different sources of variation, facial expressions are of interest as important channels of non-verbal communication. Facial expression analysis is also affected by changes in view-point and inter-subject variations in performing different expressions. This dissertation addresses some of the challenges involved in developing robust algorithms for face and facial expression recognition by exploiting the idea of proper subspace representations for data.
Variations in the visual appearance of an object mostly arise due to changes in illumination and pose. So we first present a video-based sequential algorithm for estimating the face albedo as an illumination-insensitive signature for face recognition. We show that by knowing/estimating the pose of the face at each frame of a sequence, the albedo can be efficiently estimated using a Kalman filter. Then we extend this to the case of unknown pose by simultaneously tracking the pose as well as updating the albedo through an efficient Bayesian inference method performed using a Rao-Blackwellized particle filter.
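A minimal per-pixel sketch of the sequential albedo update, assuming a Lambertian image model I_t = s_t · ρ + noise in which the shading s_t is known once pose and illumination are available; the noise levels and names are illustrative.

```python
import numpy as np

def albedo_kalman_step(rho, P, I_t, s_t, q=1e-4, r=1e-2):
    """One Kalman step per pixel (all arrays share the image shape).
    rho: albedo estimate; P: its variance; I_t: observed intensity;
    s_t: known shading coefficient; q, r: process/measurement noise."""
    P = P + q                           # predict: albedo is nearly static
    S = s_t * P * s_t + r               # innovation variance
    K = P * s_t / S                     # Kalman gain
    rho = rho + K * (I_t - s_t * rho)   # correct with the residual
    P = (1.0 - K * s_t) * P
    return rho, P
```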
Since understanding the effects of blur, especially motion blur, is an important problem in unconstrained visual analysis, we then propose a blur-robust recognition algorithm for faces with spatially varying blur. We model a blurred face as a weighted average of geometrically transformed instances of its clean face. We then build a matrix, for each gallery face, whose column space spans the space of all the motion blurred images obtained from the clean face. This matrix representation is then used to define a proper objective function and perform blur-robust face recognition.
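The sketch below illustrates the matrix idea for the simpler case of a uniform blur (the paper handles spatially varying blur): the columns are shifted copies of the clean gallery face over an assumed kernel support, so any blurred version of that face lies in their span, and the residual of a least-squares fit serves as a match score.

```python
import numpy as np

def blur_matrix(gallery, k=5):
    """Columns are copies of the clean face shifted over a k x k kernel support."""
    cols = []
    for dy in range(-(k // 2), k // 2 + 1):
        for dx in range(-(k // 2), k // 2 + 1):
            cols.append(np.roll(gallery, (dy, dx), axis=(0, 1)).ravel())
    return np.stack(cols, axis=1)                  # (h*w, k*k)

def match_score(probe, gallery, k=5):
    """Distance of a blurred probe to this gallery face's blur subspace."""
    A = blur_matrix(gallery, k)
    w_hat, *_ = np.linalg.lstsq(A, probe.ravel(), rcond=None)
    return np.linalg.norm(A @ w_hat - probe.ravel())
```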
To develop robust and generalizable models for expression analysis one needs to break the dependence of the models on the choice of the coordinate frame of the camera. To this end, we build models for expressions on the affine shape-space (Grassmann manifold), as an approximation to the projective shape-space, by using a Riemannian interpretation of deformations that facial expressions cause on different parts of the face. This representation enables us to perform various expression analysis and recognition algorithms without the need for pose normalization as a preprocessing step.
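As a minimal sketch of this representation, an n-landmark shape can be identified with the column span of its centered coordinate matrix, a point on the Grassmann manifold, and two shapes compared through the principal angles between their subspaces; this standard construction is shown for illustration.

```python
import numpy as np

def shape_subspace(landmarks):
    """Orthonormal basis of the centered (n, 2) landmark matrix."""
    X = landmarks - landmarks.mean(axis=0)
    Q, _ = np.linalg.qr(X)
    return Q                             # a point on the Grassmannian G(n, 2)

def grassmann_distance(U1, U2):
    """Geodesic distance from principal angles between two subspaces."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)   # cosines of the angles
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return np.linalg.norm(theta)
```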
There is a large degree of inter-subject variation in performing various expressions, which poses an important challenge for developing robust facial expression recognition algorithms. To address this challenge, we propose a dictionary-based approach for facial expression analysis by decomposing expressions in terms of action units (AUs). First, we construct an AU dictionary using domain experts' knowledge of AUs. To incorporate this high-level knowledge regarding expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping over the AU-dictionary atoms as well as over the test image matrix columns. We use the computed sparse code matrix for each expressive face to perform expression decomposition and recognition.
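A hedged sketch of the coding step, with the two-layer grouping reduced to a single group-lasso penalty over AU groups of atoms and solved by a few proximal gradient iterations; the group layout, penalty weight, and iteration count are illustrative assumptions.

```python
import numpy as np

def group_sparse_code(y, D, groups, lam=0.1, n_iter=200):
    """min_x 0.5 * ||y - D @ x||^2 + lam * sum_g ||x[g]||_2
    groups: list of index arrays, one per action unit (AU)."""
    x = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant
    for _ in range(n_iter):
        x = x - step * (D.T @ (D @ x - y))          # gradient step
        for g in groups:                            # group soft-threshold:
            norm_g = np.linalg.norm(x[g])           # whole AU groups switch
            x[g] *= max(0.0, 1.0 - step * lam / (norm_g + 1e-12))  # on or off
    return x
```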
Most existing methods for the recognition of faces and expressions consider either the expression-invariant face recognition problem or the identity-independent facial expression recognition problem. We propose joint face and facial expression recognition using a dictionary-based component separation (DCS) algorithm. In this approach, the given expressive face is viewed as a superposition of a neutral face component and a facial expression component, which is sparse with respect to the whole image. This assumption leads to a dictionary-based component separation algorithm that benefits from the ideas of sparsity and morphological diversity. The DCS algorithm uses data-driven dictionaries to decompose an expressive test face into its constituent components. The sparse codes obtained from this decomposition are then used for joint face and expression recognition.
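A minimal sketch of the separation idea, assuming the two dictionaries are given: the expressive face is coded jointly over a neutral-face dictionary and an expression dictionary with an l1 penalty (an ISTA-style solver here), and the two partial reconstructions recover the components.

```python
import numpy as np

def separate(y, D_n, D_e, lam=0.05, n_iter=300):
    """Decompose y into a neutral component (over D_n) and a sparse
    expression component (over D_e)."""
    D = np.hstack([D_n, D_e])
    x = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        x = x - step * (D.T @ (D @ x - y))                        # gradient
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # shrink
    k = D_n.shape[1]
    return D_n @ x[:k], D_e @ x[k:], x   # components + codes for recognition
```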
Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks
Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a robust face alignment method that enables pixel-to-pixel warping. The method simultaneously estimates the rigid transformation (scale, rotation, and translation) and the non-rigid deformation between two 3D point sets: a set of 3D landmarks extracted from an arbitrarily-viewed face, and a set of 3D landmarks parameterized by a frontally-viewed deformable face model. An important merit of the proposed method is its ability to deal both with noise (small perturbations) and with outliers (large errors). We propose to model inliers and outliers with the generalized Student's t-probability distribution function, a heavy-tailed distribution that is immune to non-Gaussian errors in the data. We describe in detail the associated expectation-maximization (EM) algorithm that alternates between the estimation of (i) the rigid parameters, (ii) the deformation parameters, and (iii) the Student-t distribution parameters. We also propose to use the zero-mean normalized cross-correlation between a frontalized face and the corresponding ground-truth frontally-viewed face to evaluate the performance of frontalization. To this end, we use a dataset that contains pairs of profile-viewed and frontally-viewed faces. This evaluation, based on direct image-to-image comparison, stands in contrast with indirect evaluation, based on analyzing the effect of frontalization on face recognition.
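As a hedged sketch of one EM round under the Student-t model, restricted to the rigid parameters: the E-step downweights landmarks with large residuals (the outliers), and the M-step is a weighted Procrustes solve; the deformation and ν, σ² updates described in the paper are omitted here.

```python
import numpy as np

def em_rigid_step(X, Y, R, t, s, sigma2, nu=3.0):
    """X, Y: (n, 3) landmark sets; returns updated (R, t, s) and weights."""
    r2 = np.sum((Y - (s * X @ R.T + t)) ** 2, axis=1)
    w = (nu + 3.0) / (nu + r2 / sigma2)       # E-step: heavy-tail weights

    # M-step: weighted Procrustes (Kabsch with weights)
    mx = np.average(X, axis=0, weights=w)
    my = np.average(Y, axis=0, weights=w)
    Xc, Yc = X - mx, Y - my
    U, S, Vt = np.linalg.svd((w[:, None] * Yc).T @ Xc)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    s = (S[0] + S[1] + d * S[2]) / np.sum(w * np.sum(Xc ** 2, axis=1))
    t = my - s * R @ mx
    return R, t, s, w
```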
Generation, Estimation and Tracking of Faces
This thesis describes techniques for the construction of face models for both computer graphics and computer vision applications. It also details model-based computer vision methods for extracting data from images and combining it with the model. Our face models respect the measurements of populations described by face anthropometry studies. In computer graphics, the anthropometric measurements permit the automatic generation of varied geometric models of human faces: a random set of face measurements is generated according to anthropometric statistics, and a face fitting these measurements is realized using variational modeling. In computer vision, anthropometric data biases face shape estimation towards more plausible individuals. Having such a detailed model encourages the use of model-based techniques; we use a physics-based deformable model framework. We derive and solve a dynamic system which accounts for edges in the image and incorporates optical flow as a motion constraint on the model. Our solution ensures this constraint remains satisfied when edge information is used, which helps prevent tracking drift. This method is extended using the residuals from the optical flow solution: the extracted structure of the model can be improved by determining small changes in the model that reduce this error residual. We present experiments in extracting the shape and motion of a face from image sequences which exhibit the generality of our technique, as well as provide validation.
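A hedged sketch of the generation step: sample a mutually consistent set of face measurements from anthropometric statistics, here reduced to an illustrative multivariate Gaussian over a few made-up proportions, which would then act as constraints for the variational surface fit.

```python
import numpy as np

rng = np.random.default_rng(0)
names = ['face_height', 'face_width', 'nose_length', 'eye_separation']
mean = np.array([120.0, 140.0, 50.0, 32.0])   # mm; illustrative values only
cov = np.diag([8.0, 9.0, 4.0, 2.0]) ** 2      # illustrative population variances

sample = rng.multivariate_normal(mean, cov)
measurements = dict(zip(names, sample))
# `measurements` would be imposed as constraints when fitting the face surface
```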
Automatic facial analysis for objective assessment of facial paralysis
Facial paralysis is a condition causing decreased movement on one side of the face. A quantitative, objective and reliable assessment system would be an invaluable tool for clinicians treating patients with this condition. This paper presents an approach based on the automatic analysis of patient video data. Facial feature localization and facial movement detection methods are discussed. An algorithm is presented that processes the optical flow data to obtain motion features in the relevant facial regions. Three classification methods are applied to provide quantitative evaluations of regional facial nerve function, and of overall facial nerve function based on the House-Brackmann scale. Experiments show the Radial Basis Function (RBF) neural network to have superior performance.
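A minimal sketch of the motion-feature step: mean optical-flow magnitude per facial region, given region masks from the feature localization stage; the region names are illustrative, and an RBF-kernel SVM stands in here for the paper's RBF neural network.

```python
import numpy as np
from sklearn.svm import SVC

def region_motion_features(flow, masks):
    """flow: (h, w, 2) dense optical flow; masks: dict name -> (h, w) bool."""
    mag = np.linalg.norm(flow, axis=2)           # per-pixel motion magnitude
    return np.array([mag[m].mean() for m in masks.values()])

# e.g. masks for forehead, eye, nasolabial and mouth regions on each side;
# features from many frames are then classified against clinical grades
clf = SVC(kernel='rbf')                          # stand-in classifier
```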
The computer synthesis of expressive three-dimensional facial character animation.
This research is concerned with the design, development and implementation of three-dimensional computer-generated facial images capable of expression, gesture and speech.

A review of previous work in chapter one shows that, to date, the model of computer-generated faces has been one in which construction and animation were not separated, and which therefore possessed only a limited expressive range. It is argued in chapter two that the physical description of the face cannot be seen as originating from a single generic mould. Chapter three therefore describes data acquisition techniques employed in the computer generation of free-form surfaces which are applicable to three-dimensional faces.

Expressions are the result of the distortion of the surface of the skin by the complex interactions of bone, muscle and skin. Chapter four demonstrates, with static images and short animation sequences on video, that a muscle model process algorithm can simulate the primary characteristics of the facial muscles.
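A hedged sketch in the spirit of vector-muscle models: each mesh vertex inside a muscle's zone of influence is pulled toward the muscle's bone attachment, with the pull fading with distance; the falloff is simplified and all names are illustrative.

```python
import numpy as np

def apply_muscle(verts, head, tail, contraction, radius):
    """head: bone attachment point; tail: skin attachment; verts: (n, 3)."""
    d = np.linalg.norm(verts - tail, axis=1)        # distance from skin end
    falloff = np.clip(1.0 - d / radius, 0.0, None)  # linear zone of influence
    pull = (head - tail) / np.linalg.norm(head - tail)
    return verts + contraction * falloff[:, None] * pull
```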
Three-dimensional speech synchronization was the most complex problem to achieve effectively. Chapter five describes two successful approaches: the direct mapping of mouth shapes in two dimensions to the model in three dimensions, and geometric distortions of the mouth created by the contraction of specified muscle combinations.

Chapter six describes the implementation of software for this research and argues the case for a parametric approach. Chapter seven is concerned with the control of facial articulations and discusses a more biological approach to these. Finally, chapter eight draws conclusions from the present research and suggests further extensions.
Geometric Expression Invariant 3D Face Recognition using Statistical Discriminant Models
Currently there is no complete face recognition system that is invariant to all facial expressions. Although humans find it easy to identify and recognise faces regardless of changes in illumination, pose and expression, producing a computer system with a similar capability has proved to be particularly difficult. Three-dimensional face models are geometric in nature and therefore have the advantage of being invariant to head pose and lighting. However, they are still susceptible to facial expressions. This can be seen in the decrease in recognition results using principal component analysis when expressions are added to a data set.

In order to achieve expression-invariant face recognition systems, we have employed a tensor algebra framework to represent 3D face data with facial expressions in a parsimonious space. Face variation factors are organised in particular subject and facial expression modes. We manipulate this using singular value decomposition on sub-tensors representing one variation mode. This framework possesses the ability to deal with the shortcomings of PCA in less constrained environments and still preserves the integrity of the 3D data. The results show improved recognition rates for faces and facial expressions, even recognising high-intensity expressions that are not in the training datasets.
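A minimal sketch of the multilinear step: unfold the data tensor along one mode and take the SVD of the unfolding to obtain that mode's factor matrix, as in higher-order SVD; the (subjects × expressions × features) layout is an illustrative assumption.

```python
import numpy as np

def mode_factor(T, mode, rank):
    """Leading left singular vectors of the mode-n unfolding of tensor T."""
    Tn = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
    U, _, _ = np.linalg.svd(Tn, full_matrices=False)
    return U[:, :rank]

# e.g. T with shape (n_subjects, n_expressions, n_features):
# subject_factor = mode_factor(T, 0, r1); expression_factor = mode_factor(T, 1, r2)
```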
We have determined, experimentally, a set of anatomical landmarks that best describe facial expressions effectively. We found that the best placement of landmarks to distinguish different facial expressions is in areas around the prominent features, such as the cheeks and eyebrows. Recognition results using landmark-based face recognition could be improved with better placement.

We looked into the possibility of achieving expression-invariant face recognition by reconstructing and manipulating realistic facial expressions. We proposed a tensor-based statistical discriminant analysis method to reconstruct facial expressions and, in particular, to neutralise facial expressions. The synthesised facial expressions are visually more realistic than facial expressions generated using conventional active shape modelling (ASM). We then used the reconstructed neutral faces in the sub-tensor framework for recognition purposes. The recognition results showed a slight improvement. Besides biometric recognition, this novel tensor-based synthesis approach could be used in computer games and real-time animation applications.
Atomic displacements accompanying deformation twinning: shears and shuffles
Deformation twins grow by the motion of disconnections along their interfaces, thereby coupling shear with migration. Atomic-scale simulations of this mechanism have advanced to the point where the trajectory of each atom can be followed as it transits from a site in the shrinking grain, through the interface, and onwards to a site in the growing twin. Historically, such trajectories have been factorised into shear and shuffle components according to some defined convention. In the present article, we introduce a method of factorisation consistent with disconnection motion. This procedure is illustrated for the case of {10-12} twinning in hcp materials, and shown to agree with simulated atomic trajectories for Zr.
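A hedged numeric sketch of the factorisation itself: each atom's total displacement across the migrating interface is split into the part predicted by the homogeneous twinning shear and a residual shuffle; the shear magnitude and geometry below are illustrative, not the paper's convention.

```python
import numpy as np

s = 0.17                                 # illustrative twinning shear magnitude
S = np.array([[0.0, s, 0.0],             # simple shear on the twin plane
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

def factorise(x, u):
    """x: atom position; u: its total displacement during twin growth."""
    shear_part = S @ x                   # carried by the macroscopic shear
    shuffle = u - shear_part             # remaining atomic shuffle
    return shear_part, shuffle
```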