Deformable face ensemble alignment with robust grouped-L1 anchors
Many methods currently exist for deformable face fitting. A drawback of nearly all these approaches is that (i) their landmark position estimates are noisy, and (ii) the noise is biased across frames (i.e. the misalignment is toward common directions across all frames). In this paper we propose a grouped L1-norm anchored method for simultaneously aligning an ensemble of deformable face images stemming from the same subject, given noisy heterogeneous landmark estimates. Impressive improvement and refinement in alignment performance is obtained using very weak initialization as "anchors".
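As a rough illustration of what a grouped L1 norm measures, the sketch below sums the Euclidean norms of groups of residuals (often written as an L2,1 norm). Treating each landmark's 2-D displacement as one group is an assumption made here for illustration; the abstract does not state how the groups are formed, and the function name is hypothetical.

```python
import numpy as np

def grouped_l1_norm(residuals):
    """Sum of the Euclidean norms of the rows of `residuals`.

    Each row is treated as one group (here, hypothetically, the 2-D
    displacement of a single landmark from its anchor).
    """
    residuals = np.asarray(residuals, dtype=float)
    return float(np.linalg.norm(residuals, axis=1).sum())

# Three hypothetical landmark displacements (x, y).
example = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
value = grouped_l1_norm(example)  # 5 + 0 + 10
```

Because each group contributes its unsquared norm, a few grossly wrong landmarks are penalized less heavily than under a squared-error cost, which is the usual motivation for this kind of penalty.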
Development of a Robotic Nanny for Children and a Case Study of Emotion Recognition in Human-Robotic Interaction
Ph.D. (Doctor of Philosophy)
Robust subspace learning for static and dynamic affect and behaviour modelling
Machine analysis of human affect and behavior in naturalistic contexts has attracted growing attention over the last decade from various disciplines ranging from social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, predict as well as simulate and synthesize manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally- and socially-competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present here machine learning methods that are robust to such gross, sparse noise. First, we deal with static analysis of face images, viewing the latter as a superposition of mutually-incoherent, low-complexity components corresponding to facial attributes, such as facial identity, expressions and activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition. Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts into continuous-time modeling of dimensional affect and social behavior.
Having identified a gap in the literature, namely the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates annotated frame-by-frame in terms of real-valued conflict intensity and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating the inability of existing classifiers to capture the hidden temporal structures of affective and behavioral displays. We present here a novel dynamic behavior analysis framework which models temporal dynamics in an explicit way, based on the natural assumption that continuous-time annotations of smoothly-varying affect or behavior can be viewed as outputs of a low-complexity linear dynamical system when behavioral cues (features) act as system inputs. A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on prediction of dimensional conflict and affect as well as multi-object tracking from detection validate the effectiveness of our predictive framework and demonstrate for the first time that complex human behavior and affect can be learned and predicted based on small training sets of person(s)-specific observations.
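The link between a low-complexity linear dynamical system and rank minimization can be illustrated with a small sketch. It is not the thesis's estimator: it only shows, for a hypothetical two-state system chosen here, that the Hankel matrix built from the system's impulse response has rank equal to the number of states, which is the kind of structured low-rank property such methods exploit. All matrices and function names are assumptions.

```python
import numpy as np

# Hypothetical 2-state system: x[t+1] = A x[t] + B u[t], y[t] = C x[t].
A = np.diag([0.9, 0.5])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

def impulse_response(A, B, C, length):
    """Return the scalars C A^k B for k = 0..length-1."""
    out = []
    Ak = np.eye(A.shape[0])
    for _ in range(length):
        out.append((C @ Ak @ B).item())
        Ak = Ak @ A
    return np.array(out)

def hankel_matrix(sequence, rows):
    """Stack shifted windows of `sequence` so that H[i, j] = sequence[i + j]."""
    cols = len(sequence) - rows + 1
    return np.array([sequence[i:i + cols] for i in range(rows)])

h = impulse_response(A, B, C, 19)
H = hankel_matrix(h, 10)                    # 10 x 10 Hankel matrix
rank = np.linalg.matrix_rank(H, tol=1e-8)   # numerical rank
```

Although `H` is 10 x 10, its numerical rank matches the two-state order of the system, so penalizing the rank of such a matrix favors simple dynamics.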
Discriminative Appearance Models for Face Alignment
The proposed face alignment algorithm uses local gradient features as the appearance representation. These features are obtained by pixel value comparison, which provides robustness against changes in illumination, and their locality provides robustness against partial occlusion and local deformation. The adopted features are modeled with three discriminative methods, which correspond to different alignment cost functions. The discriminative appearance modeling alleviates the generalization problem to some extent.
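To see why features built from pixel value comparisons tolerate illumination changes, consider the sketch below. It uses a local-binary-pattern-style code on a 3x3 patch as a stand-in; the abstract does not say which comparison scheme the paper uses, so the function and the patch values are assumptions for illustration only.

```python
import numpy as np

def comparison_code(patch):
    """8-bit code from comparing the 8 neighbours of a 3x3 patch with its centre."""
    patch = np.asarray(patch)
    center = patch[1, 1]
    # Neighbours in clockwise order starting at the top-left pixel.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

patch = np.array([[10, 50, 20],
                  [40, 30, 60],
                  [ 5, 35, 25]])
brighter = 2 * patch + 15  # hypothetical global gain and offset change
same_code = comparison_code(patch) == comparison_code(brighter)
```

Any strictly increasing change of intensity preserves the ordering of pixel values, so the code is unchanged, and because it depends only on a small neighbourhood, an occlusion elsewhere in the image does not affect it.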
Advanced Biometrics with Deep Learning
Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become commonplace as a means of identity management for various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning as a data-driven representation learning approach has been shown to be a promising alternative to conventional data-agnostic and handcrafted pre-processing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality; namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others.
Photorealistic retrieval of occluded facial information using a performance-driven face model
Facial occlusions can cause both human observers and computer algorithms
to fail in a variety of important tasks such as facial action analysis and
expression classification. This is because the missing information is not
reconstructed accurately enough for the purpose of the task in hand. Most
current computer methods that are used to tackle this problem implement
complex three-dimensional polygonal face models that are generally time-consuming
to produce and unsuitable for photorealistic reconstruction of
missing facial features and behaviour.
In this thesis, an image-based approach is adopted to solve the occlusion
problem. A dynamic computer model of the face is used to retrieve the
occluded facial information from the driver faces. The model consists of a
set of orthogonal basis actions obtained by application of principal
component analysis (PCA) on image changes and motion fields extracted
from a sequence of natural facial motion (Cowe 2003). Examples of
occlusion affected facial behaviour can then be projected onto the model to
compute coefficients of the basis actions and thus produce photorealistic
performance-driven animations.
Visual inspection shows that the PCA face model recovers aspects of
expressions in those areas occluded in the driver sequence, but the expression is generally muted. To further investigate this finding, a database
of test sequences affected by a considerable set of artificial and natural
occlusions is created. A number of suitable metrics is developed to measure
the accuracy of the reconstructions. Regions of the face that are most
important for performance-driven mimicry and that seem to carry the best
information about global facial configurations are revealed using Bubbles,
thus in effect identifying facial areas that are most sensitive to occlusions.
Recovery of occluded facial information is enhanced by applying an
appropriate scaling factor to the respective coefficients of the basis actions
obtained by PCA. This method improves the reconstruction of the facial
actions emanating from the occluded areas of the face. However, due to the
fact that PCA produces bases that encode composite, correlated actions,
such an enhancement also tends to affect actions in non-occluded areas of
the face. To avoid this, more localised controls for facial actions are
produced using independent component analysis (ICA). Simple projection
of the data onto an ICA model is not viable due to the non-orthogonality of
the extracted bases. Thus occlusion-affected mimicry is first generated using
the PCA model and then enhanced by accordingly manipulating the
independent components that are subsequently extracted from the mimicry.
This combination of methods yields significant improvements and results in
photorealistic reconstructions of occluded facial actions.
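The muting effect and the scaling remedy described in this abstract can be illustrated with a minimal sketch on synthetic data. It assumes a single basis action, an occlusion that replaces hidden entries with the mean, and a scaling factor equal to the inverse of the visible share of the basis energy; none of these choices is taken from the thesis, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 200 frames of a 50-dimensional signal driven
# by one underlying action (a stand-in for image changes / motion fields).
direction = rng.normal(size=50)
frames = (np.outer(rng.normal(size=200), direction)
          + 0.01 * rng.normal(size=(200, 50)))

mean = frames.mean(axis=0)
# PCA via SVD of the centred data; rows of vt are orthonormal basis actions.
_, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
basis = vt[:1]                      # keep the first basis action, shape (1, 50)

# A new "driver" frame whose last 20 entries are occluded (set to the mean).
driver = mean + 3.0 * basis[0]
visible = np.ones(50, dtype=bool)
visible[30:] = False
occluded = np.where(visible, driver, mean)

full_coeff = basis @ (driver - mean)       # coefficient without occlusion
muted_coeff = basis @ (occluded - mean)    # coefficient with occlusion

# Assumed scaling: divide by the share of basis energy that is still visible.
visible_energy = (basis[:, visible] ** 2).sum(axis=1)
scaled_coeff = muted_coeff / visible_energy
reconstruction = mean + scaled_coeff @ basis
```

With one basis action the scaled coefficient recovers the unoccluded one exactly. With several correlated PCA bases the same rescaling would also alter non-occluded regions, which is the limitation that motivates the ICA step in the abstract.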