402 research outputs found
Facial Expression Recognition System
A key requirement for developing any innovative system in a computing environment is to integrate a sufficiently friendly interface with the average end user. Accurate design of such a user-centered interface, however, means more than just the ergonomics of the panels and displays. It also requires that designers precisely define what information to use and how, where, and when to use it. Facial expression as a natural, non-intrusive and efficient way of communication has been considered as one of the potential inputs of such interfaces. The work of this thesis aims at designing a robust Facial Expression Recognition (FER) system by combining various techniques from computer vision and pattern recognition.
Expression recognition is closely related to face recognition where a lot of research has been done and a vast array of algorithms have been introduced. FER can also be considered as a special case of a pattern recognition problem and many techniques are available. In the designing of an FER system, we can take advantage of these resources and use existing algorithms as building blocks of our system. So a major part of this work is to determine the optimal combination of algorithms. To do this, we first divide the system into 3 modules, i.e. Preprocessing, Feature Extraction and Classification, then for each of them some candidate methods are implemented, and eventually the optimal configuration is found by comparing the performance of different combinations.
Another issue that is of great interest to facial expression recognition systems designers is the classifier which is the core of the system. Conventional classification algorithms assume the image is a single variable function of a underlying class label. However this is not true in face recognition area where the appearance of the face is influenced by multiple factors: identity, expression, illumination and so on. To solve this problem, in this thesis we propose two new algorithms, namely Higher Order Canonical Correlation Analysis and Simple Multifactor Analysis which model the image as a multivariable function.
The addressed issues are challenging problems and are substantial for developing a facial expression recognition system
On Nonparametric Guidance for Learning Autoencoder Representations
Unsupervised discovery of latent representations, in addition to being useful
for density modeling, visualisation and exploratory data analysis, is also
increasingly important for learning features relevant to discriminative tasks.
Autoencoders, in particular, have proven to be an effective way to learn latent
codes that reflect meaningful variations in data. A continuing challenge,
however, is guiding an autoencoder toward representations that are useful for
particular tasks. A complementary challenge is to find codes that are invariant
to irrelevant transformations of the data. The most common way of introducing
such problem-specific guidance in autoencoders has been through the
incorporation of a parametric component that ties the latent representation to
the label information. In this work, we argue that a preferable approach relies
instead on a nonparametric guidance mechanism. Conceptually, it ensures that
there exists a function that can predict the label information, without
explicitly instantiating that function. The superiority of this guidance
mechanism is confirmed on two datasets. In particular, this approach is able to
incorporate invariance information (lighting, elevation, etc.) from the small
NORB object recognition dataset and yields state-of-the-art performance for a
single layer, non-convolutional network.Comment: 9 pages, 12 figure
Face Recognition and Facial Attribute Analysis from Unconstrained Visual Data
Analyzing human faces from visual data has been one of the most active research areas in the computer vision community. However, it is a very challenging problem in unconstrained environments due to variations in pose, illumination, expression, occlusion and blur between training and testing images. The task becomes even more difficult when only a limited number of images per subject is available for modeling these variations. In this dissertation, different techniques for performing classification of human faces as well as other facial attributes such as expression, age, gender, and head pose in uncontrolled settings are investigated.
In the first part of the dissertation, a method for reconstructing the virtual frontal view from a given non-frontal face image using Markov Random Fields (MRFs) and an efficient variant of the Belief Propagation (BP) algorithm is introduced. In the proposed approach, the input face image is divided into a grid of overlapping patches and a globally optimal set of local warps is estimated to synthesize the patches at the frontal view. A set of possible warps for each patch is obtained by aligning it with images from a training database of frontal faces. The alignments are performed efficiently in the Fourier domain using an extension of the Lucas-Kanade (LK) algorithm that can handle illumination variations. The problem of finding the optimal warps is then formulated as a discrete labeling problem using an MRF. The reconstructed frontal face image can then be used with any face recognition technique. The two main advantages of our method are that it does not require manually selected facial landmarks as well as no head pose estimation is needed.
In the second part, the task of face recognition in unconstrained settings is formulated as a domain adaptation problem. The domain shift is accounted for by deriving a latent subspace or domain, which jointly characterizes the multifactor variations using appropriate image formation models for each factor. The latent domain is defined as a product of Grassmann manifolds based on the underlying geometry of the tensor space, and recognition is performed across domain shift using statistics consistent with the tensor geometry. More specifically, given a face image from the source or target domain, multiple images of that subject are first synthesized under different illuminations, blur conditions, and 2D perturbations to form a tensor representation of the face. The orthogonal matrices obtained from the decomposition of this tensor, where each matrix corresponds to a factor variation, are used to characterize the subject as a point on a product of Grassmann manifolds. For cases with only one image per subject in the source domain, the identity of target domain faces is estimated using the geodesic distance on product manifolds. When multiple images per subject are available, an extension of kernel discriminant analysis is developed using a novel kernel based on the projection metric on product spaces. Furthermore, a probabilistic approach to the problem of classifying image sets on product manifolds is introduced.
Understanding attributes such as expression, age class, and gender from face images has many applications in multimedia processing including content personalization, human-computer interaction, and facial identification. To achieve good performance in these tasks, it is important to be able to extract pertinent visual structures from the input data. In the third part of the dissertation, a fully automatic approach for performing classification of facial attributes based on hierarchical feature learning using sparse coding is presented. The proposed approach is generative in the sense that it does not use label information in the process of feature learning. As a result, the same feature representation can be applied for different tasks such as expression, age, and gender classification. Final classification is performed by linear SVM trained with the corresponding labels for each task.
The last part of the dissertation presents an automatic algorithm for determining the head pose from a given face image. The face image is divided into a regular grid and represented by dense SIFT descriptors extracted from the grid points. Random Projection (RP) is then applied to reduce the dimension of the concatenated SIFT descriptor vector. Classification and regression using Support Vector Machine (SVM) are combined in order to obtain an accurate estimate of the head pose. The advantage of the proposed approach is that it does not require facial landmarks such as the eye and mouth corners, the nose tip to be extracted from the input face image as in many other methods
Idealized computational models for auditory receptive fields
This paper presents a theory by which idealized models of auditory receptive
fields can be derived in a principled axiomatic manner, from a set of
structural properties to enable invariance of receptive field responses under
natural sound transformations and ensure internal consistency between
spectro-temporal receptive fields at different temporal and spectral scales.
For defining a time-frequency transformation of a purely temporal sound
signal, it is shown that the framework allows for a new way of deriving the
Gabor and Gammatone filters as well as a novel family of generalized Gammatone
filters, with additional degrees of freedom to obtain different trade-offs
between the spectral selectivity and the temporal delay of time-causal temporal
window functions.
When applied to the definition of a second-layer of receptive fields from a
spectrogram, it is shown that the framework leads to two canonical families of
spectro-temporal receptive fields, in terms of spectro-temporal derivatives of
either spectro-temporal Gaussian kernels for non-causal time or the combination
of a time-causal generalized Gammatone filter over the temporal domain and a
Gaussian filter over the logspectral domain. For each filter family, the
spectro-temporal receptive fields can be either separable over the
time-frequency domain or be adapted to local glissando transformations that
represent variations in logarithmic frequencies over time. Within each domain
of either non-causal or time-causal time, these receptive field families are
derived by uniqueness from the assumptions.
It is demonstrated how the presented framework allows for computation of
basic auditory features for audio processing and that it leads to predictions
about auditory receptive fields with good qualitative similarity to biological
receptive fields measured in the inferior colliculus (ICC) and primary auditory
cortex (A1) of mammals.Comment: 55 pages, 22 figures, 3 table
Tensor Regression with Applications in Neuroimaging Data Analysis
Classical regression methods treat covariates as a vector and estimate a
corresponding vector of regression coefficients. Modern applications in medical
imaging generate covariates of more complex form such as multidimensional
arrays (tensors). Traditional statistical and computational methods are proving
insufficient for analysis of these high-throughput data due to their ultrahigh
dimensionality as well as complex structure. In this article, we propose a new
family of tensor regression models that efficiently exploit the special
structure of tensor covariates. Under this framework, ultrahigh dimensionality
is reduced to a manageable level, resulting in efficient estimation and
prediction. A fast and highly scalable estimation algorithm is proposed for
maximum likelihood estimation and its associated asymptotic properties are
studied. Effectiveness of the new methods is demonstrated on both synthetic and
real MRI imaging data.Comment: 27 pages, 4 figure
Applied Computational Techniques on Schizophrenia Using Genetic Mutations
[Abstract] Schizophrenia is a complex disease, with both genetic and environmental influence. Machine learning techniques can be used to associate different genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Considering these data, Quantitative Genotype – Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine learning-based models obtained after this exhaustive comparative study was implemented online; this model is an artificial neural network (ANN). Thus, the tool offers the possibility to introduce Single Nucleotide Polymorphism (SNP) sequences in order to classify a patient with schizophrenia. Besides this comparative study, a method for variable selection, based on ANNs and evolutionary computation (EC), is also presented. This method uses half the number of variables as the original ANN and the variables obtained are among those found in other publications. In the future, QDGR models based on nucleic acid information could be expanded to other diseases.Programa Iberoamericano de Ciencia y TecnologĂa para el Desarrollo; 209RT-0366Xunta de Galicia; 10SIN105004PRInstituto de Salud Carlos III; RD07/0067/0005Xunta de Galicia; Ref. 2009/5
- …