Machine Analysis of Facial Expressions
No abstract
Multimodal emotion recognition
Reading emotions from facial expressions and speech is a milestone in Human-Computer
Interaction. Recent sensing technologies, namely the Microsoft Kinect sensor, provide
basic input-modality data, such as RGB imaging, depth imaging, and speech, that can
be used in emotion recognition. Moreover, Kinect can track a face in real time and
report its fiducial points, as well as 6 basic Action Units (AUs).
In this work we explore this information by gathering a new and exclusive
dataset. This is a new opportunity for the academic community, as well as for the progress
of the emotion recognition problem. The database includes RGB, depth, audio, fiducial
points, and AUs for 18 volunteers and 7 emotions. We then present automatic emotion
classification results on this dataset, employing k-Nearest Neighbors, Support Vector
Machine, and Neural Network classifiers in unimodal and multimodal approaches.
Our conclusions show that multimodal approaches can attain better results.
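The unimodal-versus-multimodal comparison above can be illustrated with a toy sketch: hypothetical "face" and "speech" feature vectors are fused by simple concatenation (feature-level fusion) and classified with a minimal k-Nearest Neighbors implementation. The data, dimensions, and fusion scheme are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3):
    """k-NN by Euclidean distance with majority vote."""
    preds = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)
        nn = train_y[np.argsort(d)[:k]]          # labels of k nearest samples
        preds.append(np.bincount(nn).argmax())   # majority vote
    return np.array(preds)

# Hypothetical toy data: two modalities, two well-separated emotion classes.
rng = np.random.default_rng(0)
face  = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(3, 1, (20, 4))])
audio = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Multimodal (feature-level) fusion: concatenate per-modality features.
fused = np.hstack([face, audio])
pred = knn_predict(fused, y, fused, k=3)
acc = (pred == y).mean()
```

A unimodal baseline is obtained by running `knn_predict` on `face` or `audio` alone; fusing modalities gives the classifier a richer feature space to separate.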
LOMo: Latent Ordinal Model for Facial Analysis in Videos
We study the problem of facial analysis in videos. We propose a novel weakly
supervised learning method that models the video event (expression, pain, etc.)
as a sequence of automatically mined, discriminative sub-events (e.g., onset and
offset phases for smile, brow lower and cheek raise for pain). The proposed
model is inspired by recent works on Multiple Instance Learning and latent
SVM/HCRF -- it extends such frameworks to approximately model the ordinal, or temporal,
aspect of the videos. We obtain consistent improvements over relevant
competitive baselines on four challenging and publicly available video-based
facial analysis datasets for prediction of expression, clinical pain, and intent
in dyadic conversations. In combination with complementary features, we report
state-of-the-art results on these datasets.
Comment: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Time-Efficient Hybrid Approach for Facial Expression Recognition
Facial expression recognition is an emerging research area for improving human-computer interaction. This research plays a significant role in the fields of social communication, commercial enterprise, law enforcement, and other computer interactions. In this paper, we propose a time-efficient hybrid design for facial expression recognition that combines image pre-processing steps with different Convolutional Neural Network (CNN) structures, providing better accuracy and greatly improved training time. We predict seven basic emotions of human faces: sadness, happiness, disgust, anger, fear, surprise, and neutral. The model performs well on challenging cases where the expressed emotion could be confused with others that share quite similar facial characteristics, such as anger, disgust, and sadness. The model was tested across multiple databases and different facial orientations; to the best of our knowledge, it achieved an accuracy of about 89.58% on the KDEF dataset, 100% on the JAFFE dataset, and 71.975% on the combined (KDEF + JAFFE + SFEW) dataset across these scenarios. Performance evaluation was done by cross-validation techniques to avoid bias towards a specific set of images from a database.
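The image pre-processing stage that such a hybrid pipeline feeds into its CNNs can be sketched roughly as below; the center-crop, block-average downsampling, and 48x48 target size are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def preprocess(img, size=48):
    """Toy pre-processing: center-crop to a square, downsample by
    block-averaging, and scale pixel values to [0, 1]."""
    h, w = img.shape
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    img = img[top:top + s, left:left + s].astype(float)
    # Block-average down to size x size (assumes s is a multiple of size).
    f = s // size
    img = img[:f * size, :f * size].reshape(size, f, size, f).mean(axis=(1, 3))
    return img / 255.0
```

A fixed, small input resolution like this is one common way such designs cut CNN training time while keeping the facial region dominant in the frame.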
Baseline CNN structure analysis for facial expression recognition
We present a baseline convolutional neural network (CNN) structure and an image
preprocessing methodology to improve facial expression recognition algorithms
using CNNs. To find the most efficient network structure, we investigated
four network structures that are known to show good performance in facial
expression recognition. Moreover, we also investigated the effect of input
image preprocessing methods. Five types of data input (raw, histogram
equalization, isotropic smoothing, diffusion-based normalization, difference of
Gaussians) were tested, and the accuracy was compared. We trained 20 different
CNN models (4 networks x 5 data input types) and verified the performance of
each network with test images from five different databases. The experimental
results showed that a three-layer structure consisting of a simple convolutional
layer and a max-pooling layer, with histogram-equalized image input, was the most
efficient. We describe the detailed training procedure and analyze the test
accuracy results based on considerable observation.
Comment: 6 pages, RO-MAN 2016 Conference
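Histogram equalization, the input variant this study found most effective, can be sketched in plain NumPy as below. This is a minimal version for 8-bit grayscale images with at least two distinct gray levels; production pipelines typically use a library routine such as OpenCV's `cv2.equalizeHist`.

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization for an 8-bit grayscale image: map gray
    levels through the normalized cumulative histogram so the output
    spreads over the full 0..255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                     # first non-zero CDF value
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]                               # apply lookup table
```

Equalization normalizes contrast across capture conditions, which is a plausible reason it helped the CNNs generalize across the five databases.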
Pattern recognition in facial expressions: algorithms and applications
Advisor: Hélio Pedrini. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Emotion recognition has become a relevant research topic in the scientific community, since it plays an essential role in the continuous improvement of human-computer interaction systems. It can be applied in various areas, for instance, medicine, entertainment, surveillance, biometrics, education, social networks, and affective computing. There are some open challenges related to the development of emotion systems based on facial expressions, such as data that reflect more spontaneous emotions and real scenarios. In this doctoral dissertation, we propose different methodologies for the development of emotion recognition systems based on facial expressions, as well as their applicability to other similar problems. The first is an emotion recognition methodology for occluded facial expressions based on the Census Transform Histogram (CENTRIST). Occluded facial expressions are reconstructed using an algorithm based on Robust Principal Component Analysis (RPCA). Extraction of facial expression features is then performed by CENTRIST, as well as by Local Binary Patterns (LBP), Local Gradient Coding (LGC), and an LGC extension. The generated feature space is reduced by applying Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) algorithms are used for classification. This method reached competitive accuracy rates for occluded and non-occluded facial expressions. The second proposes a dynamic facial expression recognition method based on Visual Rhythms (VR) and Motion History Images (MHI), such that a fusion of both descriptors encodes appearance, shape, and motion information of the video sequences. For feature extraction, the Weber Local Descriptor (WLD), CENTRIST, the Histogram of Oriented Gradients (HOG), and the Gray-Level Co-occurrence Matrix (GLCM) are employed. This approach offers a new direction for dynamic facial expression recognition, along with an analysis of the relevance of facial parts. The third is an effective method for audio-visual emotion recognition based on speech and facial expressions. The methodology involves a hybrid neural network to extract audio and visual features from videos. For audio extraction, a Convolutional Neural Network (CNN) based on the log Mel-spectrogram is used, whereas a CNN built on the Census Transform is employed for visual extraction. The audio and visual features are reduced by PCA and LDA, then classified through KNN, SVM, Logistic Regression (LR), and Gaussian Naïve Bayes (GNB). This approach achieves competitive recognition rates, especially on spontaneous data. The fourth investigates the problem of detecting Down syndrome from photographs. A geometric descriptor is proposed to extract facial features. Experiments performed on a public data set show the effectiveness of the developed methodology. The last methodology addresses the recognition of genetic disorders in photographs. This method focuses on extracting facial attributes using deep neural network features and anthropometric measurements. Experiments are conducted on a public data set, achieving competitive recognition rates.
Doctorate in Computer Science. Grant 140532/2019-6, CNPq, CAPES.
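The Census Transform underlying CENTRIST can be sketched as below: each interior pixel is replaced by an 8-bit code recording which of its 8 neighbours is less than or equal to it, and CENTRIST is the 256-bin histogram of those codes. This is a minimal NumPy illustration, not the dissertation's implementation.

```python
import numpy as np

def census_transform(img):
    """Census transform: each interior pixel becomes an 8-bit code marking
    which of its 8 neighbours is <= the centre (border pixels are dropped)."""
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb <= c).astype(np.uint8) << bit
    return code

def centrist(img):
    """CENTRIST descriptor: 256-bin histogram of census codes."""
    return np.bincount(census_transform(img).ravel(), minlength=256)
```

Because the codes depend only on local intensity orderings, the descriptor is robust to monotonic illumination changes, which is one reason it suits facial expression features.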
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (e.g., onset and offset phases for "smile", running and jumping for
"highjump"). The proposed model is inspired by recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to
approximately model the ordinal aspect in the videos. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video-based facial analysis datasets for prediction of
expression, clinical pain, and intent in dyadic conversations, and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.
Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
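The ordered sub-event idea can be illustrated with a small dynamic program: given per-frame responses of K sub-event templates, score a video by the best sum of one firing per template, with firings constrained to occur in temporal order. The linear templates and the DP below are an illustrative simplification of the latent ordinal model, not the paper's trained formulation.

```python
import numpy as np

def ordinal_score(frames, templates):
    """Score a video (frames: T x D) as the best sum of per-template
    responses, with the K template firings constrained to occur in
    strictly increasing temporal order (dynamic programming)."""
    R = frames @ templates.T          # (T, K) response of each template per frame
    T, K = R.shape
    best = np.full((T, K), -np.inf)   # best[t, k]: templates 0..k, k fired at <= t
    best[:, 0] = np.maximum.accumulate(R[:, 0])
    for k in range(1, K):
        # k-th template may fire at frame t only after template k-1 fired before t.
        prev = np.concatenate(([-np.inf], best[:-1, k - 1]))
        best[:, k] = np.maximum.accumulate(prev + R[:, k])
    return best[-1, -1]
```

In a weakly supervised setting, such a score would be plugged into a max-margin objective so that the sub-event templates (e.g., onset and offset of a smile) are mined automatically from video-level labels.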