Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System
In the context of building an intelligent tutoring system (ITS) that improves student learning outcomes through intervention, we set out to improve the prediction of student problem outcomes. In essence, we want to predict the outcome of a student answering a problem in an ITS from a video feed by analyzing their face and gestures. To this end, we present a novel transfer learning facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We model the temporal structure of video sequences of students solving math problems using a recurrent neural network architecture. Additionally, we extend the largest dataset of student interactions with an intelligent online math tutor by a factor of two. Our final model, coined ATL-BP (Affect Transfer Learning for Behavior Prediction), increases the mean F-score over the state of the art by 45% on this new dataset in the general case, and by 50% in a more challenging leave-users-out experimental setting when we use a user-personalized training scheme.
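The leave-users-out setting means that no student whose videos appear in the test set contributes any training data. The sketch below illustrates that split together with a mean F-score, under two assumptions not stated in the abstract: "mean F-score" is taken to be the macro-averaged F1 over outcome classes, and users are assigned to folds round-robin. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def leave_users_out_folds(samples, n_folds):
    """Split (user_id, sample) pairs so that each fold's test users never appear in its training set.

    Users are assigned to test folds round-robin after sorting their ids (an assumption).
    """
    by_user = defaultdict(list)
    for user_id, sample in samples:
        by_user[user_id].append(sample)
    users = sorted(by_user)
    folds = []
    for k in range(n_folds):
        test_users = set(users[k::n_folds])  # every n_folds-th user, starting at index k
        train = [s for u in users if u not in test_users for s in by_user[u]]
        test = [s for u in users if u in test_users for s in by_user[u]]
        folds.append((train, test, test_users))
    return folds


def mean_f_score(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three users with one or two problem attempts each.
data = [("u1", "a"), ("u1", "b"), ("u2", "c"), ("u3", "d")]
for train, test, held_out in leave_users_out_folds(data, n_folds=3):
    print(sorted(held_out), "train:", train, "test:", test)
print(mean_f_score([0, 1, 1, 0], [0, 1, 0, 0]))
```

Grouping by user before splitting is what makes the setting harder than a random split: the model cannot rely on having seen the same face during training.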
DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization
Facial action unit recognition has many applications, from market research to psychotherapy and from image captioning to entertainment. Despite recent progress, deployment of these models has been impeded by their limited generalization to unseen people and demographics. This work conducts an in-depth analysis of performance across several dimensions: individuals (40 subjects), genders (male and female), skin types (darker and lighter), and databases (BP4D and DISFA). To help suppress the variance in the data, we use the notion of self-supervised denoising autoencoders to design a method for deep face normalization (DeepFN) that transfers the facial expressions of different people onto a common facial template, which is then used to train and evaluate facial action recognition models. We show that person-independent models yield significantly lower performance (55% average F1 and accuracy across 40 subjects) than person-dependent models (60.3%), leading to a generalization gap of 5.3%. However, normalizing the data with the newly introduced DeepFN significantly increased the performance of person-independent models (59.6%), effectively reducing the gap. Similarly, we observed generalization gaps when considering gender (2.4%), skin type (5.3%), and dataset (9.4%), which were significantly reduced with the use of DeepFN. These findings represent an important step towards the creation of more generalizable facial action unit recognition systems.
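The generalization gap above is the difference between the person-dependent and person-independent scores (60.3% minus 55% gives 5.3%). The short calculation below uses only the numbers reported in the abstract to show how much of that gap DeepFN closes; it assumes the person-dependent score remains the reference point after normalization, which the abstract implies but does not state explicitly.

```python
# Average F1/accuracy (%) as reported in the abstract above.
person_dependent = 60.3
person_independent = 55.0
person_independent_deepfn = 59.6


def generalization_gap(dependent, independent):
    """Gap in percentage points between person-dependent and person-independent scores."""
    return dependent - independent


gap_before = generalization_gap(person_dependent, person_independent)
gap_after = generalization_gap(person_dependent, person_independent_deepfn)
fraction_closed = (gap_before - gap_after) / gap_before  # share of the original gap removed
print(f"gap before: {gap_before:.1f} pts, after: {gap_after:.1f} pts, closed: {fraction_closed:.0%}")
```

Under this reading, DeepFN shrinks the person-level gap from 5.3 to 0.7 points, i.e. it closes roughly 87% of it.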
A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction
With advances in Artificial Intelligence, humanoid robots are starting to interact with ordinary people, building on a growing understanding of psychological processes. Accumulating evidence in Human-Robot Interaction (HRI) suggests that research is focusing on establishing emotional communication between humans and robots in order to create social perception, cognition, desired interaction and sensation.
Furthermore, robots need to recognize human emotions and optimize their behavior in order to help and interact with human beings in various environments. The most natural way to recognize basic emotions is to extract sets of features from human speech, facial expressions and body gestures. A system for recognizing emotions based on speech analysis and facial feature extraction can therefore have interesting applications in Human-Robot Interaction. Accordingly, the Human-Robot Interaction ontology explains how knowledge from these fundamental sciences is applied in physics (sound analysis), mathematics (face detection and perception), philosophical theory (behavior) and robotic science.
In this project, we carry out a study to recognize the basic emotions (sadness, surprise, happiness, anger, fear and disgust). We also propose a methodology and a software program for classifying emotions based on speech analysis and facial feature extraction.
The speech analysis phase investigated the appropriateness of using the acoustic (pitch value, pitch peak, pitch range, intensity and formant) and phonetic (speech rate) properties of emotive speech with the freeware program PRAAT, and consisted of generating and analyzing a graph of the speech signals. The proposed architecture investigated the appropriateness of analyzing emotive speech with minimal use of signal processing algorithms. The 30 participants in the experiment had to repeat five sentences in English (with durations typically between 0.40 s and 2.5 s) so that data on pitch (value, range and peak) and rising-falling intonation could be extracted. Pitch alignments (peak, value and range) were evaluated, and the results were compared with intensity and speech rate.
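To make the pitch measurements concrete, the sketch below summarizes a fundamental-frequency (F0) contour of the kind PRAAT can export, with unvoiced frames marked as `None`. It is an illustration under stated assumptions, not the project's software: "pitch value" is taken to be the mean F0 over voiced frames, the rising-falling test is a simple rule (an interior peak higher than both ends), speech rate is measured in syllables per second, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PitchFeatures:
    mean_hz: float        # "pitch value", assumed here to be the mean F0 over voiced frames
    peak_hz: float        # highest F0 in the utterance
    range_hz: float       # peak minus lowest voiced F0
    rising_falling: bool  # True if F0 rises to an interior peak and then falls


def pitch_features(contour: Sequence[Optional[float]]) -> PitchFeatures:
    """Summarize an F0 contour in Hz; None marks unvoiced (undefined) frames."""
    voiced = [f0 for f0 in contour if f0 is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    peak = max(voiced)
    peak_idx = voiced.index(peak)  # first occurrence of the peak
    rising_falling = (
        0 < peak_idx < len(voiced) - 1 and voiced[0] < peak and voiced[-1] < peak
    )
    return PitchFeatures(
        mean_hz=sum(voiced) / len(voiced),
        peak_hz=peak,
        range_hz=peak - min(voiced),
        rising_falling=rising_falling,
    )


def speech_rate(n_syllables: int, duration_s: float) -> float:
    """Speech rate in syllables per second (one common definition, assumed here)."""
    return n_syllables / duration_s


# Hypothetical F0 track sampled at a fixed frame step, unvoiced at both edges.
features = pitch_features([None, 180.0, 210.0, 250.0, 220.0, 190.0, None])
print(features, speech_rate(10, 2.5))
```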
The facial feature extraction phase uses a mathematical formulation (Bézier curves) and a geometric analysis of the facial image, based on measurements of a set of Action Units (AUs), to classify the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, and (iii) recognizing the emotion. The new data were then merged with reference data in order to recognize the basic emotion.
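The abstract does not say which curve degree or features are used, so the following is only a generic sketch of how a Bézier curve can describe a facial contour such as a lip. It evaluates a curve of any degree with de Casteljau's algorithm (repeated linear interpolation between control points) and derives one hypothetical geometric feature, the curve's vertical extent. The control points, the `curve_height` feature and the other names are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def de_casteljau(control: List[Point], t: float) -> Point:
    """Evaluate a Bézier curve of any degree at t in [0, 1] by repeated linear interpolation."""
    pts = list(control)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


def sample_curve(control: List[Point], n: int = 51) -> List[Point]:
    """Sample n points evenly spaced in the curve parameter t."""
    return [de_casteljau(control, i / (n - 1)) for i in range(n)]


def curve_height(control: List[Point], n: int = 51) -> float:
    """Vertical extent of the sampled curve, e.g. a hypothetical lip-opening feature."""
    ys = [y for _, y in sample_curve(control, n)]
    return max(ys) - min(ys)


# Hypothetical control points for a cubic curve fitted to an upper-lip contour (pixels).
upper_lip = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
print(de_casteljau(upper_lip, 0.5), curve_height(upper_lip))
```

Features like this one, computed per curve, could then be compared against reference values for each emotion, in the spirit of the merging step described above.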
Finally, we combined the two proposed algorithms (speech analysis and facial expression) to design a hybrid technique for emotion recognition. This technique has been implemented in a software program that can be employed in Human-Robot Interaction.
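The abstract does not describe how the speech and facial results are combined. As an illustration only, the sketch below swaps in one simple option, weighted late fusion: each modality produces a score per basic emotion, the scores are averaged with a weight, and the highest fused score wins. The weight `w_speech`, the function name and the example scores are all hypothetical.

```python
from typing import Dict, Tuple

EMOTIONS = ("sadness", "surprise", "happiness", "anger", "fear", "disgust")


def fuse(speech_scores: Dict[str, float], face_scores: Dict[str, float],
         w_speech: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Weighted late fusion of per-emotion scores from the two modalities.

    Missing emotions count as 0. Returns the winning label and all fused scores.
    """
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    fused = {
        e: w_speech * speech_scores.get(e, 0.0) + (1.0 - w_speech) * face_scores.get(e, 0.0)
        for e in EMOTIONS
    }
    best = max(fused, key=fused.get)  # ties resolve to the first emotion in EMOTIONS
    return best, fused


# Hypothetical classifier outputs for one utterance and the matching face image.
speech = {"anger": 0.6, "fear": 0.4}
face = {"anger": 0.3, "disgust": 0.7}
print(fuse(speech, face)[0], fuse(speech, face, w_speech=0.2)[0])
```

Late fusion keeps the two recognizers independent, so either can be replaced without retraining the other; the trade-off is that the weight has to be tuned, and lowering `w_speech` here shifts the decision from anger to disgust.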
The efficiency of the methodology was evaluated in experimental tests on 30 individuals (15 female and 15 male, 20 to 48 years old) from different ethnic groups, namely: (i) ten European adults, (ii) ten Asian (Middle Eastern) adults, and (iii) ten American adults.
Overall, the proposed technique made it possible to recognize the basic emotion in most cases.
A novel facial action intensity detection system
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. October 2014.
Although quite a lot of research has been done in the field of facial expression recognition, not much development has occurred in detecting the intensity of facial actions. In facial expression recognition, the intensity of facial actions is a crucial aspect, since it provides more information about an individual's facial expression, such as the level of emotion in a face. Furthermore, an automated system that can detect the intensity of facial actions in an individual's face could enable many potential applications, from lie detection to smart classrooms. The presented approach includes robust methods for face and facial feature extraction, and multiple machine learning methods for facial action intensity detection.
Gaze estimation and interaction in real-world environments
Human eye gaze has been widely used in human-computer interaction, as it is a promising modality for natural, fast, pervasive, and non-verbal interaction between humans and computers. As the foundation of gaze-related interactions, gaze estimation has been an active research topic in recent decades. In this thesis, we focus on developing appearance-based gaze estimation methods and corresponding attentive user interfaces with a single webcam for challenging real-world environments. First, we collect MPIIGaze, a large-scale gaze estimation dataset and the first of its kind to be collected outside of controlled laboratory conditions. Second, we propose an appearance-based method that, in stark contrast to a long-standing tradition in gaze estimation, only takes the full face image as input. Third, we study data normalisation for the first time in a principled way, and propose a modification that yields significant performance improvements. Fourth, we contribute an unsupervised detector for human-human and human-object eye contact. Finally, we study personal gaze estimation with multiple personal devices, such as mobile phones, tablets, and laptops.