5 research outputs found

    Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System

    In the context of building an intelligent tutoring system (ITS), which improves student learning outcomes through intervention, we set out to improve the prediction of student problem outcomes. In essence, we want to predict the outcome of a student answering a problem in an ITS from a video feed by analyzing their face and gestures. For this, we present a novel transfer learning facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We model the temporal structure of video sequences of students solving math problems using a recurrent neural network architecture. Additionally, we extend the largest dataset of student interactions with an intelligent online math tutor by a factor of two. Our final model, coined ATL-BP (Affect Transfer Learning for Behavior Prediction), achieves an increase in mean F-score over the state of the art of 45% on this new dataset in the general case, and of 50% in a more challenging leave-users-out experimental setting when we use a user-personalized training scheme.
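The leave-users-out setting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' actual protocol, so the function name `leave_users_out_split` and the toy user IDs below are hypothetical; the sketch only assumes that every sample from a held-out user is excluded from training.

```python
from typing import List, Sequence, Set, Tuple


def leave_users_out_split(
    user_ids: Sequence[str], held_out: Set[str]
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that held-out users never appear in training.

    Hypothetical helper for illustration; not taken from the paper.
    """
    train_idx = [i for i, user in enumerate(user_ids) if user not in held_out]
    test_idx = [i for i, user in enumerate(user_ids) if user in held_out]
    return train_idx, test_idx


# Toy data: one user ID per video sample (made-up values).
user_ids = ["u1", "u1", "u2", "u3", "u3", "u2"]
train_idx, test_idx = leave_users_out_split(user_ids, held_out={"u3"})
print(train_idx, test_idx)
```

Splitting by user rather than by sample is what makes the setting harder: the model is evaluated only on people it has never seen.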

    DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization

    Facial action unit recognition has many applications, from market research to psychotherapy and from image captioning to entertainment. Despite recent progress, deployment of these models has been impeded by their limited generalization to unseen people and demographics. This work conducts an in-depth analysis of performance across several dimensions: individuals (40 subjects), genders (male and female), skin types (darker and lighter), and databases (BP4D and DISFA). To help suppress the variance in data, we use the notion of self-supervised denoising autoencoders to design a method for deep face normalization (DeepFN) that transfers facial expressions of different people onto a common facial template, which is then used to train and evaluate facial action recognition models. We show that person-independent models yield significantly lower performance (55% average F1 and accuracy across 40 subjects) than person-dependent models (60.3%), leading to a generalization gap of 5.3%. However, normalizing the data with the newly introduced DeepFN significantly increased the performance of person-independent models (59.6%), effectively reducing the gap. Similarly, we observed generalization gaps when considering gender (2.4%), skin type (5.3%), and dataset (9.4%), which were significantly reduced with the use of DeepFN. These findings represent an important step towards the creation of more generalizable facial action unit recognition systems.
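The generalization gap quoted in this abstract is a simple difference in percentage points, which the short sketch below reproduces. The function name `generalization_gap` is hypothetical, and the sketch assumes the quoted 55% means 55.0. The gap after normalization is derived here from the quoted scores and is not a figure stated in the abstract.

```python
def generalization_gap(dependent_score: float, independent_score: float) -> float:
    """Gap, in percentage points, between person-dependent and
    person-independent performance (rounded to one decimal place)."""
    return round(dependent_score - independent_score, 1)


# Scores quoted in the abstract (average F1 and accuracy, in percent).
person_dependent = 60.3
person_independent = 55.0          # assumed to mean exactly 55.0
person_independent_deepfn = 59.6   # person-independent model after DeepFN

gap_before = generalization_gap(person_dependent, person_independent)
gap_after = generalization_gap(person_dependent, person_independent_deepfn)
print(gap_before, gap_after)
```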

    A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction

    With advances in artificial intelligence, humanoid robots are starting to interact with ordinary people, drawing on a growing understanding of psychological processes. Accumulating evidence in Human-Robot Interaction (HRI) suggests that research is focusing on emotional communication between human and robot in order to create social perception, cognition, desired interaction and sensation. Furthermore, robots need to perceive human emotion and optimize their behavior to help and interact with human beings in various environments. The most natural way to recognize basic emotions is to extract sets of features from human speech, facial expression and body gesture. A system for recognizing emotions based on speech analysis and facial feature extraction can have interesting applications in Human-Robot Interaction. Thus, the Human-Robot Interaction ontology explains how knowledge from these fundamental sciences is applied in the contexts of physics (sound analysis), mathematics (face detection and perception), philosophical theory (behavior) and robotics. In this project, we carry out a study to recognize basic emotions (sadness, surprise, happiness, anger, fear and disgust). We also propose a methodology and a software program for classifying emotions based on speech analysis and facial feature extraction. The speech analysis phase investigates the appropriateness of using acoustic (pitch value, pitch peak, pitch range, intensity and formant) and phonetic (speech rate) properties of emotive speech with the freeware program PRAAT, and consists of generating and analyzing a graph of speech signals. The proposed architecture investigates the appropriateness of analyzing emotive speech with minimal use of signal processing algorithms. The 30 participants in the experiment repeated five sentences in English (with durations typically between 0.40 s and 2.5 s) so that data on pitch (value, range and peak) and rising-falling intonation could be extracted. Pitch alignments (peak, value and range) were evaluated and the results compared with intensity and speech rate. The facial feature extraction phase uses a mathematical formulation (Bézier curves) and geometric analysis of the facial image, based on measurements of a set of Action Units (AUs), to classify the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, and (iii) recognizing the emotion. The new data were then merged with reference data in order to recognize the basic emotion. Finally, we combined the two proposed algorithms (speech analysis and facial expression) to design a hybrid technique for emotion recognition. This technique has been implemented in a software program that can be employed in Human-Robot Interaction. The efficiency of the methodology was evaluated by experimental tests on 30 individuals (15 female and 15 male, 20 to 48 years old) from different ethnic groups, namely: (i) ten European adults, (ii) ten Asian (Middle East) adults and (iii) ten American adults. Overall, the proposed technique made it possible to recognize the basic emotion in most cases.
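This abstract names Bézier curves as the mathematical formulation behind the facial feature extraction, without giving details. As a hedged illustration only, the sketch below evaluates a cubic Bézier curve from four control points using the standard Bernstein form. The curve degree, the function names and the control points (imagined as a rough lip contour) are assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Cubic Bernstein basis weights; they sum to 1 for any t.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_curve(control: List[Point], n: int = 5) -> List[Point]:
    """Sample n evenly spaced parameter values along the curve (n >= 2)."""
    p0, p1, p2, p3 = control
    return [cubic_bezier(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]


# Hypothetical control points, e.g. a coarse upper-lip outline in pixels.
lip_control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
midpoint = cubic_bezier(*lip_control, 0.5)
samples = sample_curve(lip_control)
print(midpoint)
```

A fitted curve of this kind gives a compact description of a contour such as the lips or eyebrows, and its shape can then be compared against reference curves for each emotion.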

    A novel facial action intensity detection system

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science. October 2014. Although a great deal of research has been done in the field of facial expression recognition, little progress has been made in detecting the intensity of facial actions. In facial expression recognition, the intensity of facial actions is a crucial aspect, since it provides more information about an individual's facial expression, such as the level of emotion in a face. Furthermore, an automated system that can detect the intensity of facial actions in an individual's face could lead to many potential applications, from lie detection to smart classrooms. The provided approach includes robust methods for face and facial feature extraction, and multiple machine learning methods for facial action intensity detection.

    Gaze estimation and interaction in real-world environments

    Human eye gaze has been widely used in human-computer interaction, as it is a promising modality for natural, fast, pervasive, and non-verbal interaction between humans and computers. As the foundation of gaze-related interactions, gaze estimation has been a hot research topic in recent decades. In this thesis, we focus on developing appearance-based gaze estimation methods and corresponding attentive user interfaces with a single webcam for challenging real-world environments. First, we collect a large-scale gaze estimation dataset, MPIIGaze, the first of its kind, outside of controlled laboratory conditions. Second, we propose an appearance-based method that, in stark contrast to a long-standing tradition in gaze estimation, only takes the full face image as input. Third, we study data normalisation for the first time in a principled way, and propose a modification that yields significant performance improvements. Fourth, we contribute an unsupervised detector for human-human and human-object eye contact. Finally, we study personal gaze estimation with multiple personal devices, such as mobile phones, tablets, and laptops.