
    Analysis of convolutional neural networks for facial expression recognition on GPU, TPU and CPU

    Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Tensor Processing Units (TPUs) provide steadily increasing computational capacity for deep learning workloads, and human-computer interaction is becoming more natural and social; together, these trends position the field for significant growth. Emotion recognition has attracted tremendous interest in the scientific community, yet computational methods still cannot identify and recognize emotions with the same ease as humans. This study investigates human emotion identification from facial expressions using Convolutional Neural Networks (CNNs). The results show that training an Artificial Neural Network (ANN) on GPUs can cut computation time by as much as 90% compared with a CPU, while accuracy reached up to 65%.
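The core operation behind the CNNs described above is a learned 2D convolution over a face image. A minimal NumPy sketch of a single convolution layer with ReLU activation is shown below; the 48x48 grayscale input size and the fixed edge-detection kernel are illustrative assumptions (FER-style datasets use 48x48 crops, but the paper does not state its input resolution), not details taken from the study.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

face = np.random.rand(48, 48)  # hypothetical grayscale face crop
# fixed vertical-edge kernel; in a real CNN these weights are learned
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)
fmap = np.maximum(conv2d(face, edge_kernel), 0.0)  # ReLU feature map
print(fmap.shape)  # (46, 46)
```

On a GPU this same operation is dispatched to thousands of parallel cores, which is the source of the training speedup the study reports.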

    Adaptive 3D facial action intensity estimation and emotion recognition

    Automatic recognition of facial emotion has been widely studied for various computer vision tasks (e.g. health monitoring, driver state surveillance and personalized learning). Most existing facial emotion recognition systems, however, either have not fully considered subject-independent dynamic features or were limited to 2D models, and thus are not robust enough for real-life recognition tasks involving subject variation, head movement and illumination change. Moreover, there is also a lack of systematic research on the effective detection of newly arrived novel emotion classes. To address these challenges, we present a real-time 3D facial Action Unit (AU) intensity estimation and emotion recognition system. It automatically selects 16 motion-based facial feature sets using minimal-redundancy–maximal-relevance criterion based optimization and estimates the intensities of 16 diagnostic AUs using feedforward Neural Networks and Support Vector Regressors. We also propose a set of six novel adaptive ensemble classifiers for robust classification of the six basic emotions and for the detection of newly arrived unseen novel emotion classes (emotions that are not included in the training set). Distance-based clustering and uncertainty measures of the base classifiers within each ensemble model are used to inform the novel class detection. Evaluated on the Bosphorus 3D database, the system achieved the best performance of 0.071 overall Mean Squared Error (MSE) for AU intensity estimation using Support Vector Regressors, and 92.2% average accuracy for the recognition of the six basic emotions using the proposed ensemble classifiers. In comparison with related work, our approach outperforms other state-of-the-art research on 3D facial emotion recognition for the Bosphorus database.
Moreover, in online real-time evaluation with human subjects, the proposed system also shows superior performance, with 84% recognition accuracy and great flexibility in adapting to newly arrived novel emotions (e.g. 'contempt', which is not among the six basic emotions).
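The novel-class detection idea above combines two signals: distance from the known-class clusters and uncertainty of the ensemble's vote. The sketch below illustrates that combination with a NumPy-only toy; the centroid-distance rule, the thresholds, and the 4-D feature space are all simplifying assumptions for illustration, not the paper's actual clustering or uncertainty measures.

```python
import numpy as np

def detect_novel(sample, centroids, probs, dist_thresh=2.0, conf_thresh=0.5):
    # Flag a sample as a novel emotion class when it is far from every
    # known-class centroid AND the ensemble's best class probability is low.
    dists = np.linalg.norm(centroids - sample, axis=1)
    return dists.min() > dist_thresh and probs.max() < conf_thresh

rng = np.random.default_rng(0)
centroids = rng.normal(size=(6, 4))   # six basic emotions, toy 4-D feature space
known = centroids[2] + 0.05           # sample near one known centroid
novel = centroids.mean(axis=0) + 10.0 # sample far from all centroids

print(detect_novel(known, centroids,
                   np.array([0.1, 0.1, 0.7, 0.05, 0.03, 0.02])))  # False
print(detect_novel(novel, centroids, np.full(6, 1 / 6)))          # True
```

Requiring both conditions keeps confidently classified outliers and uncertain-but-nearby samples from being misflagged as novel.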

    Automatisierte Erkennung und Evaluation von therapeutischen Übungen für Patienten mit Mimikdysfunktionen (Automated recognition and evaluation of therapeutic exercises for patients with facial muscle dysfunctions)

    This thesis presents a flexible, automated, camera-based training system for the rehabilitation of facial paralysis (facial palsy) and related muscle dysfunctions. The system supports patients in training independently by automatically evaluating the execution of twelve different facial exercises and providing multi-level feedback, making it suitable as a supplement to regular exercise sessions guided by a speech therapist. While automated grading and diagnosis systems for facial paralysis are a prominent topic in the literature on clinical image processing, only a few papers deal with the development of automated training systems for facial muscle re-education. Furthermore, the underlying algorithms are typically specialized for particular facial exercises and, unlike the system presented here, are difficult to adapt to additional exercises without extra effort. The contributions of this thesis comprise the main components of the system architecture, with a methodical and experimental emphasis on feature extraction and on deriving feedback from the extracted feature descriptors. Relative to the state of the art, the major novelty lies in the ability to flexibly extend the system with additional facial exercises and to provide both global and region-specific feedback. The selected approaches rely primarily on the processing of 3D camera data and include the extraction of point signatures, histograms of oriented normal vectors, and curvature, distance, and angle feature descriptors. The feedback generation is based on random forest classifiers and the pairwise proximities derived from the trained forests. These proximities provide an estimate of the feature-level similarity between the exercise performed by the patient and the model executions in the training data.
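Random forest proximities, as used above for feedback estimation, are conventionally defined as the fraction of trees in which two samples fall into the same leaf. The NumPy sketch below computes this from a matrix of per-tree leaf indices (such as what scikit-learn's `RandomForestClassifier.apply(X)` returns); the toy leaf assignments are invented for illustration and are not taken from the thesis.

```python
import numpy as np

def rf_proximity(leaf_ids):
    """Pairwise proximities from per-tree leaf assignments.

    leaf_ids: (n_samples, n_trees) array of leaf indices, e.g. the output
    of a fitted forest's .apply(X). Proximity of two samples is the
    fraction of trees in which they land in the same leaf.
    """
    n = leaf_ids.shape[0]
    prox = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            prox[i, j] = np.mean(leaf_ids[i] == leaf_ids[j])
    return prox

# toy leaf assignments for 3 samples across 4 trees (hypothetical values)
leaves = np.array([[0, 1, 2, 3],
                   [0, 1, 2, 0],
                   [5, 6, 7, 8]])
P = rf_proximity(leaves)
print(P[0, 1])  # 0.75 -> samples 0 and 1 share a leaf in 3 of 4 trees
print(P[0, 2])  # 0.0  -> no shared leaves
```

A patient sample with high proximity to the model executions of an exercise can then be scored as a well-performed repetition, which is the mechanism behind the feedback described in the abstract.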