
    Combining local descriptors and classification methods for human emotion recognition.

    Master's degree thesis. University of KwaZulu-Natal, Durban.
    Human Emotion Recognition occupies an important place in artificial intelligence and has several applications, such as emotionally intelligent robots, driver fatigue monitoring, mood prediction, and many others. Facial Expression Recognition (FER) systems recognize human emotions by extracting face image features and classifying them as one of several prototypic emotions. Local descriptors are good at encoding micro-patterns and capturing their distribution within a sub-region of an image. Moreover, dividing the face into sub-regions introduces information about micro-pattern locations, which is essential for developing robust facial expression features. Hence, a local descriptor's effectiveness depends heavily on parameters such as the sub-region size and histogram length. However, these extraction parameters are seldom optimized in existing approaches. This dissertation reviews several local descriptors and classifiers, and experiments are conducted to improve the robustness and accuracy of existing FER methods. A study of the Histogram of Oriented Gradients (HOG) descriptor inspires this research to propose a new face registration algorithm. The approach uses contrast-limited histogram equalization to enhance the image, followed by binary thresholding and blob detection operations to rotate the face upright. Additionally, this research proposes a new method for optimized FER. The main idea is to optimize the calculation of feature vectors by varying the extraction parameter values, producing several feature sets; the best extraction parameter values are then selected by evaluating the classification performance of each feature set. The proposed approach is also implemented using different combinations of local descriptors and classification methods under the same experimental conditions. The results reveal that the proposed methods outperform those reported in previous studies, with improvements of up to 2%. HOG was the most effective local descriptor, while Support Vector Machines (SVM) and Multi-Layer Perceptron (MLP) were the best classifiers. Hence, the best combinations were HOG+SVM and HOG+MLP.
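    The optimization loop described above can be made concrete with a short sketch. The snippet below (an illustration, not the dissertation's code) grid-searches HOG extraction parameters - the cell (sub-region) size and the number of orientation bins, which together determine the histogram length - and keeps the combination with the best cross-validated SVM accuracy; the parameter ranges are assumptions.

```python
# Hedged sketch of the extraction-parameter optimization idea: try several
# HOG parameter combinations and keep the one that classifies best.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def extract_hog(images, cell_size, n_bins):
    """One HOG feature vector per grayscale face image."""
    return np.array([
        hog(img,
            orientations=n_bins,
            pixels_per_cell=(cell_size, cell_size),
            cells_per_block=(2, 2))
        for img in images
    ])

def select_best_parameters(images, labels,
                           cell_sizes=(6, 8, 12), bin_counts=(6, 9, 12)):
    """Return ((cell_size, n_bins), accuracy) for the best feature set."""
    best_params, best_score = None, -np.inf
    for cell_size in cell_sizes:
        for n_bins in bin_counts:
            features = extract_hog(images, cell_size, n_bins)
            score = cross_val_score(SVC(kernel="linear"),
                                    features, labels, cv=5).mean()
            if score > best_score:
                best_params, best_score = (cell_size, n_bins), score
    return best_params, best_score
```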

    Logging Stress and Anxiety Using a Gamified Mobile-based EMA Application, and Emotion Recognition Using a Personalized Machine Learning Approach

    According to the American Psychological Association (APA), more than 9 in 10 adults (94 percent) believe that stress can contribute to the development of major health problems, such as heart disease, depression, and obesity. Due to the subjective nature of stress and anxiety, it is difficult to measure these conditions accurately by relying on objective means alone. In recent years, researchers have increasingly utilized computer vision techniques and machine learning algorithms to develop scalable and accessible solutions for remote mental health monitoring via web and mobile applications. To further enhance accuracy in the field of digital health and precision diagnostics, there is a need for personalized machine-learning approaches that recognize mental states based on individual characteristics, rather than relying solely on general-purpose solutions. This thesis focuses on experiments aimed at recognizing and assessing levels of stress and anxiety in participants. In the initial phase of the study, a cross-platform mobile application (compatible with both Android and iOS), which we call STAND, is introduced. This application serves the purpose of Ecological Momentary Assessment (EMA). Participants receive daily notifications through the smartphone-based app, which redirects them to a screen consisting of three components: a question prompting participants to indicate their current levels of stress and anxiety, a rating scale from 1 to 10 for quantifying their response, and the ability to capture a selfie. The responses to the stress and anxiety questions, along with the corresponding selfie photographs, are then analyzed on an individual basis. This analysis explores the relationships between self-reported stress and anxiety levels and potential facial expressions indicative of stress and anxiety, eye features such as pupil size variation and eye closure, and specific action units (AUs) observed in the frames over time. In addition to its primary functions, the mobile app also gathers sensor data, including accelerometer and gyroscope readings, on a daily basis; this data holds potential for further analysis related to stress and anxiety. Furthermore, apart from capturing selfie photographs, participants have the option to upload video recordings of themselves while engaging in two neuropsychological games. These recorded videos are then analyzed to extract pertinent features that can be utilized for binary classification of stress and anxiety (i.e., stress and anxiety recognition). The participants selected for this phase will be students aged 18 to 38 who have received recent clinical diagnoses indicating specific stress and anxiety levels. To enhance user engagement in the intervention, gamified elements - an emerging trend for influencing user behavior and lifestyle - have been utilized. Incorporating gamified elements into non-game contexts (e.g., health-related ones) has gained overwhelming popularity in recent years, making interventions more delightful, engaging, and motivating. In the subsequent phase of this research, we conducted an AI experiment employing a personalized machine learning approach to perform emotion recognition on an established dataset called Emognition. This experiment served as a simulation of the future analysis that will be conducted as part of a more comprehensive study focusing on stress and anxiety recognition. The outcomes of the emotion recognition experiment highlight the effectiveness of personalized machine learning techniques and bear significance for the development of future diagnostic endeavors. For training purposes, we selected three models: KNN, Random Forest, and MLP. The preliminary accuracy results for these models were 93%, 95%, and 87%, respectively.
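    As a rough illustration of the personalized approach, the sketch below fits a separate classifier per participant and averages the held-out accuracies, using the three model families named above. The data-handling details (feature extraction from the Emognition recordings, the `subjects` dictionary) are assumptions for illustration only.

```python
# Hedged sketch of per-participant (personalized) emotion recognition.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

MODELS = {
    "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
    "RandomForest": lambda: RandomForestClassifier(n_estimators=200),
    "MLP": lambda: MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
}

def personalized_accuracies(subjects):
    """subjects: dict of subject_id -> (features X, labels y)."""
    results = {name: [] for name in MODELS}
    for subject_id, (X, y) in subjects.items():
        # Each subject gets their own train/test split and their own models.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=0)
        for name, make_model in MODELS.items():
            model = make_model().fit(X_tr, y_tr)
            results[name].append(model.score(X_te, y_te))
    # Average per-subject accuracy for each model family.
    return {name: sum(a) / len(a) for name, a in results.items()}
```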

    Feature Extraction Using Deep Learning for Kansei (Affect) Estimation

    Doctoral thesis, Doctor of Engineering, Hiroshima University (広島大学).

    The Wits intelligent teaching system (WITS): a smart lecture theatre to assess audience engagement

    A thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2017.
    The utility of lectures is directly related to the engagement of the students therein. To ensure the value of lectures, one needs to be certain that they are engaging to students. In small classes, experienced lecturers develop an intuition of how engaged the class is as a whole and can then react appropriately to remedy the situation through various strategies, such as breaks or changes in style, pace, and content. As both the number of students and the size of the venue grow, this type of contingent teaching becomes increasingly difficult and less precise. Furthermore, relying on intuition alone gives no way to recall and analyse previous classes or to objectively investigate trends over time. To address these problems, this thesis presents the Wits Intelligent Teaching System (WITS), which highlights disengaged students during class. A web-based mobile application called Engage was developed to try to elicit anonymous engagement information directly from students. The majority of students were unwilling or unable to self-report their engagement levels during class. This stems from a number of cultural and practical issues related to social display rules, unreliable internet connections, data costs, and distractions. This result highlights the need for a non-intrusive system that does not require the active participation of students. A non-intrusive approach based on computer vision and machine learning is therefore proposed. To support its development, a labelled video dataset of students was built by recording a number of first-year lectures. Students were labelled across a number of affects – including boredom, frustration, confusion, and fatigue – but poor inter-rater reliability meant that these labels could not be used as ground truth. Based on manual coding methods identified in the literature, a number of actions, gestures, and postures were identified as proxies of behavioural engagement. These proxies are then used in an observational checklist to mark students as engaged or not. A Support Vector Machine (SVM) was trained on Histograms of Oriented Gradients (HOG) to classify the students based on the identified behaviours. The results suggest a high temporal correlation between a single subject's video frames, which leads to extremely high accuracies on seen subjects. However, this approach generalised poorly to unseen subjects, and more careful feature engineering is required. The use of Convolutional Neural Networks (CNNs) improved the classification accuracy substantially, both over a single subject and when generalising to unseen subjects. While more computationally expensive than the SVM, the CNN approach lends itself to parallelism using Graphics Processing Units (GPUs). With GPU hardware acceleration, the system is able to run in near real-time, and with further optimisations a real-time classifier is feasible. The classifier provides engagement values, which can be displayed to the lecturer live during class. This information is displayed as an Interest Map, which highlights spatial areas of disengagement. The lecturer can then make informed decisions about how to progress with the class, what teaching styles to employ, and on which students to focus. An Interest Map was presented to lecturers and professors at the University of the Witwatersrand, yielding 131 responses. The vast majority of respondents indicated that they would like to receive live engagement feedback during class, that they found the Interest Map an intuitive visualisation tool, and that they would be interested in using such technology. Contributions of this thesis include the development of a labelled video dataset; the development of a web-based system allowing students to self-report engagement; the development of cross-platform, open-source software for spatial, action, and affect labelling; the application of HOG-based Support Vector Machines and deep Convolutional Neural Networks to classify this data; the development of an Interest Map to intuitively display engagement information to presenters; and finally an analysis of the acceptance of such a system by educators.
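    For a sense of scale, a CNN of the kind described - small enough to run in near real-time on a GPU across many student crops per frame - might look like the following PyTorch sketch. The input resolution, layer widths, and binary engaged/disengaged output are illustrative assumptions, not the thesis architecture.

```python
# Hedged sketch of a small CNN for binary engagement classification.
import torch
import torch.nn as nn

class EngagementCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(64 * 8 * 8, 2)  # engaged / disengaged

    def forward(self, x):          # x: (batch, 3, 64, 64) student crops
        return self.classifier(self.features(x).flatten(1))

# All student crops from one video frame can be scored in a single batched
# GPU pass, which is what makes near real-time feedback feasible.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = EngagementCNN().to(device)
scores = model(torch.randn(32, 3, 64, 64, device=device))
```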

    Sensor Technologies to Manage the Physiological Traits of Chronic Pain: A Review

    Non-oncologic chronic pain is a common, high-morbidity impairment worldwide and is acknowledged as a condition with a significant impact on quality of life. Pain intensity is largely perceived as a subjective experience, which makes its objective measurement challenging. However, the physiological traces of pain make it possible to correlate it with vital signs, such as heart rate variability, skin conductance, and the electromyogram, or with health performance metrics derived from daily activity monitoring or facial expressions, all of which can be acquired with diverse sensor technologies and multisensory approaches. As the assessment and management of pain are essential issues for a wide range of clinical disorders and treatments, this paper reviews different sensor-based approaches applied to the objective evaluation of non-oncological chronic pain. The space of available technologies and resources aimed at pain assessment represents a diversified set of alternatives that can be exploited to address the multidimensional nature of pain.
    Funding: Ministerio de Economía y Competitividad (Instituto de Salud Carlos III) PI15/00306; Junta de Andalucía PIN-0394-2017; Unión Europea "FRAIL
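    As a small, hedged example of the kind of vital-sign feature the review covers, the snippet below computes two standard time-domain heart rate variability measures (SDNN and RMSSD) from a series of RR intervals; the interval values are invented for illustration.

```python
# Two common time-domain HRV features from RR intervals (milliseconds).
import numpy as np

def hrv_features(rr_ms):
    """Return (SDNN, RMSSD) for a sequence of RR intervals in ms."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                        # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))   # beat-to-beat variability
    return sdnn, rmssd

print(hrv_features([812, 798, 840, 825, 803, 810]))
```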

    Enhanced contextual based deep learning model for niqab face detection

    Human face detection is one of the most investigated areas in computer vision and plays a fundamental role as the first step for all face processing and facial analysis systems, such as face recognition, security monitoring, and facial emotion recognition. Despite the great impact of deep learning convolutional neural network (DL-CNN) approaches on solving many unconstrained face detection problems in recent years, the low performance of current face detection models when detecting highly occluded faces remains a challenging problem worthy of investigation. The challenge tends to be greater when the occlusion covers most of the face, which dramatically reduces the number of learned representative features used by the Feature Extraction Network (FEN) to discriminate face parts from the background. The lack of an occluded-face dataset with sufficient images of heavily occluded faces is another challenge that degrades performance. Therefore, this research addressed the issue of low performance and developed an enhanced occluded face detection model for detecting and localizing heavily occluded faces. First, a highly occluded faces dataset was developed to provide sufficient training examples, incorporating a contextual-based annotation technique to maximize the amount of facial salient features. Second, using the training half of the dataset, a DL-CNN Occluded Face Detection (OFD) model with an enhanced feature extraction and detection network was proposed and trained. Common deep learning techniques, namely transfer learning and data augmentation, were used to speed up the training process; a sketch of the augmentation step follows below. A false-positive reduction based on the max-in-out strategy was adopted to reduce the high false-positive rate. The proposed model was evaluated and benchmarked against five current face detection models on the dataset. The obtained results show that OFD achieved improved performance in terms of accuracy (by an average of 37%) and average precision (by 16.6%) compared to current face detection models. The findings revealed that the proposed model outperformed current face detection models in detecting highly occluded faces. Based on these findings, an improved contextual-based labeling technique was successfully developed to address the insufficient functionality of current labeling techniques.
    Faculty of Engineering - School of Computing. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150777
    Keywords: Deep Learning Convolutional Neural Network (DL-CNN), Feature Extraction Network (FEN), Occluded Face Detection model (OFD)
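    The data augmentation step mentioned above could, for instance, be sketched with torchvision transforms as below; RandomErasing blanks a random patch and so crudely mimics occlusion. The specific transforms and their parameters are assumptions, since the abstract does not detail them.

```python
# Hedged sketch of an augmentation pipeline for occluded-face training data.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
    # Erase a random patch to simulate heavy occlusion (tensor-space op).
    T.RandomErasing(p=0.5, scale=(0.1, 0.4)),
])
# `augment` would be applied to each PIL training image when assembling
# batches, enlarging the effective occluded-face training set.
```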

    Time- and value-continuous explainable affect estimation in-the-wild

    Today, the relevance of Affective Computing, i.e., of making computers recognise and simulate human emotions, cannot be overstated. All technology giants (from manufacturers of laptops to mobile phones to smart speakers) are in fierce competition to make their devices understand not only what is being said, but also how it is being said, in order to recognise users' emotions. The goals have evolved from predicting the basic emotions (e.g., happy, sad) to predicting more nuanced affective states (e.g., relaxed, bored) in real time. The databases used in such research have evolved too, from featuring acted behaviours to featuring spontaneous ones. Lately there has been a more powerful shift, called in-the-wild affect recognition, i.e., taking the research out of the laboratory and into the uncontrolled real world. This thesis discusses, for the very first time, affect recognition for two unique in-the-wild audiovisual databases, GRAS2 and SEWA. GRAS2 is, to date, the only database with time- and value-continuous affect annotations for Labov-effect-free affective behaviours, i.e., behaviours recorded without the participant's awareness (which is otherwise known to affect the naturalness of one's affective behaviour). SEWA features participants from six different cultural backgrounds conversing over a video-calling platform; its in-the-wild recordings are further corrupted by unpredictable artifacts, such as network-induced delays, frame-freezing, and echoes. The two databases present a unique opportunity to study time- and value-continuous affect estimation that is truly in-the-wild. A novel 'Evaluator Weighted Estimation' formulation is proposed to generate a gold standard sequence from several annotations. An illustration is presented demonstrating that the moving bag-of-words (BoW) representation better preserves the temporal context of the features while remaining more robust against outliers than other statistical summaries, e.g., the moving average. A novel, data-independent randomised codebook is proposed for the BoW representation, especially useful for cross-corpus model generalisation testing when the feature spaces of the databases differ drastically. Various deep learning models and support vector regressors are used to predict affect dimensions time- and value-continuously. The better generalisability of the models trained on GRAS2, despite the smaller training size, makes a strong case for the collection and use of Labov-effect-free data. A further foundational contribution is the discovery of the missing many-to-many mapping between the mean square error (MSE) and the concordance correlation coefficient (CCC), i.e., between two of the most popular utility functions to date. The newly invented cost function |MSE_{XY}/σ_{XY}| has been evaluated in experiments aimed at demystifying the inner workings of a well-performing, simple, low-cost neural network that effectively utilises the BoW text features. Also proposed herein is the shallowest-possible convolutional neural network (CNN) that uses facial action unit (FAU) features. The CNN exploits sequential context but, unlike RNNs, also inherently allows data- and process-parallelism. Interestingly, for the most part, these white-box AI models have been shown to utilise the provided features in a manner consistent with the human perception of emotion expression.
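    The two utility functions discussed above are easy to state concretely. The sketch below computes the CCC and the proposed cost |MSE_XY/σ_XY| (with σ_XY taken as the covariance between prediction and gold standard) for two sequences; it illustrates the definitions and is not the thesis implementation.

```python
# CCC and the |MSE/covariance| cost for prediction x vs. gold standard y.
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def proposed_cost(x, y):
    """|MSE_XY / sigma_XY|, with sigma_XY the covariance of x and y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mse = np.mean((x - y) ** 2)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return abs(mse / cov)

x = np.array([0.1, 0.4, 0.35, 0.8])
y = np.array([0.0, 0.5, 0.3, 0.9])
print(ccc(x, y), proposed_cost(x, y))
```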