732 research outputs found

    Affect recognition & generation in-the-wild

    Affect recognition based on a subject’s facial expressions has been a topic of major research in the attempt to generate machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research has mainly focused on controlled environments. However, recently, social media and platforms have been widely used. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This Ph.D. Thesis exploits these advances and makes significant contributions to affect analysis and recognition in-the-wild. We tackle affect analysis and recognition as a dual knowledge generation problem: i) we create new, large and rich in-the-wild databases and ii) we design and train novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance on other datasets. First, we present the creation of the Aff-Wild database, annotated according to valence-arousal, and an end-to-end CNN-RNN architecture, AffWildNet. Then we use AffWildNet as a robust prior for dimensional and categorical affect recognition and extend it by extracting low-/mid-/high-level latent information and analysing this via multiple RNNs. Additionally, we propose a novel loss function for DNN-based categorical affect recognition. Next, we generate Aff-Wild2, the first database containing annotations for all main behavior tasks: estimate Valence-Arousal; classify into Basic Expressions; detect Action Units. We develop multi-task and multi-modal extensions of AffWildNet by fusing these tasks and propose a novel holistic approach that utilises all existing databases with non-overlapping annotations and couples them through co-annotation and distribution matching. Finally, we present an approach for facial affect synthesis driven by valence-arousal or basic expressions.
We generate an image with a given affect, or a sequence of images with evolving affect, by annotating a 4-D database and utilising a 3-D morphable model.

    Recognising Complex Mental States from Naturalistic Human-Computer Interactions

    New advances in computer vision techniques will revolutionize the way we interact with computers, as they, together with other improvements, will help us build machines that understand us better. The face is the main non-verbal channel for human-human communication and contains valuable information about emotion, mood, and mental state. Affective computing researchers have investigated widely how facial expressions can be used for automatically recognizing affect and mental states. Nowadays, physiological signals can be measured by video-based techniques, which can also be utilised for emotion detection. Physiological signals are an important indicator of internal feelings and are more robust against social masking. This thesis focuses on computer vision techniques to detect facial expression and physiological changes for recognizing non-basic and natural emotions during human-computer interaction. It covers all stages of the research process, from data acquisition through integration to application. Most previous studies focused on acquiring data from prototypic basic emotions acted out under laboratory conditions. To evaluate the proposed method under more practical conditions, two different scenarios were used for data collection. In the first scenario, a set of controlled stimuli was used to trigger the user’s emotion. The second scenario aimed at capturing more naturalistic emotions that might occur during a writing activity. In the second scenario, the engagement level of the participants with other affective states was the target of the system. For the first time, this thesis explores how video-based physiological measures can be used in affect detection. Video-based measuring of physiological signals is a new technique that needs more improvement to be used in practical applications.
A machine learning approach is proposed and evaluated to improve the accuracy of heart rate (HR) measurement using an ordinary camera during a naturalistic interaction with a computer.

    Facial Analysis: Looking at Biometric Recognition and Genome-Wide Association


    Automatic inference of latent emotion from spontaneous facial micro-expressions

    Emotional states exert a profound influence on individuals' overall well-being, impacting them both physically and psychologically. Accurate recognition and comprehension of human emotions represent a crucial area of scientific exploration. Facial expressions, vocal cues, body language, and physiological responses provide valuable insights into an individual's emotional state, with facial expressions being universally recognised as dependable indicators of emotions. This thesis centres around three vital research aspects concerning the automated inference of latent emotions from spontaneous facial micro-expressions, seeking to enhance and refine our understanding of this complex domain. Firstly, the research aims to detect and analyse activated Action Units (AUs) during the occurrence of micro-expressions. AUs correspond to facial muscle movements. Although previous studies have established links between AUs and conventional facial expressions, no such connections have been explored for micro-expressions. Therefore, this thesis develops computer vision techniques to automatically detect activated AUs in micro-expressions, bridging a gap in existing studies. Secondly, the study explores the evolution of micro-expression recognition techniques, ranging from early handcrafted feature-based approaches to modern deep-learning methods. These approaches have significantly contributed to the field of automatic emotion recognition. However, existing methods primarily focus on capturing local spatial relationships, neglecting global relationships between different facial regions. To address this limitation, a novel third-generation architecture is proposed. This architecture can concurrently capture both short and long-range spatiotemporal relationships in micro-expression data, aiming to enhance the accuracy of automatic emotion recognition and improve our understanding of micro-expressions. 
Lastly, the thesis investigates the integration of multimodal signals to enhance emotion recognition accuracy. Depth information complements conventional RGB data by providing enhanced spatial features for analysis, while the integration of physiological signals with facial micro-expressions improves emotion discrimination. By incorporating multimodal data, the objective is to enhance machines' understanding of latent emotions and improve latent emotion recognition accuracy in spontaneous micro-expression analysis.

    The Wits intelligent teaching system (WITS): a smart lecture theatre to assess audience engagement

    A Thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2017. The utility of lectures is directly related to the engagement of the students therein. To ensure the value of lectures, one needs to be certain that they are engaging to students. In small classes experienced lecturers develop an intuition of how engaged the class is as a whole and can then react appropriately to remedy the situation through various strategies such as breaks or changes in style, pace and content. As both the number of students and size of the venue grow, this type of contingent teaching becomes increasingly difficult and less precise. Furthermore, relying on intuition alone gives no way to recall and analyse previous classes or to objectively investigate trends over time. To address these problems this thesis presents the WITS INTELLIGENT TEACHING SYSTEM (WITS) to highlight disengaged students during class. A web-based, mobile application called Engage was developed to try to elicit anonymous engagement information directly from students. The majority of students were unwilling or unable to self-report their engagement levels during class. This stems from a number of cultural and practical issues related to social display rules, unreliable internet connections, data costs, and distractions. This result highlights the need for a non-intrusive system that does not require the active participation of students. A non-intrusive, computer vision and machine learning based approach is therefore proposed. To support the development thereof, a labelled video dataset of students was built by recording a number of first-year lectures. Students were labelled across a number of affects, including boredom, frustration, confusion, and fatigue, but poor inter-rater reliability meant that these labels could not be used as ground truth.
Based on manual coding methods identified in the literature, a number of actions, gestures, and postures were identified as proxies of behavioural engagement. These proxies are then used in an observational checklist to mark students as engaged or not. A Support Vector Machine (SVM) was trained on Histograms of Oriented Gradients (HOG) to classify the students based on the identified behaviours. The results suggest a high temporal correlation of a single subject’s video frames. This leads to extremely high accuracies on seen subjects. However, this approach generalised poorly to unseen subjects and more careful feature engineering is required. The use of Convolutional Neural Networks (CNNs) improved the classification accuracy substantially, both over a single subject and when generalising to unseen subjects. While more computationally expensive than the SVM, the CNN approach lends itself to parallelism using Graphics Processing Units (GPUs). With GPU hardware acceleration, the system is able to run in near real-time and with further optimisations a real-time classifier is feasible. The classifier provides engagement values, which can be displayed to the lecturer live during class. This information is displayed as an Interest Map which highlights spatial areas of disengagement. The lecturer can then make informed decisions about how to progress with the class, what teaching styles to employ, and on which students to focus. An Interest Map was presented to lecturers and professors at the University of the Witwatersrand yielding 131 responses. The vast majority of respondents indicated that they would like to receive live engagement feedback during class, that they found the Interest Map an intuitive visualisation tool, and that they would be interested in using such technology. 
Contributions of this thesis include the development of a labelled video dataset; the development of a web-based system to allow students to self-report engagement; the development of cross-platform, open-source software for spatial, action and affect labelling; the application of Histogram of Oriented Gradient based Support Vector Machines and Deep Convolutional Neural Networks to classify this data; the development of an Interest Map to intuitively display engagement information to presenters; and finally an analysis of acceptance of such a system by educators.
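
The abstract above reports very high accuracy on seen subjects but poor generalisation to unseen ones, which it attributes to the temporal correlation between frames of the same subject. One common way to measure this gap is to split the data by subject rather than by frame, so that no subject contributes frames to both sets. The following is a minimal sketch of such a split, not the thesis's own evaluation code; the function name `split_by_subject`, the `(subject_id, frame)` data layout, and the sample identifiers are hypothetical.

```python
import random
from collections import defaultdict


def split_by_subject(frames, test_fraction=0.25, seed=0):
    """Split (subject_id, frame) pairs so no subject appears in both sets.

    `frames` is an iterable of (subject_id, frame) tuples; this layout is an
    assumption made for illustration only.
    """
    # Group frames by the subject they belong to.
    by_subject = defaultdict(list)
    for subject_id, frame in frames:
        by_subject[subject_id].append(frame)

    # Shuffle subjects (not frames) with a fixed seed for reproducibility.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    # Hold out whole subjects; always keep at least one for testing.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_ids, train_ids = subjects[:n_test], subjects[n_test:]

    train = [(s, f) for s in train_ids for f in by_subject[s]]
    test = [(s, f) for s in test_ids for f in by_subject[s]]
    return train, test


# Hypothetical data: 8 students with 5 frames each.
frames = [(f"student_{i}", f"frame_{i}_{j}") for i in range(8) for j in range(5)]
train, test = split_by_subject(frames)
train_subjects = {s for s, _ in train}
test_subjects = {s for s, _ in test}
```

A frame-level random split would scatter near-identical consecutive frames of one student across both sets and so would report something closer to the seen-subject figure; holding out whole subjects estimates the unseen-subject case instead.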

    Emotional expressions reconsidered: challenges to inferring emotion from human facial movements

    It is commonly assumed that a person’s emotional state can be readily inferred from his or her facial movements, typically called emotional expressions or facial expressions. This assumption influences legal judgments, policy decisions, national security protocols, and educational practices; guides the diagnosis and treatment of psychiatric illness, as well as the development of commercial applications; and pervades everyday social interactions as well as research in other scientific fields such as artificial intelligence, neuroscience, and computer vision. In this article, we survey examples of this widespread assumption, which we refer to as the common view, and we then examine the scientific evidence that tests this view, focusing on the six most popular emotion categories used by consumers of emotion research: anger, disgust, fear, happiness, sadness, and surprise. The available scientific evidence suggests that people do sometimes smile when happy, frown when sad, scowl when angry, and so on, as proposed by the common view, more than what would be expected by chance. Yet how people communicate anger, disgust, fear, happiness, sadness, and surprise varies substantially across cultures, situations, and even across people within a single situation. Furthermore, similar configurations of facial movements variably express instances of more than one emotion category. In fact, a given configuration of facial movements, such as a scowl, often communicates something other than an emotional state. Scientists agree that facial movements convey a range of information and are important for social communication, emotional or otherwise. But our review suggests an urgent need for research that examines how people actually move their faces to express emotions and other social information in the variety of contexts that make up everyday life, as well as careful study of the mechanisms by which people perceive instances of emotion in one another. 
We make specific research recommendations that will yield a more valid picture of how people move their faces to express emotions and how they infer emotional meaning from facial movements in situations of everyday life. This research is crucial to provide consumers of emotion research with the translational information they require.

    Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

    The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated.
However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data. The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase.
Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience.
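
The keyword-spotting category named in the abstract above, which the thesis contrasts with its concept-level approach, can be illustrated with a very small sketch. This is not the sentic computing engine and is not taken from the thesis; the lexicon `AFFECT_WORDS`, the function `keyword_polarity`, and the example sentences are all hypothetical, chosen only to show how a word-level method behaves.

```python
import re

# Hypothetical toy lexicon of "fairly unambiguous affect words".
AFFECT_WORDS = {
    "good": 1, "great": 1, "love": 1, "happy": 1,
    "bad": -1, "awful": -1, "hate": -1, "sad": -1,
}


def keyword_polarity(text):
    """Classify text by summing the polarity of any affect words it contains."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(AFFECT_WORDS.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(keyword_polarity("I love this phone, great battery"))  # positive
print(keyword_polarity("The screen is awful"))               # negative
print(keyword_polarity("It is not good"))                    # positive (negation is missed)
```

The last call shows the kind of limitation the abstract points to: because the method only looks for surface words, it cannot interpret the sentence, and "not good" is scored as positive.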