    Affect Recognition Using Electroencephalography Features

    Affect is the psychological display of emotion, often described along three principal dimensions: (1) valence, (2) arousal and (3) dominance. This thesis explores the ability of computers to recognize human emotions from Electroencephalography (EEG) features. The development of computer systems that classify human emotions using physiological signals has recently gained pace in the research and technological community, because analyzing cognitive state via EEG makes it possible to establish a direct communication channel between a computer and the human brain. Other applications of recognizing affective states from EEG include identifying stress and cognitive workload in individuals and assisting them in relaxation. This thesis is an extensive study of the design of paradigms that help computer systems recognize emotional states from a multichannel EEG segment. Here, a paradigm refers to the process of first extracting features from the EEG signals using signal processing and then constructing a predictive model via machine learning. We first present a brief review of the state-of-the-art paradigms that have contributed to the topic of emotional affect recognition, and then detail the proposed paradigms for recognizing the principal dimensions of affect. Feature selection is performed to retain only the relevant features. The models created to predict the affective states are evaluated quantitatively, by measuring generalization accuracy, and qualitatively, by interpreting them.
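
    As a concrete illustration of such a paradigm, the sketch below extracts band-power features from a multichannel EEG segment and feeds them to a classifier. This is a generic, minimal example rather than the specific pipeline of the thesis; the frequency bands, sampling rate and SVM choice are assumptions.

```python
# Minimal sketch of an EEG affect-recognition paradigm (illustrative only):
# per-channel band-power features -> standardisation -> SVM classifier.
import numpy as np
from scipy.signal import welch
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # Hz (assumed)

def band_power_features(segment, fs=128):
    """segment: (n_channels, n_samples) EEG -> flat feature vector."""
    freqs, psd = welch(segment, fs=fs, nperseg=fs * 2, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=-1))  # mean power per channel
    return np.concatenate(feats)

# X_raw: list of EEG segments; y: labels (e.g., high vs. low valence)
# model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# model.fit(np.stack([band_power_features(s) for s in X_raw]), y)
```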

    On automatic emotion classification using acoustic features

    In this thesis, we describe extensive experiments on the classification of emotions from speech using acoustic features. This area of research has important applications in human-computer interaction. We thoroughly review the current literature and present our results on some of the contemporary emotional speech databases. The principal focus is on creating a large set of acoustic features descriptive of different emotional states, and on finding methods for selecting a subset of the best-performing features. We examine several traditional feature selection methods and propose a novel scheme that employs a preferential Borda voting strategy for ranking features. The comparative results show that our proposed scheme can strike a balance between accurate but computationally intensive wrapper methods and less accurate but computationally cheaper filter methods.

    Using the selected features, several schemes for extending binary classifiers to multiclass classification are tested. Some of these classifiers form serial combinations of binary classifiers, while others use a hierarchical structure. We describe a new hierarchical classification scheme, which we call Data-Driven Dimensional Emotion Classification (3DEC), whose decision hierarchy is based on non-metric multidimensional scaling (NMDS) of the data. This method of creating a hierarchical structure for the classification of emotion classes gives significant improvements over the other methods tested. The NMDS representation of emotional speech data can be interpreted in terms of the well-known valence-arousal model of emotion. We find that this model does not give a particularly good fit to the data: although the arousal dimension can be identified easily, valence is not well represented in the transformed data. From the recognition results on these two dimensions, we conclude that the valence and arousal dimensions are not orthogonal to each other.

    In the last part of this thesis, we address the difficult but important problem of improving the generalisation of speech emotion recognition (SER) systems across different speakers and recording environments, a topic that has been generally overlooked in current research. We first apply the traditional methods used in automatic speech recognition (ASR) systems to intra- and inter-database emotion classification; these methods do improve the average accuracy of the emotion classifier. We then identify the differences between training and test data, due to speakers and acoustic environments, as a covariate shift. This shift is minimised by using importance weighting algorithms from the emerging field of transfer learning to guide the learning algorithm towards the training data that best represents the test data. Our results show that importance weighting algorithms can be used to minimise the differences between training and testing data. We also test the effectiveness of importance weighting on inter-database and cross-lingual emotion recognition, and from these results we draw conclusions about the universal nature of emotions across different languages.
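
    To make the Borda idea concrete, the sketch below aggregates the feature rankings produced by several filter methods into a single consensus ranking via a plain Borda count. It is illustrative only; the preferential voting variant proposed in the thesis may weight rank positions differently.

```python
# Minimal Borda-count rank aggregation for feature selection (illustrative).
import numpy as np

def borda_rank(rankings):
    """rankings: list of arrays, each a permutation of feature indices
    ordered best-first. Returns feature indices sorted by Borda score."""
    n = len(rankings[0])
    scores = np.zeros(n)
    for ranking in rankings:
        for position, feat in enumerate(ranking):
            scores[feat] += n - position  # better position -> more points
    return np.argsort(-scores)  # best features first

# Example: three filter methods each ranking five features
rankers = [np.array([0, 2, 1, 4, 3]),
           np.array([2, 0, 1, 3, 4]),
           np.array([0, 1, 2, 4, 3])]
print(borda_rank(rankers))  # prints [0 2 1 4 3]
```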

    Timing is everything: A spatio-temporal approach to the analysis of facial actions

    This thesis presents a fully automatic facial expression analysis system based on the Facial Action Coding System (FACS). FACS is the best-known and most commonly used system for describing facial activity in terms of facial muscle actions (i.e., action units, AUs). We present our research on the analysis of the morphological, spatio-temporal and behavioural aspects of facial expressions. In contrast with most other researchers in the field, who use appearance-based techniques, we use a geometric feature-based approach, which we argue is more suitable for analysing the temporal dynamics of facial expressions. Our system explicitly analyses the temporal aspects of facial expressions in an input colour video in terms of their onset (start), apex (peak) and offset (end). The fully automatic system presented here detects 20 facial points in the first frame and tracks them throughout the video. From the tracked points we compute geometry-based features, which serve as the input to the remainder of the system. The AU activation detection system uses GentleBoost feature selection and a Support Vector Machine (SVM) classifier to determine which AUs are present in an expression. The temporal dynamics of active AUs are recognised by a hybrid GentleBoost-SVM-Hidden Markov Model classifier. The system is capable of analysing 23 out of 27 existing AUs with high accuracy. The main contributions of this thesis are the following: we have created a method for fully automatic AU analysis with state-of-the-art recognition results; we have proposed, for the first time, a method for recognising the four temporal phases of an AU; we have built the largest comprehensive database of facial expressions to date; and we present, for the first time in the literature, two studies on the automatic distinction between posed and spontaneous expressions.
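
    As a rough illustration of a geometric feature-based approach, the sketch below turns tracked facial points into pairwise-distance features and trains one binary SVM per AU. The feature definition and classifier settings are assumptions for illustration, not the exact system described above (which additionally uses GentleBoost feature selection and an HMM for the temporal phases).

```python
# Minimal sketch of geometry-based AU detection from tracked facial points.
from itertools import combinations

import numpy as np
from sklearn.svm import SVC

def geometric_features(points):
    """points: (n_points, 2) landmark coordinates for one frame.
    Returns all pairwise inter-point distances as a feature vector
    (190 features for 20 tracked points)."""
    return np.array([np.linalg.norm(points[i] - points[j])
                     for i, j in combinations(range(len(points)), 2)])

# One binary SVM per action unit, trained on per-frame features:
# X = np.stack([geometric_features(p) for p in tracked_frames])
# au_detector = SVC(kernel="rbf").fit(X, au_labels)  # 1 = AU active
```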

    Efficient Learning Machines

    Computer science.

    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, held at the Technological University Dublin, Ireland, from 29 to 31 August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI).

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), meaning that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue, nevertheless, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data-independence. Furthermore, it is shown that the two representations are complementary, and that their fusion improves the performance of a machine listening system.
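
    A minimal BoAW pipeline can be sketched as follows: a codebook of "audio words" is learned by clustering LLD frames, and each recording is then represented as a normalised histogram of its quantised frames. This is an illustrative reimplementation, not the openXBOW toolkit itself; the codebook size and clustering method are assumptions.

```python
# Minimal bag-of-audio-words sketch: k-means codebook + frame histogram.
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(lld_frames, n_words=500, seed=0):
    """lld_frames: (n_frames, n_llds) LLDs pooled over the training data."""
    return KMeans(n_clusters=n_words, random_state=seed).fit(lld_frames)

def boaw_histogram(codebook, lld_sequence):
    """Quantise each frame to its nearest 'audio word' and count,
    yielding a fixed-length vector regardless of sequence length."""
    words = codebook.predict(lld_sequence)
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)  # normalised histogram

# codebook = learn_codebook(np.vstack(train_lld_sequences))
# X = np.stack([boaw_histogram(codebook, s) for s in train_lld_sequences])
```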

    Biological and biomimetic machine learning for automatic classification of human gait

    Machine learning (ML) research has benefited from a deep understanding of the biological mechanisms that have evolved to perform comparable tasks. Recent successes of ML models surpassing human performance on perception-based tasks have garnered interest in improving them further. However, the approach to improving ML models tends to be unstructured, particularly for models that aim to mimic biology. This thesis proposes and applies a bidirectional learning paradigm to streamline the process of improving the performance of ML models on a classification task at which humans are already adept. The approach is validated using human gait classification as the exemplar task. The paradigm has the additional benefit of allowing the ML models themselves to be used to investigate the underlying mechanisms of human perception (HP).

    Assessment of several biomimetic (BM) and non-biomimetic (NBM) machine learning models on an intrinsic feature of gait, namely the gender of the walker, establishes a functional overlap between HP and BM in the perception of gait. The Long Short-Term Memory (LSTM) architecture is selected as the BM of choice for this study after comparison with other models such as support vector machines, decision trees and multi-layer perceptrons. Psychophysics and computational experiments are conducted to understand the overlap between human and machine models. The BM and the HP measured in psychophysics experiments share qualitatively similar profiles of gender classification accuracy across varying stimulus exposure durations, and both prefer motion-based cues over structural cues (BM = HP > NBM). Further evaluation reveals a human-like expression of the inversion effect, a well-studied cognitive bias in HP that reduces gender classification accuracy to 37% (p < 0.05, chance at 50%) when the stimulus is inverted. Its expression in the BM supports the argument for learned rather than hard-wired mechanisms in HP, particularly since the effect emerged in every BM trained from random initialisation without prior anthropomorphic expectations of gait.

    The above aspects of HP, namely the preference for motion cues over structural cues and the absence of prior anthropomorphic expectations, were selected to improve BM performance. Representing gait explicitly as motion-based cues of a non-anthropomorphic, gender-neutral skeleton not only mitigates the inversion effect in the BM but also significantly improves classification accuracy. For gender classification of upright stimuli, mean accuracy improved by 6%, from 76% to 82% (F(1,18) = 16, p < 0.05); for inverted stimuli, it improved by 45%, from 37% to 82% (F(1,18) = 20, p < 0.05). The model was further tested on a more challenging, extrinsic-feature task: classification of the emotional state of a walker. Emotions were visually induced in subjects through exposure to emotive or neutral images from the International Affective Picture System (IAPS) database. The classification accuracy of the BM was significantly above chance, at 43% (p < 0.05, chance at 33.3%). Application of the proposed paradigm in further binary emotive-state classification experiments improved mean accuracy by a further 23%, from 43% to 65% (F(1,18) = 7.4, p < 0.05), for the positive vs. neutral task. These results validate the proposed paradigm of concurrent bidirectional investigation of HP and BM for the classification of human gait, and suggest future applications in automating perceptual tasks for which the human brain and body have evolved.
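
    For illustration, the sketch below shows a minimal LSTM sequence classifier of the kind described above, operating on motion-based gait features such as frame-to-frame joint velocities. The layer sizes, input representation and two-class head are assumptions, not the thesis's exact architecture.

```python
# Minimal LSTM gait classifier sketch (PyTorch; illustrative only).
import torch
import torch.nn as nn

class GaitLSTM(nn.Module):
    def __init__(self, n_features, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):           # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)  # h_n: final hidden state
        return self.head(h_n[-1])   # class logits

# Motion-based input: e.g., joint velocities rather than raw positions,
# reflecting the motion-cue preference described above.
# model = GaitLSTM(n_features=2 * n_joints)        # n_joints: hypothetical
# logits = model(torch.randn(8, 100, 2 * n_joints))  # a batch of sequences
```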