    Recognition of emotions using Kinects

    Emotion recognition can improve the quality of patient care, product development and human-machine interaction. Psychological studies indicate that emotional state is expressed in the way people walk, so a person's gait can be used to reveal their emotional state. This paper proposes a novel emotion recognition method that uses Microsoft Kinect to record gait patterns and trains machine learning algorithms on them. Fifty-nine subjects were recruited, and their gait patterns were recorded by two Kinect cameras. Data preprocessing consists of joint selection, coordinate system transformation, sliding-window Gaussian filtering, differential operation and data segmentation. We apply the Fourier transform to extract features from the gait patterns and use Principal Component Analysis (PCA) for feature selection. Using NaiveBayes, RandomForests, LibSVM and SMO classifiers, the accuracy of distinguishing natural from angry emotions reaches 80%, and the accuracy of distinguishing natural from happy emotions is above 70%. The results indicate that Kinect can be used for emotion recognition with fairly good performance.
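The preprocessing and feature-extraction steps described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the input is assumed to be a `(frames, channels)` array of joint coordinates that have already been selected and transformed, and the window length, Gaussian sigma, segment length, number of Fourier coefficients and number of PCA components are all hypothetical values. PCA is used here as a projection for dimensionality reduction; the paper's exact feature-selection procedure is not stated in the abstract.

```python
import numpy as np


def gaussian_smooth(signal, window=5, sigma=1.0):
    """Sliding-window Gaussian filter applied independently to each channel (column)."""
    offsets = np.arange(window) - (window - 1) / 2
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # normalise so a constant signal is preserved away from the edges
    # mode="same" keeps the number of frames unchanged (edges are implicitly zero-padded)
    return np.column_stack(
        [np.convolve(signal[:, c], kernel, mode="same") for c in range(signal.shape[1])]
    )


def segment(signal, length):
    """Split a (frames, channels) array into non-overlapping (length, channels) segments."""
    n = signal.shape[0] // length
    return signal[: n * length].reshape(n, length, signal.shape[1])


def fourier_features(segments, n_coeffs=8):
    """Magnitudes of the first n_coeffs FFT bins of every channel, flattened per segment."""
    spectrum = np.abs(np.fft.rfft(segments, axis=1))[:, :n_coeffs, :]
    return spectrum.reshape(segments.shape[0], -1)


def pca(features, n_components):
    """Project mean-centred features onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def gait_features(joints, seg_len=64, n_coeffs=8, n_components=10):
    """Hypothetical end-to-end pipeline: smooth, differentiate, segment, FFT, PCA."""
    smoothed = gaussian_smooth(joints)
    velocity = np.diff(smoothed, axis=0)  # differential operation: frame-to-frame change
    segs = segment(velocity, seg_len)
    return pca(fourier_features(segs, n_coeffs), n_components)
```

Each row of the returned matrix is one gait segment; these vectors would then be passed to classifiers such as those named in the abstract.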

    Emotion and Stress Recognition Related Sensors and Machine Learning Technologies

    This book includes impactful chapters that present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant to tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality assurance, and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts; and (x) emotion and stress estimation and forecasting from a nonlinear dynamical systems perspective.

    Development of human-robot interaction based on multimodal emotion recognition (Estonian: Multimodaalsel emotsioonide tuvastamisel põhineva inimese-roboti suhtluse arendamine)

    The electronic version of the thesis does not include the publications. Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing, with its main applications in human-computer interaction. Systems developed for this purpose combine different modalities, based on vocal and visual cues. This thesis takes both modalities into account to develop an automatic multimodal emotion recognition system that exploits information extracted from speech and face signals. From speech signals, Mel-frequency cepstral coefficients (MFCCs), filter-bank energies and prosodic features are extracted. Two different strategies are used to analyze the facial data. First, geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, which are fed to a convolutional neural network that learns to discriminate visually between the emotions. The output confidence values of the three classifiers (one acoustic, two visual) are then used to define a new feature space, which is learned in a final late-fusion stage to predict the emotion label. The experiments are conducted on the SAVEE dataset, the Polish and Serbian emotional speech databases, and the eNTERFACE'05 and RML datasets. The results show significant performance improvements over existing alternatives, defining the current state of the art on all of these datasets. Additionally, the thesis reviews emotional body gesture recognition systems proposed in the literature, with the aim of identifying future research directions for enhancing the proposed system. In particular, incorporating gesture data, which constitutes another major component of the visual modality, could yield a more efficient framework with even higher recognition accuracy.
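The late-fusion stage described above, in which classifier confidences form a new feature space for a final learner, can be illustrated with a small stacking sketch. It assumes each base classifier already outputs an `(n_samples, n_classes)` matrix of confidences. The abstract does not say which learner is used in the final stage, so a NumPy softmax (multinomial logistic) regression trained by batch gradient descent stands in here as a hypothetical choice; the function names and the synthetic confidences are illustrative only.

```python
import numpy as np


def stack_confidences(*classifier_outputs):
    """Concatenate (n_samples, n_classes) confidence matrices into one feature matrix."""
    return np.hstack(classifier_outputs)


def train_softmax(X, y, n_classes, lr=0.5, epochs=500):
    """Fit a multinomial logistic-regression meta-learner by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = P - Y  # gradient of the cross-entropy loss w.r.t. the logits
        W -= lr * X.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b


def predict(W, b, X):
    """Return the index of the highest-scoring class for each sample."""
    return np.argmax(X @ W + b, axis=1)


if __name__ == "__main__":
    # Hypothetical confidences for 3 classes from one acoustic and two visual classifiers.
    y = np.arange(60) % 3
    onehot = np.eye(3)[y]
    acoustic = 0.7 * onehot + 0.1                             # informative classifier
    visual_geom = np.full((60, 3), 1 / 3)                     # uninformative classifier
    visual_cnn = 0.5 * np.roll(onehot, 1, axis=1) + 0.5 / 3   # consistently confuses classes
    X = stack_confidences(acoustic, visual_geom, visual_cnn)
    W, b = train_softmax(X, y, n_classes=3)
    print((predict(W, b, X) == y).mean())
```

One benefit of learning the fusion rather than averaging confidences is that the meta-learner can exploit a classifier that is systematically wrong in a consistent way. In practice it should be trained on held-out or cross-validated confidences rather than on outputs the base classifiers produced for their own training data.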

    Machine learning for automatic analysis of affective behaviour

    The automated analysis of affect has attracted rapidly increasing attention from researchers over the past two decades, as it constitutes a fundamental step towards next-generation computing technologies and their integration into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment and ambient intelligence). The work presented in this thesis focuses on several fundamental problems that arise on the way to reliable, accurate and robust affect sensing systems. The motivation behind this work lies in two recent developments in the field: (i) the creation of large audiovisual databases for affect analysis in the so-called "Big Data" era, and (ii) the need to deploy systems under demanding, real-world conditions. These developments created the requirement to analyse emotion expressions continuously in time, instead of merely processing static images, thus exposing researchers to the wide range of temporal dynamics in human behaviour. This entails another departure from the traditional line of research in the field: instead of predicting posed, discrete basic emotions (happiness, surprise, etc.), it became necessary to focus on spontaneous, naturalistic expressions captured in settings closer to real-world conditions, using more expressive emotion descriptions than a set of discrete labels. The main motivation of this thesis is therefore to address the challenges arising from the adoption of continuous dimensional emotion descriptions in naturalistic scenarios; such descriptions are considered to capture a much wider spectrum of expressive variability than basic emotions and, most importantly, to model the emotional states that humans commonly express in everyday life. In the first part of this thesis, we attempt to shed light on the largely unexplored problem of predicting continuous emotional dimensions.
This work is among the first to explore the prediction of emotion dimensions via multi-modal fusion, using facial expressions, auditory cues and shoulder gestures. A major contribution of this thesis is the proposal to exploit the relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods, an idea that has since been taken up by other researchers in the field. To evaluate this experimentally, we extend methods such as Long Short-Term Memory (LSTM) neural networks, the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) to exploit output relationships in learning. As shown, this increases the accuracy of machine learning models applied to this task. Annotating continuous dimensional emotions is a tedious task, highly prone to various types of noise. Because it is performed in real time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the observed emotional states, the inherent ambiguity of labels related to human behaviour, the varying reaction lags of each annotator, and other factors such as input device noise and annotation errors. As a result, the annotations exhibit a strong spatio-temporal, annotator-specific bias. Failing to deal properly with annotation bias and noise leads to an inaccurate ground truth, and therefore to machine learning models that generalise poorly. This makes the proper fusion of multiple annotations, and the inference of a clean, corrected version of the "ground truth", one of the most significant challenges in the area. A highly important contribution of this thesis is the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method for fusing noisy continuous annotations.
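DPCCA and RCCA both build on Canonical Correlation Analysis. As background only, classic static CCA between two views (for example, two modalities or two annotators' multi-dimensional traces) can be computed by whitening each view and taking an SVD of the whitened cross-covariance; the singular values are the canonical correlations. This NumPy sketch is plain CCA, not DPCCA: it has no private spaces, no temporal dynamics and no time warping. The small ridge term `reg` is an assumption added for numerical stability.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(X, Y, n_components=1, reg=1e-6):
    """Classic CCA: projections A, B such that (X - mean) @ A and (Y - mean) @ B
    are maximally correlated; also returns the canonical correlations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance: singular values = canonical correlations
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :n_components]
    B = Ky @ Vt[:n_components].T
    return A, B, s[:n_components]
```

When both views are noisy observations of one shared latent signal, the leading canonical correlation approaches 1 and the projections recover that shared component; DPCCA additionally separates annotator-specific (private) structure and models time.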
By adopting a private-shared space model, we isolate the individual, annotator-specific characteristics that are not shared and, most importantly, model the common underlying annotation that is shared by all annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we derive a clean version of the ground truth from multiple annotations, eliminating temporal discrepancies and other nuisances. Integrating temporal alignment within the proposed private-shared space model makes DPCCA suitable for temporally aligning human behaviour: given temporally unsynchronised sequences (e.g., videos of two people smiling), the goal is to generate temporally synchronised sequences (e.g., the smile apex should co-occur in both videos). Temporal alignment is an important problem for many applications in which multiple datasets need to be aligned in time. It is also particularly relevant to the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario arises when the observations are corrupted by gross, non-Gaussian noise (e.g., occlusions), as is often the case with data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations while isolating the gross noise in a sparse noise term. RCCA is among the first robust variants of CCA proposed in the literature and, as we show experimentally, outperforms other state-of-the-art methods on related tasks such as the fusion of multiple modalities under gross noise.
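The temporal-alignment goal described above (e.g., making the smile apex co-occur in two sequences) is classically addressed with dynamic time warping (DTW). The sketch below is plain DTW on two 1-D intensity sequences, shown only to illustrate what a warping path is; it is not the time-warping component of DPCCA, which operates within the probabilistic private-shared model. The absolute-difference frame cost and the function name are assumptions.

```python
import numpy as np


def dtw(a, b):
    """Dynamic time warping between 1-D sequences a and b.
    Returns the total alignment cost and the warping path as (i, j) index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance between frames
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]


# Hypothetical smile-intensity curves whose apex (value 3) occurs at different frames
a = [0, 0, 1, 2, 3, 2, 1, 0]
b = [0, 1, 2, 3, 2, 1, 0, 0]
cost, path = dtw(a, b)
```

Here the two curves differ only in timing, so the alignment cost is zero and the path pairs frame 4 of `a` with frame 3 of `b`, i.e. the two apexes are made to co-occur.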
Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly for reducing high-dimensional input spaces in a way that is meaningful for the task at hand (e.g., prediction, clustering). A final significant contribution of this thesis is the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), and provides further theoretical insight into how CA works. Moreover, the framework is highly flexible: novel CA methods can be generated simply by manipulating the connectivity of the latent variables (i.e., the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform their equivalents on several problems related to affect sensing and facial expression analysis, while offering advantages such as reduced complexity and explicit variance modelling.
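As a concrete instance of probabilistic component analysis with explicit variance modelling, the closed-form maximum-likelihood solution of probabilistic PCA (Tipping and Bishop) fits in a few lines of NumPy. This is standard PPCA, shown for illustration only; it is not the thesis's unifying framework, and the function names are hypothetical. The assumed model is x = W z + mu + eps, with z ~ N(0, I) and eps ~ N(0, sigma2 I).

```python
import numpy as np


def ppca(X, q):
    """Closed-form maximum-likelihood probabilistic PCA.
    Returns the loading matrix W (d, q), the isotropic noise variance and the mean."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)  # ML covariance estimate (divides by n)
    vals, vecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # reorder to descending
    sigma2 = vals[q:].mean()                     # noise = average discarded variance
    # W = U_q (Lambda_q - sigma2 I)^(1/2), up to an arbitrary rotation
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return W, sigma2, mu


def latent_posterior_mean(X, W, sigma2, mu):
    """E[z | x] = M^{-1} W^T (x - mu), with M = W^T W + sigma2 I."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T
```

Unlike plain PCA, the fitted model covariance W W^T + sigma2 I reproduces the top-q sample variances exactly and assigns an explicit noise variance to every remaining direction.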