Search CORE

10,228 research outputs found

Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

Author: Cummins Nicholas
Han Jing
Schuller Björn
Zhang Zixing
Publication venue
Publication date: 21/09/2018
Field of study

Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. As a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms are explained and discussed accordingly, aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities

arXiv.org e-Print Archive

OPUS Augsburg

Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals

Author: Hariharan Muthusamy
Kemal Polat
Sazali Yaacob
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015
Field of study

Recently, researchers have paid escalating attention to studying the emotional state of an individual from his/her speech signals as the speech signal is the fastest and the most natural method of communication between individuals. In this work, new feature enhancement using Gaussian mixture model (GMM) was proposed to enhance the discriminatory power of the features extracted from speech and glottal signals. Three different emotional speech databases were utilized to gauge the proposed methods. Extreme learning machine (ELM) and k-nearest neighbor (kNN) classifier were employed to classify the different types of emotions. Several experiments were conducted and results show that the proposed methods significantly improved the speech emotion recognition performance compared to research works published in the literature

Crossref

Directory of Open Access Journals

Predictive biometrics: A review and analysis of predicting personal characteristics from biometric data

Author: Abreu M.C.C.
Abreu M.C.C.
Abreu M.C.D.C.
Chang T.‐Y.
Chen Y.‐L.
Dobry G.
Gao Y.
Geng X.
Giot R.
Idrus S.
Jain A.K.
Leon S.
Li C.
Li S.Z.
Likert R.
Livingstone S.R.
Lu H.
Matta F.
Mutalib S.
Pan L.
Pervouchine V.
Pisani P.H.
Proenca H.
Ricanek K.
Rodrigues R.N.
Roli F.
Santos O.C.
Schuller B.
Tapia J.E.
Teh P.S.
Wang Z.‐H.
Yan H.
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 16/11/2017
Field of study

Interest in the exploitation of soft biometrics information has continued to develop over the last decade or so. In comparison with traditional biometrics, which focuses principally on person identification, the idea of soft biometrics processing is to study the utilisation of more general information regarding a system user, which is not necessarily unique. There are increasing indications that this type of data will have great value in providing complementary information for user authentication. However, the authors have also seen a growing interest in broadening the predictive capabilities of biometric data, encompassing both easily definable characteristics such as subject age and, most recently, `higher level' characteristics such as emotional or mental states. This study will present a selective review of the predictive capabilities, in the widest sense, of biometric data processing, providing an analysis of the key issues still adequately to be addressed if this concept of predictive biometrics is to be fully exploited in the future

Crossref

Sheffield Hallam University Research Archive

Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Author: Jurdak Raja
Khalifa Sara
Latif Siddique
Rana Rajib
Schuller Björn W.
Publication venue
Publication date: 12/07/2022
Field of study

Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions with the requirement of meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary tasks. The semi-supervised nature of MTL-AUG allows for the exploitation of the abundant unlabelled data to further boost the performance of SER. We comprehensively evaluate the proposed framework in the following settings: (1) within corpus, (2) cross-corpus and cross-language, (3) noisy speech, (4) and adversarial attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB datasets show improved results compared to existing state-of-the-art methods.Comment: Under review IEEE Transactions on Affective Computin

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

Author: Batliner A
Hantke S
Kurle R
Mousa AELD
Ringeval F
Schuller B
Weninger F
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 14/04/2016
Field of study

We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i. e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6 k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i. e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i. e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

Beyond Biometrics

Author: Broek Egon L. van den
Publication venue: Elsevier
Publication date: 01/01/2010
Field of study

Throughout the last 40 years, the essence of automated identification of users has remained the same. In this article, a new class of biometrics is proposed that is founded on processing biosignals, as opposed to images. After a brief introduction on biometrics, biosignals are discussed, including their advantages, disadvantages, and guidelines for obtaining them. This new class of biometrics increases biometrics’ robustness and enables cross validation. Next, biosignals’ use is illustrated by two biosignal-based biometrics: voice identification and handwriting recognition. Additionally, the concept of a digital human model is introduced. Last, some issues will be touched upon that will arise when biosignal-based biometrics are brought to practice

Elsevier - Publisher Connector

University of Twente Research Information

Affect recognition from face and body: Early fusion vs. late fusion

Author: Gunes H
Piccardi M
Publication venue
Publication date: 30/11/2005
Field of study

This paper presents an approach to automatic visual emotion recognition from two modalities: face and body. Firstly, individual classifiers are trained from individual modalities. Secondly, we fuse facial expression and affective body gesture information first at a feature-level, in which the data from both modalities are combined before classification, and later at a decision-level, in which we integrate the outputs of the monomodal systems by the use of suitable criteria. We then evaluate these two fusion approaches, in terms of performance over monomodal emotion recognition based on facial expression modality only. In the experiments performed the emotion classification using the two modalities achieved a better recognition accuracy outperforming the classification using the individual facial modality. Moreover, fusion at the feature-level proved better recognition than fusion at the decision-level. © 2005 IEEE

OPUS - University of Technology Sydney

Comprehensive Study of Automatic Speech Emotion Recognition Systems

Author: Jagtap Sonal
Kawade Rupali
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023
Field of study

Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from the speech signals through techniques and methodologies. SER is challenging because of more considerable variations in different languages arousal and valence levels. Various technical developments in artificial intelligence and signal processing methods have encouraged and made it possible to interpret emotions.SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML) and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER. Further, it describes details about databases and evaluation metrics used for speech emotion recognition

International Journal on Recent and Innovation Trends in Computing and Communication