Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance has
been placed on models that use audio features in building well-performing
classifiers. In this paper, we propose a novel deep dual recurrent encoder
model that utilizes text data and audio signals simultaneously to obtain a
better understanding of speech data. As emotional dialogue is composed of sound
and spoken content, our model encodes the information from audio and text
sequences using dual recurrent neural networks (RNNs) and then combines the
information from these sources to predict the emotion class. This architecture
analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that
focus on audio features. Extensive experiments are conducted to investigate the
efficacy and properties of the proposed model. Our proposed model outperforms
previous state-of-the-art methods in assigning data to one of four emotion
categories (i.e., angry, happy, sad and neutral) when the model is applied to
the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%. Comment: 7 pages, accepted as a conference paper at IEEE SLT 201
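As a rough illustration of the dual-encoder idea described in this abstract, the following Python sketch runs one plain tanh RNN over an audio feature sequence and another over a text vector sequence, concatenates the two final hidden states, and applies a softmax over the four emotion classes. It is not the authors' implementation: the plain RNN cell, the concatenation-based fusion, the dimensions, the random untrained weights, and all names (`rnn_encode`, `predict_emotion`, `params`) are assumptions made for illustration.

```python
import math
import random

# The four classes named in the abstract.
EMOTIONS = ["angry", "happy", "sad", "neutral"]


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_encode(sequence, w_in, w_rec):
    """Run a plain tanh RNN over the sequence; return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for step in sequence:
        from_input = matvec(w_in, step)
        from_hidden = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
    return hidden


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(audio_frames, text_vectors, params):
    """Encode each modality separately, fuse, and classify."""
    audio_state = rnn_encode(audio_frames, params["audio_in"], params["audio_rec"])
    text_state = rnn_encode(text_vectors, params["text_in"], params["text_rec"])
    fused = audio_state + text_state  # assumption: fusion by concatenation
    return softmax(matvec(params["out"], fused))


# Hypothetical toy dimensions and random inputs.
rng = random.Random(0)
AUDIO_DIM, TEXT_DIM, HIDDEN = 5, 4, 3
params = {
    "audio_in": make_matrix(HIDDEN, AUDIO_DIM, rng),
    "audio_rec": make_matrix(HIDDEN, HIDDEN, rng),
    "text_in": make_matrix(HIDDEN, TEXT_DIM, rng),
    "text_rec": make_matrix(HIDDEN, HIDDEN, rng),
    "out": make_matrix(len(EMOTIONS), 2 * HIDDEN, rng),
}
audio_frames = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(6)]
text_vectors = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(3)]

probs = predict_emotion(audio_frames, text_vectors, params)
for label, p in zip(EMOTIONS, probs):
    print(f"{label}: {p:.3f}")
```

Because the weights are random and untrained, the printed probabilities carry no meaning; the sketch only shows where the two encodings are combined before classification.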
Simulation of emotions of agents in virtual environments using neural networks
A distributed architecture for a system simulating the emotional state of an agent acting in a virtual environment is presented. The system is an implementation of an event appraisal model of emotional behaviour and uses neural networks to learn how the emotional state should be influenced by the occurrence of environmental and internal stimuli. A part of the modular system is domain-independent. The system can easily be adapted for handling different events that influence the emotional state. A first prototype and a testbed for this architecture are presented.
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.
Speech Emotion Recognition Using Multi-hop Attention Mechanism
In this paper, we are interested in exploiting textual and acoustic data of
an utterance for the speech emotion classification task. The baseline approach
models the information from audio and text independently using two deep neural
networks (DNNs). The outputs of both DNNs are then fused for
classification. As opposed to using knowledge from both modalities
separately, we propose a framework to exploit acoustic information in tandem
with lexical data. The proposed framework uses two bi-directional long
short-term memory (BLSTM) networks to obtain hidden representations of the
utterance. Furthermore, we propose an attention mechanism, referred to as the
multi-hop, which is trained to automatically infer the correlation between the
modalities. The multi-hop attention first computes the relevant segments of the
textual data corresponding to the audio signal. The relevant textual data is
then used to attend to parts of the audio signal. To evaluate the performance
of the proposed system, experiments are performed on the IEMOCAP dataset.
Experimental results show that the proposed technique outperforms the
state-of-the-art system by a 6.5% relative improvement in weighted
accuracy. Comment: 5 pages, accepted as a conference paper at ICASSP 2019 (oral
presentation)
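The two attention hops described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: it uses dot-product attention, takes the last audio hidden state as the first query, and operates on hand-written toy vectors in place of BLSTM outputs. The names `attend` and `two_hop_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention (an assumption): weight states by similarity to query."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)
    ]
    return context, weights


def two_hop_attention(audio_states, text_states):
    """Hop 1: audio attends over text. Hop 2: the text context attends over audio."""
    audio_query = audio_states[-1]  # assumption: last audio state summarizes the audio
    text_context, text_weights = attend(audio_query, text_states)
    audio_context, audio_weights = attend(text_context, audio_states)
    return text_context, audio_context, text_weights, audio_weights


# Toy hidden states standing in for the encoder outputs (hypothetical values).
audio_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_states = [[2.0, 0.0], [0.0, 0.5]]

text_context, audio_context, text_weights, audio_weights = two_hop_attention(
    audio_states, text_states
)
print("text weights:", [round(w, 3) for w in text_weights])
print("audio weights:", [round(w, 3) for w in audio_weights])
```

In a full model, the two context vectors would then be passed to a classifier; how they are combined is left out here because the abstract does not specify it.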