17 research outputs found

    Hierarchical attention transfer networks for depression assessment from speech


    2-D Attention Based Convolutional Recurrent Neural Network for Speech Emotion Recognition

    Recognizing speech emotions is a formidable challenge due to the complexity of emotions. The performance of Speech Emotion Recognition (SER) is significantly affected by the emotional cues retrieved from speech. Most emotional features, however, are sensitive to emotionally neutral factors such as the speaker, speaking style, and gender. In this work, we postulate that computing deltas for individual features preserves information that is mainly relevant to emotional traits while minimizing the influence of emotionally irrelevant components, thus leading to fewer misclassifications. Additionally, SER commonly has to cope with silent and emotionally unrelated frames. The proposed technique is effective at learning important feature representations for emotion-relevant features. We therefore propose a two-dimensional attention-based convolutional recurrent neural network that learns discriminative characteristics and predicts emotions. The Mel-spectrogram is used for feature extraction. The proposed technique is evaluated on the IEMOCAP dataset and achieves better performance, with an accuracy of 68%.
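    The delta features mentioned in this abstract are commonly computed with the standard regression formula used in speech processing. The following is a minimal pure-Python sketch of that formula (the abstract does not specify the exact variant or window size; `N=2` and edge padding by frame repetition are assumptions for illustration):

```python
def deltas(frames, N=2):
    """Delta (first-order dynamic) features over time.

    frames: list of per-frame feature vectors, e.g. Mel-spectrogram frames.
    N: half-window size (N=2 is a common default; assumed here).
    Uses the standard regression formula:
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    with boundary frames repeated at the edges.
    """
    T = len(frames)
    D = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        vec = []
        for d in range(D):
            num = 0.0
            for n in range(1, N + 1):
                # clamp indices at the edges (pad by repeating boundary frames)
                lo = frames[max(t - n, 0)][d]
                hi = frames[min(t + n, T - 1)][d]
                num += n * (hi - lo)
            vec.append(num / denom)
        out.append(vec)
    return out
```

    On a feature trajectory that rises linearly with slope 1 per frame, the interior deltas come out as 1.0, while a constant trajectory yields deltas of 0, which is what makes deltas insensitive to static, speaker-dependent offsets.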

    An auditory saliency pooling-based LSTM model for speech intelligibility classification

    This article belongs to the Section Computer and Engineering Science and Symmetry/Asymmetry. Speech intelligibility is a crucial element in oral communication that can be influenced by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in this last circumstance. Taking our previous work, a SIC system based on an attentional long short-term memory (LSTM) network, as a starting point, we deal with the problem of inadequate learning of the attention weights due to training data scarcity. To overcome this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling, where the WP weights are not automatically learned during the training of the network but are obtained from an external source of information: Kalinli's auditory saliency model. In this way, we aim to take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-Speech dataset, which comprises speech uttered by subjects with several levels of dysarthria. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli's saliency can be successfully incorporated into the LSTM architecture as an external cue for estimating the speech intelligibility level. The work leading to these results has been supported by the Spanish Ministry of Economy, Industry and Competitiveness through the TEC2017-84395-P (MINECO) and TEC2017-84593-C2-1-R (MINECO) projects (AEI/FEDER, UE), and by the Universidad Carlos III de Madrid under Strategic Action 2018/00071/001.
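    The core idea of saliency pooling, as described in this abstract, is to replace learned attention weights with weights derived from an external cue. A minimal pure-Python sketch of that weighted-pooling step follows; the `saliency` argument stands in for per-frame scores from Kalinli's auditory saliency model, which is not reimplemented here, and the mean-pooling fallback for an all-zero cue is an assumption for robustness:

```python
def saliency_pooling(hidden_states, saliency):
    """Pool per-frame hidden states into one utterance-level vector.

    hidden_states: list of per-frame vectors (e.g. LSTM outputs).
    saliency: non-negative per-frame scores from an external source
              (stand-in for an auditory saliency model; not learned).
    Weights are the normalized saliency scores, so the pooling needs
    no trainable attention parameters.
    """
    T = len(hidden_states)
    total = sum(saliency)
    if total == 0:
        # uninformative cue: fall back to plain mean pooling (assumption)
        weights = [1.0 / T] * T
    else:
        weights = [s / total for s in saliency]
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]
```

    Because the weights come from outside the network, this pooling does not suffer from the data-scarcity problem the paper attributes to learned attention weights.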