GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio
  Pretraining for Speech Emotion Recognition

Fei, Wen; Hu, Yanni; Lu, Heng; Ma, Lei; Pan, Yu; Yang, Yuguang; Yao, Jixun

Authors: Wen Fei, Yanni Hu, Heng Lu, Lei Ma, Yu Pan, Yuguang Yang, Jixun Yao
Publication date: 9 July 2023

Abstract

Contrastive-learning-based pretraining methods have recently shown impressive success in diverse fields. In this paper, we propose GEmo-CLAP, an efficient gender-attribute-enhanced contrastive language-audio pretraining (CLAP) model for speech emotion recognition. Specifically, we first build an effective emotion CLAP model, Emo-CLAP, for emotion recognition, utilizing various self-supervised pre-trained models. Then, considering the importance of the gender attribute in speech emotion modeling, we propose two GEmo-CLAP approaches that integrate the emotion and gender information of speech signals, forming more reasonable objectives. Extensive experiments on the IEMOCAP corpus demonstrate that the two proposed GEmo-CLAP approaches consistently outperform the baseline Emo-CLAP across different pre-trained models, while also achieving superior recognition performance compared with other state-of-the-art methods.

Comment: 5 pages
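
As a rough illustration of the contrastive language-audio objective that CLAP-style models are generally built on, the sketch below computes a symmetric cross-entropy loss over audio-text similarity scores. It is a generic CLIP/CLAP-style formulation, not the authors' implementation: the function names, the temperature value of 0.07, and the one-to-one pairing of each audio embedding with a single text embedding are all assumptions introduced here. The abstract does not specify how gender information enters the objective, so that part is not sketched.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Generic CLIP/CLAP-style loss (an assumption, not the paper's exact objective).

    The i-th audio embedding is assumed to match the i-th text embedding;
    every other pairing in the batch is treated as a negative.
    """
    audio = [l2_normalize(v) for v in audio_emb]
    text = [l2_normalize(v) for v in text_emb]
    n = len(audio)

    # Temperature-scaled cosine similarity between every audio/text pair.
    logits = [
        [sum(a * t for a, t in zip(audio[i], text[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Audio-to-text direction: each row should peak on its diagonal entry.
    a2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-audio direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    t2a = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n

    return 0.5 * (a2t + t2a)


# Toy embeddings (hypothetical values chosen only for illustration).
audio_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(audio_batch, aligned_text)
loss_swapped = symmetric_contrastive_loss(audio_batch, swapped_text)
```

With these toy inputs, correctly paired audio and text vectors yield a much lower loss than mismatched pairs. In an emotion-recognition batch several utterances can share the same label, so the strict one-to-one target used here would likely need to be relaxed.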

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2306.07848

Last time updated on 16/06/2023