
    Nonparallel Emotional Speech Conversion

    We propose a nonparallel, data-driven emotional speech conversion method. It transfers the emotion-related characteristics of a speech signal while preserving the speaker's identity and linguistic content. Most existing approaches require parallel data and time alignment, which are not available in most real applications. We achieve nonparallel training with an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed in latent space into an emotion-invariant content code and an emotion-related style code. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on a nonparallel corpus with four emotions. Both subjective and objective evaluations show the effectiveness of our approach.
    Comment: Published in INTERSPEECH 2019, 5 pages, 6 figures. Simulation available at http://www.jian-gao.org/emoga
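    The encode-recombine-decode idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's model: the linear projections, dimensions, and function names below are stand-in assumptions, whereas the paper uses learned neural encoders and decoders per emotion domain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 80-dim spectral frames, 16-dim content code,
# 8-dim style code. In the paper these maps are trained networks.
FRAME_DIM, CONTENT_DIM, STYLE_DIM = 80, 16, 8
W_content = rng.standard_normal((CONTENT_DIM, FRAME_DIM)) * 0.1
W_style = rng.standard_normal((STYLE_DIM, FRAME_DIM)) * 0.1
W_decode = rng.standard_normal((FRAME_DIM, CONTENT_DIM + STYLE_DIM)) * 0.1

def encode_content(frames):
    # Emotion-invariant content code: one vector per frame.
    return frames @ W_content.T

def encode_style(frames):
    # Emotion-related style code: pooled over time into a single vector.
    return (frames @ W_style.T).mean(axis=0)

def decode(content, style):
    # Recombine: concatenate the per-frame content code with the
    # broadcast target style code, then project back to frame space.
    style_tiled = np.tile(style, (content.shape[0], 1))
    return np.concatenate([content, style_tiled], axis=1) @ W_decode.T

# Conversion: content from the source utterance, style from the target emotion.
src = rng.standard_normal((120, FRAME_DIM))  # 120 source frames
tgt = rng.standard_normal((90, FRAME_DIM))   # 90 target-emotion frames
converted = decode(encode_content(src), encode_style(tgt))
print(converted.shape)  # (120, 80): source length, target-emotion style
```

    Note that the converted output keeps the source's frame count (linguistic content) while the style vector carries only the target emotion, which is the decomposition the abstract assumes.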

    Analysis of prosodic correlates of emotional speech data

    The study of expressive speech styles remains an important topic with regard to the detection or prediction of their parameters in speech processing. In this paper, we analyze prosodic correlates of six emotional styles (anger, disgust, joy, fear, surprise, and sadness), using data uttered by two speakers. The analysis focuses on how pronunciations and prosodic parameters are modified in emotional speech compared to the neutral style. It covers pronunciation modifications, the presence of pauses in sentences, and local prosodic behavior, with emphasis on the analysis of prosody over prosodic groups and breathing groups.
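    The kind of correlates this abstract analyzes (pitch behavior and pauses) can be summarized per utterance with simple statistics. A minimal sketch, assuming frame-level F0 and energy tracks are already extracted; the function name, frame length, and silence threshold are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

def prosodic_correlates(f0, energy, frame_s=0.01, silence_thresh=0.01):
    # Summarize simple prosodic correlates of one utterance:
    # F0 statistics over voiced frames (f0 > 0) and pause statistics
    # derived from runs of low-energy frames.
    voiced = f0 > 0
    silent = energy < silence_thresh
    # Count pauses as maximal runs of silent frames.
    run_starts = np.diff(silent.astype(int)) == 1
    n_pauses = int(run_starts.sum()) + int(silent[0])
    return {
        "f0_mean": float(f0[voiced].mean()) if voiced.any() else 0.0,
        "f0_range": float(f0[voiced].max() - f0[voiced].min()) if voiced.any() else 0.0,
        "n_pauses": n_pauses,
        "pause_time_s": float(silent.sum() * frame_s),
    }

# Toy 10 ms frames: a short low-energy pause splits two voiced stretches.
f0 = np.array([120.0, 130.0, 0.0, 0.0, 140.0, 150.0])
energy = np.array([0.5, 0.5, 0.001, 0.001, 0.5, 0.5])
stats = prosodic_correlates(f0, energy)
print(stats)  # f0_mean 135.0, f0_range 30.0, one pause of 0.02 s
```

    Comparing such statistics between an emotional utterance and its neutral counterpart is one way to quantify the modifications the abstract describes.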

    Emotion Recognition via Continuous Mandarin Speech


    Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion

    Prosody plays an important role in both the identification and the synthesis of emotionalized speech. Prosodic features such as pitch are usually estimated and altered at the segmental level, based on short windows of speech within which the signal is expected to be quasi-stationary. This results in a frame-wise change of acoustic parameters when synthesizing emotionalized speech. To convert neutral speech to emotional speech from the same speaker, it may be better to alter the pitch parameters at a suprasegmental level, such as the syllable level, since the changes in the signal are then more subtle and smooth. In this paper we aim to show that the pitch transformation in a neutral-to-emotional voice conversion system may yield better output speech quality if it is performed at the suprasegmental (syllable) level rather than at the frame level. Subjective evaluation results demonstrate whether the naturalness, speaker-similarity, and emotion-recognition tasks show any performance difference.
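    The syllable-level alternative this abstract argues for can be sketched as one additive F0 shift per syllable, rather than retargeting every frame independently. A minimal sketch under stated assumptions: the function name, the syllable boundaries, and the target means are hypothetical inputs (in practice they would come from segmentation and from the target emotion's statistics).

```python
import numpy as np

def shift_pitch_per_syllable(f0, syllable_bounds, target_means):
    # Shift the F0 of each syllable so that its voiced-frame mean matches
    # a target mean, leaving unvoiced frames (f0 == 0) untouched.
    # One additive shift per syllable keeps the within-syllable contour
    # intact, giving a smoother result than frame-wise retargeting.
    out = f0.copy()
    for (start, end), target in zip(syllable_bounds, target_means):
        seg = out[start:end]          # view into out: edits write through
        voiced = seg > 0
        if voiced.any():
            seg[voiced] += target - seg[voiced].mean()
    return out

# Hypothetical F0 contour (Hz): two syllables of four frames each,
# with zeros marking unvoiced frames.
f0 = np.array([200.0, 210.0, 0.0, 220.0, 230.0, 225.0, 0.0, 240.0])
bounds = [(0, 4), (4, 8)]   # assumed syllable boundaries (frame indices)
targets = [250.0, 260.0]    # assumed target per-syllable mean F0
shifted = shift_pitch_per_syllable(f0, bounds, targets)
print(shifted)
```

    Each syllable's voiced frames move up by a single constant, so relative pitch movement within the syllable is preserved, which is the smoothness argument made above.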