Syllabic Pitch Tuning for Neutral-to-Emotional Voice Conversion

Cernak, Milos; Na, Xingyu; Saheer, Lakshmi

Authors: Milos Cernak, Xingyu Na, Lakshmi Saheer
Publication date: 19 October 2015
Publisher: Idiap

Abstract

Prosody plays an important role in both the identification and the synthesis of emotionalized speech. Prosodic features such as pitch are usually estimated and altered at the segmental level, over short windows of speech in which the signal is expected to be quasi-stationary. This results in a frame-wise change of acoustic parameters when synthesizing emotionalized speech. To convert neutral speech into emotional speech from the same user, it may be better to alter the pitch parameters at the suprasegmental level, for example at the syllable level, since the changes in the signal are then more subtle and smooth. In this paper we aim to show that pitch transformation in a neutral-to-emotional voice conversion system may yield better output speech quality if it is performed at the suprasegmental (syllable) level rather than at the frame level. Subjective evaluation results are presented to show whether naturalness, speaker similarity and emotion recognition differ between the two approaches.
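To make the contrast between frame-level and syllable-level pitch modification concrete, here is a minimal Python sketch. It is not the authors' method: the function names, the per-frame ratio input, the `(start, end)` syllable format, and the choice of averaging the ratios within each syllable are all assumptions made for illustration only.

```python
def frame_level_shift(f0, ratios):
    """Scale each voiced frame by its own ratio; unvoiced frames (f0 == 0) stay 0."""
    return [f * r if f > 0 else 0.0 for f, r in zip(f0, ratios)]


def syllable_level_shift(f0, ratios, syllables):
    """Scale all voiced frames of a syllable by one shared ratio.

    syllables: list of (start, end) frame indices, end exclusive
    (a hypothetical format chosen for this sketch).
    The shared ratio is the mean of the per-frame ratios over the
    voiced frames of that syllable (an illustrative assumption).
    """
    out = [float(f) for f in f0]
    for start, end in syllables:
        voiced = [i for i in range(start, end) if f0[i] > 0]
        if not voiced:
            continue  # nothing to scale in a fully unvoiced syllable
        ratio = sum(ratios[i] for i in voiced) / len(voiced)
        for i in voiced:
            out[i] = f0[i] * ratio
    return out


# Toy F0 contour in Hz (0.0 marks unvoiced frames) and toy per-frame ratios.
f0 = [0.0, 100.0, 102.0, 104.0, 0.0, 120.0, 118.0, 116.0]
ratios = [1.0, 1.1, 1.3, 1.2, 1.0, 0.9, 1.1, 1.0]
syllables = [(0, 4), (4, 8)]

frame_out = frame_level_shift(f0, ratios)
syll_out = syllable_level_shift(f0, ratios, syllables)
```

In this toy setup the syllable-level variant keeps the relative shape of the contour inside each syllable, because every voiced frame in a syllable is multiplied by the same factor, whereas the frame-level variant lets the scaling fluctuate from frame to frame.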

Available Versions

Infoscience - École polytechnique fédérale de Lausanne

oai:infoscience.tind.io:213073

Last time updated on 09/02/2018