Accented Text-to-Speech Synthesis with a Conditional Variational
  Autoencoder

Herremans, Dorien; Mehrish, Ambuj; Melechovsky, Jan; Sisman, Berrak

Authors: Dorien Herremans, Ambuj Mehrish, Jan Melechovsky, Berrak Sisman
Publication date: 7 November 2022

Abstract

Accent plays a significant role in speech communication: it affects how well speech is understood and conveys a speaker's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. The framework can synthesize speech for a selected speaker, converted to any desired target accent. Thorough experiments validate its effectiveness using both objective and subjective evaluations. The results also show remarkable performance in manipulating accents in the synthesized speech and point to a promising avenue for future accented TTS research.

Comment: preprint submitted to a conference, under review

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.03316

Last time updated on 12/12/2022