Accent plays a significant role in speech communication, influencing how well
speech is understood and conveying a speaker's identity. This paper
introduces a novel and efficient framework for accented Text-to-Speech (TTS)
synthesis based on a Conditional Variational Autoencoder. The framework can
synthesize speech in a selected speaker's voice and render it in any desired
target accent. Extensive experiments validate the effectiveness of the proposed
framework using both objective and subjective evaluations. The results also
show strong performance in manipulating the accent of the synthesized speech
and point to a promising avenue for future accented TTS
research.

Comment: preprint submitted to a conference, under review