Strategies for developing a conversational speech dataset for Text-To-Speech Synthesis

Abstract

There have been many efforts to improve the quality of speech synthesis systems in conversational AI. Although state-of-the-art systems can produce natural-sounding speech, the generated speech often lacks prosodic variation and is not always suited to the task. In this paper, we examine dialogue data collection methods for gathering training data for our acoustic models. We collect speech using three different setups: (1) random read-aloud sentences; (2) performed dialogues; (3) semi-spontaneous dialogues. We analyze the prosodic and textual properties of the data collected in these setups and make recommendations for collecting data for speech synthesis in conversational AI settings.

Funding Information: The first author has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 859588. The authors are thankful to Maaike Groenewege, Johannah O'Mahony and ReadSpeaker's R&D team, whose suggestions and discussions have been instrumental in shaping the direction of this paper.

Publisher Copyright: Copyright © 2022 ISCA.

Peer reviewed.
