Synthesis using speaker adaptation from speech recognition DB

Bonafonte Cávez, Antonio; Moreno Bilbao, M. Asunción; Oller Moreno, Sergio

Publication date: 1 January 2010
Publisher: Universidad de Vigo

Abstract

This paper describes the creation of multiple voices with a Hidden Markov Model (HMM) based speech synthesis system (HTS). More than 150 Catalan synthetic voices were built using HMMs and speaker adaptation techniques. Training data for the Speaker-Independent (SI) model were selected from both a general-purpose speech synthesis database (FestCat) and a database designed for training Automatic Speech Recognition (ASR) systems (the Catalan SpeeCon database). The SpeeCon database was also used to adapt the SI model to different speakers. Using an ASR database for text-to-speech (TTS) purposes provided many different amateur voices, with only a few minutes of recordings made outside studio conditions. The paper shows that speaker adaptation techniques provide the right tools to generate multiple voices from very little adaptation data. A subjective evaluation assessed the intelligibility and naturalness of the generated voices, as well as the similarity of the adapted voices to both the original speaker and the average voice from the SI model.

Peer reviewed. Postprint (published version).

Available Versions

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/124...

Last time updated on 06/01/2019