
BUCEADOR hybrid TTS for blizzard challenge 2011

Author: Adell Mercado Jordi
Bonafonte Cávez Antonio
Erro Eslava Daniel
Navas Eva
Sainz Iñaki
Publication date: 01/01/2011

This paper describes the Text-to-Speech (TTS) systems presented by the Buceador Consortium in the Blizzard Challenge 2011 evaluation campaign. The main system is a concatenative hybrid that combines the strengths of statistical and unit selection synthesis (robustness and segmental naturalness, respectively). The hybrid system scored significantly above average in similarity and naturalness, and showed no significant differences from most of the other systems in the intelligibility task. This clearly improves on the performance achieved in previous participations and supports the validity of the proposed hybrid approach. In addition, an HMM-based system using an HNM-based vocoder was built for the ES1 intelligibility tasks.

Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Improving phone duration modelling using support vector regression fusion

Author: Fakotakis Nikos
Ganchev Todor
Kokkinakis George
Lazaridis Alexandros
Mporas Iosif
Publication venue: Elsevier : North-Holland
Publication date: 19/11/2010

In the present work, we propose a scheme for the fusion of different phone duration models operating in parallel. Specifically, the predictions from a group of dissimilar, mutually independent duration models are fed to a machine learning algorithm, which reconciles and fuses the outputs of the individual models, yielding more precise phone duration predictions. The performance of the individual duration models and of the proposed fusion scheme is evaluated on the American-English KED TIMIT and the Greek WCL-1 databases. On both databases, the SVR-based individual model demonstrates the lowest error rate. Compared to the second-best individual algorithm, it achieves a relative reduction of the mean absolute error (MAE) and the root mean square error (RMSE) of 5.5% and 3.7% on KED TIMIT, and of 6.8% and 3.7% on WCL-1. At the fusion stage, we evaluate the performance of twelve fusion techniques. The proposed fusion scheme, when implemented with SVR-based fusion, improves phone duration prediction accuracy over that of the best individual model: the relative reduction of the MAE and RMSE is 1.9% and 2.0% on KED TIMIT, and 2.6% and 1.8% on the WCL-1 database.
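The evaluation described in this abstract (fuse several models' duration predictions, then report the relative reduction in MAE and RMSE against the best individual model) can be illustrated with a minimal sketch. It rests on loud assumptions: the fusion step here is a plain per-phone average standing in for the SVR-based fusion named in the abstract, the function names are hypothetical, and the durations are invented toy values rather than data from KED TIMIT or WCL-1.

```python
from math import sqrt

def mae(pred, ref):
    """Mean absolute error between predicted and reference durations."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def rmse(pred, ref):
    """Root mean square error between predicted and reference durations."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def relative_reduction(baseline_err, new_err):
    """Relative error reduction in percent, measured against a baseline."""
    return 100.0 * (baseline_err - new_err) / baseline_err

def fuse_mean(predictions):
    """Fuse per-model predictions by averaging them phone by phone.

    Assumption: a simple average replaces the SVR-based fusion of the
    abstract; it only illustrates the data flow of a fusion stage.
    """
    return [sum(vals) / len(vals) for vals in zip(*predictions)]

# Invented toy phone durations in milliseconds (not from any real corpus).
reference = [80.0, 120.0, 60.0, 100.0]
model_a = [90.0, 110.0, 70.0, 90.0]
model_b = [74.0, 126.0, 54.0, 106.0]

fused = fuse_mean([model_a, model_b])

# The best individual model is the one with the lowest MAE.
best_mae = min(mae(model_a, reference), mae(model_b, reference))
best_rmse = min(rmse(model_a, reference), rmse(model_b, reference))

mae_gain = relative_reduction(best_mae, mae(fused, reference))
rmse_gain = relative_reduction(best_rmse, rmse(fused, reference))
print(f"MAE reduction: {mae_gain:.1f}%  RMSE reduction: {rmse_gain:.1f}%")
```

The toy errors are chosen to cancel when averaged, so the gain here is far larger than the few percent reported in the abstract; only the way the relative reduction is computed carries over.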

Crossref

HAL Descartes