675 research outputs found
The UPC Text-to-Speech System for Spanish and Catalan
This paper summarizes the text-to-speech system that has been developed in the Speech Group of the Universitat Politècnica de Catalunya (UPC). The system is composed of a core and different interfaces so that it is compatible for research, for telephone applications (either CTI boards or standard ISDN PC cards supporting CAPI), and Windows applications developed using Microsoft SAPI. The paper reviews the system making
emphasis in the parts of the system which are language
dependent and which allow the reading of bilingual text (Spanish
and Catalan). The paper also presents new approaches in prosodic modeling (segmental duration modeling) and generation of the database of speech segments, which have been introduced last year.Peer ReviewedPostprint (published version
Model-based Parametric Prosody Synthesis with Deep Neural Network
Conventional statistical parametric speech synthesis (SPSS) captures only frame-wise acoustic observations and computes probability densities at HMM state level to obtain statistical acoustic models combined with decision trees, which is therefore a purely statistical data-driven approach without explicit integration of any articulatory mechanisms found in speech production research. The present study explores an alternative paradigm, namely, model-based parametric prosody synthesis (MPPS), which integrates dynamic mechanisms of human speech production as a core component of F0 generation. In this paradigm, contextual variations in prosody are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. Here the motor model is target approximation (TA), which generates syllable-sized F0 contours with only three motor parameters that are associated to linguistic functions. In this study, we simulate this two-stage process by linking the TA model to a deep neural network (DNN), which learns the “linguistic-motor” mapping given the “motor-acoustic” mapping provided by TA-based syllable-wise F0 production. The proposed prosody modeling system outperforms the HMM-based baseline system in both objective and subjective evaluations
- …