Search CORE

675 research outputs found

Speech Synthesis Based on Hidden Markov Models

Author: Nankaku Y.
Oura K.
Toda T.
Tokuda K.
Yamagishi J.
Zen H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2013
Field of study

The UPC Text-to-Speech System for Spanish and Catalan

Author: Bonafonte Cávez Antonio
Esquerra Llucià Ignasi
Febrer A
Rodríguez Fonollosa José Adrián
Vallverdú Bayés Sisco
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 01/01/1998
Field of study

This paper summarizes the text-to-speech system that has been developed in the Speech Group of the Universitat Politècnica de Catalunya (UPC). The system is composed of a core and different interfaces so that it is compatible for research, for telephone applications (either CTI boards or standard ISDN PC cards supporting CAPI), and Windows applications developed using Microsoft SAPI. The paper reviews the system making emphasis in the parts of the system which are language dependent and which allow the reading of bilingual text (Spanish and Catalan). The paper also presents new approaches in prosodic modeling (segmental duration modeling) and generation of the database of speech segments, which have been introduced last year.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Autoregressive neural F0 model for statistical parametric speech synthesis

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/04/2018
Field of study

Crossref

Edinburgh Research Explorer

Model-based Parametric Prosody Synthesis with Deep Neural Network

Author: Liu H
Lu H
Shao X
Xu Y
Publication venue: Interspeech 2016
Publication date: 01/09/2016
Field of study

Conventional statistical parametric speech synthesis (SPSS) captures only frame-wise acoustic observations and computes probability densities at HMM state level to obtain statistical acoustic models combined with decision trees, which is therefore a purely statistical data-driven approach without explicit integration of any articulatory mechanisms found in speech production research. The present study explores an alternative paradigm, namely, model-based parametric prosody synthesis (MPPS), which integrates dynamic mechanisms of human speech production as a core component of F0 generation. In this paradigm, contextual variations in prosody are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. Here the motor model is target approximation (TA), which generates syllable-sized F0 contours with only three motor parameters that are associated to linguistic functions. In this study, we simulate this two-stage process by linking the TA model to a deep neural network (DNN), which learns the “linguistic-motor” mapping given the “motor-acoustic” mapping provided by TA-based syllable-wise F0 production. The proposed prosody modeling system outperforms the HMM-based baseline system in both objective and subjective evaluations

UCL Discovery

An RNN-based Quantized F0 Model with Multi-tier Feedback Links for Text-to-Speech Synthesis

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Publication venue: 'International Speech Communication Association'
Publication date: 24/08/2017
Field of study

Crossref

Edinburgh Research Explorer