3 research outputs found

Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model

Author: Krityakien Oraphan
Publication venue: Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo
Publication date: 25/03/2013

In this information age, speech is an emerging interface between humans and machines. Applications with this interface let users access information while continuing their primary tasks. Both speech recognition and speech synthesis have been embedded in such applications. However, users prefer synthetic speech that is intelligible and natural, regardless of how many other capabilities the application provides. Speech synthesis for tonal languages is much more challenging than for non-tonal languages, because both intonation and tones must be handled. Fundamental frequency (F0) is one of the acoustic features related to intonation and tones. Existing F0 models for Thai are expensive when generating F0 from their parameters and perform poorly when little data is available to build the model. Drawing on the advantages of the tone nucleus model, which originated for Mandarin, we pioneer its adaptation to Thai to meet the classic but still essential requirements of speech synthesis for continuous speech. Tone nuclei are analytically defined for all five distinctive Thai tones according to their underlying targets. The full process of F0 contour generation is presented, from tone nucleus extraction and parameter extraction through parameter prediction to F0 contour generation for continuous speech. Objective and subjective tests show that the model can be successfully adapted to a language other than Mandarin, confirming its efficiency and adaptability. Compared with F0 contours generated by predictors trained on whole-syllable contours without tone nucleus extraction, the model generated F0 contours for continuous utterances with less distortion and greater tone intelligibility and naturalness. The proposed methodology for parameter prediction and F0 contour generation improved the quality of the synthetic speech by significantly reducing distortion and increasing tone intelligibility and naturalness.
Report number: ; Date of degree conferral: 2013-03-25; Degree level: Master's; Degree: Master of Information Science and Technology; Diploma number: ; Graduate school and department: Graduate School of Information Science and Technology, Department of Information and Communication Engineering

UT Repository

Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model

Author: Krityakien Oraphan

Institutional Repositories DataBase (IRDB)

Detecting autism, emotions and social signals using AdaBoost

Authors: Busa-Fekete Róbert, Gosztolya Gábor, Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013

SZTE Publicatio Repozitórium - SZTE - Repository of Publications