3 research outputs found

    Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model

    No full text
    In this information decades, speech media is one of the new coming interfaces between human and machines. Applications with this interface help users to access information while they can continue their front tasks. Not only speech recognition but speech synthesis has been also introduced and embedded in such applications. However, the users prefer the synthetic speech with intelligibility and naturalness regardless of how many other abilities the application provides. The speech synthesis for tonal languages is much more challenge than that for non-tonal languages, because both intonation and tones need to be concerned. Fundamental frequency is one of acoustic features relating to the intonation and tones. Existing F0 models for Thai language are expensive to complete the F0 generation from their parameters and suffer when the size of the available data to build the model is small. With many advantages of the tone nucleus model which has been originated in Mandarin, we have pioneered adapting this model in Thai language to meet the classic but still intrinsic requirements of speech synthesis in continuous speech. Tone nuclei are analytically defined for all five distinctive Thai tones according to their underlying targets. The full process of the F0 contour generation is presented from the tone nucleus extraction, parameter extraction, parameter prediction, until the F0 contour generation for the continuous speech. Again, the model is successfully proven to be adapted in the other language than Mandarin through objective and subjective tests. The tests confirmed the efficiency and adaptability of the model. Compared to the F0 contours generated by the predictors trained from the contours in the whole syllables without extracting the tone nuclei, the model generated the F0 contours in continuous utterances with less distortion but more tone intelligibility and naturalness. Proposed methodology in parameter prediction and the F0 contour generation processes improved the quality of the synthetic speeches by reducing the distortion and increasing the tone intelligibility and naturalness significantly.報告番号: ; 学位授与日: 2013-03-25 ; 学位の種別: 修士 ; 学位の種類: 修士(情報理工学) ; 学位記番号: ; 研究科・専攻: 情報理工学系研究科・電子情報学専
    corecore