
    Vowel Duration and Fundamental Frequency Prediction for Automatic Prosody Transplantation

    Thesis (Master's) -- Seoul National University Graduate School: Interdisciplinary Program in Cognitive Science, College of Humanities, August 2018. Advisor: Minhwa Chung (정민화).

    The use of computers to help people improve their pronunciation of a foreign language has increased rapidly over the last decades. However, the majority of such Computer-Assisted Pronunciation Training (CAPT) systems have focused only on teaching the correct pronunciation of segments, while prosody has received much less attention. One new approach to prosody training is self-imitation learning: prosodic features from a native utterance are transplanted onto the learner's own speech and given back as corrective feedback. The main drawback is that this technique requires two identical sets of native and non-native utterances, which makes its actual implementation cumbersome and inflexible. As preliminary research toward a new method of prosody transplantation, the first part of the study surveys previous related work and points out its advantages and drawbacks. We also compare the prosodic systems of Korean and English, identify the major areas of mistakes that Korean learners of English tend to make, and analyze the acoustic features with which these mistakes are correlated. We suggest that transplanting vowel duration and fundamental frequency will be the most effective approach to self-imitation learning for Korean speakers of English. The second part of this study introduces a new model for prosody transplantation. Instead of transplanting acoustic values from a pre-recorded utterance, we propose using a deep neural network (DNN) based system to predict them. Three models are built and described: a baseline recurrent neural network (RNN), a long short-term memory (LSTM) model, and a gated recurrent unit (GRU) model. The models were trained on the Boston University Radio Speech Corpus using a minimal set of relevant input features, and were compared with each other as well as with state-of-the-art prosody prediction systems from speech synthesis research.
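The abstract names three recurrent architectures without giving their equations. As an illustration only, the sketch below implements one common formulation of a GRU cell in pure Python and runs it left to right over a sequence of per-vowel feature vectors, with a linear output layer producing one duration estimate per vowel. The weights are random and untrained, and the three input features (lexical stress, phrase-final position, syllable count) as well as the names `GRUCell` and `predict_durations` are hypothetical; the thesis's actual feature set and network configuration are not given here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """One common GRU formulation (illustrative, untrained):
    z  = sigmoid(W_z x + U_z h + b_z)        update gate
    r  = sigmoid(W_r x + U_r h + b_r)        reset gate
    n  = tanh(W_n x + U_n (r * h) + b_n)     candidate state
    h' = (1 - z) * h + z * n
    Some libraries swap the roles of z and 1 - z.
    """

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: "z" update, "r" reset, "n" candidate.
        self.W = {g: rand(hidden_size, input_size) for g in "zrn"}
        self.U = {g: rand(hidden_size, hidden_size) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def _affine(self, gate: str, x: Vector, h: Vector) -> Vector:
        wx = matvec(self.W[gate], x)
        uh = matvec(self.U[gate], h)
        return [a + c + d for a, c, d in zip(wx, uh, self.b[gate])]

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, reset_h)]
        # Convex mix of old state and candidate keeps every unit in (-1, 1).
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def predict_durations(cell: GRUCell, out_w: Vector, out_b: float,
                      features: List[Vector]) -> Vector:
    """Run the GRU over per-vowel features; emit one value (seconds) per vowel."""
    h = [0.0] * cell.hidden_size
    preds = []
    for x in features:
        h = cell.step(x, h)
        preds.append(sum(w * hi for w, hi in zip(out_w, h)) + out_b)
    return preds


# Hypothetical features per vowel: [lexically stressed, phrase-final, syllables in word]
vowels = [[1.0, 0.0, 2.0], [0.0, 0.0, 2.0], [1.0, 1.0, 1.0]]
cell = GRUCell(input_size=3, hidden_size=4)
print(predict_durations(cell, out_w=[0.01] * 4, out_b=0.08, features=vowels))
```

Training (for example, minimizing squared error against corpus durations with backpropagation through time) is omitted. A plain RNN or LSTM cell could be placed behind the same `step` interface to compare the three architectures.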
    The implementation of the proposed prediction model in automatic prosody transplantation is described, and the results are analyzed. A perceptual evaluation by native speakers was carried out, in which accentedness and comprehensibility ratings of modified and original non-native utterances were compared. The results showed that duration transplantation can lead to improvements in comprehensibility scores. This study lays the groundwork for a fully automatic self-imitation prosody training system, and its results can be used to help Korean learners master problematic areas of English prosody, such as sentence stress.

    Contents
    Chapter 1. Introduction
        1.1 Background
        1.2 Research Objective
        1.3 Research Outline
    Chapter 2. Related Works
        2.1 Self-imitation Prosody Training
            2.1.1 Prosody Transplantation Methods
            2.1.2 Effects of Prosody Transplantation on Accentedness Rating
            2.1.3 Effects of Self-Imitation Learning on Proficiency Rating
        2.2 Prosody of Korean-accented English Speech
            2.2.1 Prosodic Systems of Korean and English
            2.2.2 Common Prosodic Mistakes
        2.3 Deep Learning Based Prosody Prediction
            2.3.1 Deep Learning
            2.3.2 Recurrent Neural Networks
            2.3.3 The Long Short-Term Memory Architecture
            2.3.4 Gated Recurrent Units
            2.3.5 Prosody Prediction Models
    Chapter 3. Vowel Duration and Fundamental Frequency Prediction Model
        3.1 Data
        3.2 Input Feature Selection
        3.3 System Architecture and Training
        3.4 Results and Evaluation
            3.4.1 Objective Metrics
            3.4.2 Vowel Duration Prediction Models Results
            3.4.3 Fundamental Frequency Prediction Models Results
            3.4.4 Comparison with Other Models
    Chapter 4. Automatic Prosody Transplantation
        4.1 Data
        4.2 Transplantation Method
        4.3 Perceptual Evaluation
        4.4 Results
    Chapter 5. Conclusion
        5.1 Summary
        5.2 Contribution
        5.3 Limitations
        5.4 Recommendations for Future Study
    References
    Appendix
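The abstract reports that duration transplantation improved comprehensibility but does not spell out how predicted durations are applied to a learner's recording. A minimal sketch under stated assumptions: given a forced alignment of the learner's utterance and one predicted duration per vowel, compute a per-segment time-scale factor (predicted / observed for vowels, 1.0 for everything else), clamp it to a hypothetical range of 0.5 to 2.0 to limit audible artifacts, and derive the new segment boundaries. The `Segment` class, the clamping bounds, and the ARPAbet-style labels are illustrative assumptions, not the thesis's method; the actual signal modification would be done by a separate duration-modification algorithm such as PSOLA, which is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    label: str      # phone label from a forced alignment (hypothetical ARPAbet)
    start: float    # seconds
    end: float      # seconds
    is_vowel: bool

    @property
    def duration(self) -> float:
        return self.end - self.start


def duration_scale_factors(segments: List[Segment], target_vowel_durations: List[float],
                           min_factor: float = 0.5, max_factor: float = 2.0) -> List[float]:
    """Per-segment stretch factors: target/observed for vowels (clamped), 1.0 otherwise."""
    n_vowels = sum(1 for s in segments if s.is_vowel)
    if n_vowels != len(target_vowel_durations):
        raise ValueError("need exactly one target duration per vowel")
    targets = iter(target_vowel_durations)
    factors = []
    for seg in segments:
        if seg.is_vowel:
            ratio = next(targets) / seg.duration
            factors.append(min(max(ratio, min_factor), max_factor))
        else:
            factors.append(1.0)  # consonants and pauses keep their original length
    return factors


def retime(segments: List[Segment], factors: List[float]) -> List[Segment]:
    """Lay the scaled segments end to end, starting at the original onset."""
    t = segments[0].start if segments else 0.0
    out = []
    for seg, f in zip(segments, factors):
        new_end = t + seg.duration * f
        out.append(Segment(seg.label, t, new_end, seg.is_vowel))
        t = new_end
    return out


# Learner's "the cat": AH is longer and AE shorter than the (hypothetical) predictions.
learner = [
    Segment("DH", 0.00, 0.05, False),
    Segment("AH", 0.05, 0.10, True),
    Segment("K", 0.10, 0.18, False),
    Segment("AE", 0.18, 0.24, True),
]
factors = duration_scale_factors(learner, target_vowel_durations=[0.04, 0.15])
print([round(f, 2) for f in factors])              # [1.0, 0.8, 1.0, 2.0]
print(round(retime(learner, factors)[-1].end, 2))  # 0.29
```

Here the AE factor of 2.5 is clamped to 2.0; whether and how tightly to clamp is a design choice that trades fidelity to the prediction against naturalness of the resynthesized speech.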