Automatic Detection and Assessment of Dysarthric Speech Using Prosodic Information

Thesis (M.A.) -- Seoul National University Graduate School : Department of Linguistics, College of Humanities, 2020. 8. Minhwa Chung.

Speech impairments are among the earliest cues for neurological or degenerative disorders. Individuals with Parkinson's disease, cerebral palsy, amyotrophic lateral sclerosis, and multiple sclerosis, among others, are often diagnosed with dysarthria. Dysarthria is a group of speech disorders mainly affecting the articulatory muscles, which eventually leads to severe misarticulation. However, impairments in the suprasegmental domain are also present, and previous studies have shown that the prosodic patterns of speakers with dysarthria differ from those of healthy speakers. In a clinical setting, a prosody-based analysis of dysarthric speech can be helpful for diagnosing the presence of dysarthria. Therefore, there is a need to determine not only how the prosody of speech is affected by dysarthria, but also which aspects of prosody are more affected and how prosodic impairments change with the severity of dysarthria.
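The detection and assessment experiments in this work are scored with accuracy, precision, recall, and F1-score. As a compact reference for how those four metrics relate on a binary dysarthria-detection task, a stdlib-only Python sketch (the label vectors below are fabricated for illustration and are not data from the thesis):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = dysarthric)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical predictions on 8 utterances (1 = dysarthric, 0 = healthy):
# one dysarthric speaker is missed and one healthy speaker is a false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # → 0.75 0.75 0.75 0.75
```

Precision and recall diverge from accuracy once the classes are imbalanced, which is why all four metrics are reported side by side.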
In the current study, several prosodic features related to pitch, voice quality, rhythm, and speech rate are used as features for detecting dysarthria in a given speech signal. A variety of feature-selection methods are utilized to determine which set of features is optimal for accurate detection. After selecting an optimal set of prosodic features, we use them as input to machine-learning-based classifiers and assess their performance using the evaluation metrics accuracy, precision, recall, and F1-score. Furthermore, we examine the usefulness of prosodic measures for assessing different levels of severity (e.g., mild, moderate, severe). Finally, as collecting impaired speech data can be difficult, we also implement cross-language classifiers where both Korean and English data are used for training but only one language is used for testing. Results suggest that, in comparison to solely using Mel-frequency cepstral coefficients, including prosodic measurements can improve the accuracy of classifiers for both Korean and English datasets. In particular, large improvements were seen when assessing different severity levels. For English, a relative accuracy improvement of 1.82% for detection and 20.6% for assessment was seen. The Korean dataset saw no improvement for detection but a relative improvement of 13.6% for assessment. The results from cross-language experiments showed a relative improvement of up to 4.12% in comparison to using only a single language during training. It was found that certain prosodic impairments, such as those of pitch and duration, may be language-independent. Therefore, when training sets for individual languages are limited, they may be supplemented by including data from other languages.

1. Introduction
1.1. Dysarthria
1.2. Impaired Speech Detection
1.3. Research Goals & Outline
2. Background Research
2.1. Prosodic Impairments
2.1.1. English
2.1.2. Korean
2.2. Machine Learning Approaches
3. Database
3.1. English-TORGO
3.2. Korean-QoLT
4. Methods
4.1. Prosodic Features
4.1.1. Pitch
4.1.2. Voice Quality
4.1.3. Speech Rate
4.1.4. Rhythm
4.2. Feature Selection
4.3. Classification Models
4.3.1. Random Forest
4.3.2. Support Vector Machine
4.3.3. Feed-Forward Neural Network
4.4. Mel-Frequency Cepstral Coefficients
5. Experiment
5.1. Model Parameters
5.2. Training Procedure
5.2.1. Dysarthria Detection
5.2.2. Severity Assessment
5.2.3. Cross-Language
6. Results
6.1. TORGO
6.1.1. Dysarthria Detection
6.1.2. Severity Assessment
6.2. QoLT
6.2.1. Dysarthria Detection
6.2.2. Severity Assessment
6.3. Cross-Language
7. Discussion
7.1. Linguistic Implications
7.2. Clinical Applications
8. Conclusion
References
Appendix
Abstract in Korean