
    Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech

    The speech signal within a sub-band varies at a fine level depending on the type and severity of dysarthria. The Mel-frequency filterbank used in computing cepstral coefficients smooths out this fine-level information in the higher frequency regions because of the larger bandwidths of its filters there. To capture the sub-band information, in this paper a four-level discrete wavelet transform (DWT) decomposition is first performed to decompose the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectra of each reconstructed signal with a 30-channel Mel filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and the discrete cosine transform is applied to obtain the cepstral feature, termed here the discrete wavelet transform reconstructed Mel-frequency cepstral coefficient (DWTR-MFCC). The i-vector based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. Using DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case to 60.094% from the 56.646% MFCC baseline. Further analysis of the confusion matrices shows that the confusion among different dysarthric classes is quite different for the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing the discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
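    The feature extraction pipeline described above can be summarized in code. The following is a minimal sketch, assuming PyWavelets, librosa, NumPy, and SciPy; the function name extract_dwtr_mfcc and parameters such as the db4 wavelet, frame settings, and the number of retained cepstral coefficients are illustrative assumptions, since the abstract does not specify them.

# Minimal sketch of the DWTR-MFCC pipeline described above.
# Assumes PyWavelets (pywt), librosa, NumPy and SciPy; function and
# parameter names are illustrative, not taken from the paper's code.
import numpy as np
import pywt
import librosa
from scipy.fftpack import dct

def extract_dwtr_mfcc(signal, sr=16000, wavelet="db4", levels=4,
                      n_mels=30, n_fft=512, hop_length=160, n_ceps=13):
    # 1) Four-level DWT: coefficient sets [cA4, cD4, cD3, cD2, cD1].
    coeffs = pywt.wavedec(signal, wavelet, level=levels)

    # 2) Reconstruct five sub-band signals with the IDWT, keeping one
    #    coefficient set at a time and zeroing the others.
    subband_signals = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        rec = pywt.waverec(kept, wavelet)[: len(signal)]
        subband_signals.append(rec)

    # 3) 30-channel log Mel filterbank energies of each reconstructed signal.
    log_fbanks = []
    mel = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    for rec in subband_signals:
        mag = np.abs(librosa.stft(rec, n_fft=n_fft, hop_length=hop_length))
        log_fbanks.append(np.log(mel @ mag + 1e-10))       # (n_mels, n_frames)

    # 4) Pool the energies across sub-bands per frame and apply the DCT.
    pooled = np.concatenate(log_fbanks, axis=0)            # (5*n_mels, n_frames)
    dwtr_mfcc = dct(pooled, type=2, axis=0, norm="ortho")[:n_ceps]
    return dwtr_mfcc.T                                      # (n_frames, n_ceps)

    Each reconstructed signal isolates one set of DWT coefficients (the level-4 approximation plus the four detail bands), so the pooled log filterbank energies retain sub-band detail that a single full-band Mel analysis would smooth out.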

    Automatic Detection and Assessment of Dysarthric Speech Using Prosodic Information (운율 정보λ₯Ό μ΄μš©ν•œ λ§ˆλΉ„λ§μž₯μ•  μŒμ„± μžλ™ κ²€μΆœ 및 평가)

    Thesis (Master's) -- Seoul National University Graduate School, Department of Linguistics, College of Humanities, August 2020. Advisor: Minhwa Chung. Speech impairment is one of the earliest cues for neurological or degenerative disorders. Individuals with Parkinson's disease, cerebral palsy, amyotrophic lateral sclerosis, and multiple sclerosis, among others, are often diagnosed with dysarthria. Dysarthria is a group of speech disorders mainly affecting the articulatory muscles, which eventually leads to severe misarticulation. However, impairments in the suprasegmental domain are also present, and previous studies have shown that the prosodic patterns of speakers with dysarthria differ from the prosody of healthy speakers. In a clinical setting, a prosody-based analysis of dysarthric speech can be helpful for diagnosing the presence of dysarthria and for planning treatment appropriate to the pattern of impairment. Therefore, there is a need to determine not only how the prosody of speech is affected by dysarthria, but also which aspects of prosody are more affected and how prosodic impairments change with the severity of dysarthria.
In the current study, several prosodic features related to pitch, voice quality, rhythm, and speech rate are used for detecting dysarthria in a given speech signal. A variety of feature selection methods are used to determine which set of features is optimal for accurate detection. After selecting an optimal set of prosodic features, we use them as input to machine-learning-based classifiers and assess performance with accuracy, precision, recall, and F1-score (see the sketch after the table of contents below). Furthermore, we examine the usefulness of prosodic measures for assessing different levels of severity (mild, moderate, severe). Finally, as collecting impaired speech data can be difficult, we also implement cross-language classifiers in which both Korean and English data are used for training but only one language is used for testing. Results suggest that, in comparison to solely using Mel-frequency cepstral coefficients, including prosodic measurements can improve the accuracy of classifiers on both the Korean and English datasets. In particular, large improvements were seen when assessing different severity levels. For English, relative accuracy improvements of 1.82% for detection and 20.6% for assessment were observed. The Korean dataset showed no improvement for detection but a relative improvement of 13.6% for assessment. The cross-language experiments showed a relative improvement of up to 4.12% in comparison to using only a single language during training. Certain prosodic impairments, such as those affecting pitch and duration, may be language-independent; therefore, when the training sets of individual languages are limited, they may be supplemented with data from other languages.

Table of Contents:
1. Introduction
   1.1. Dysarthria
   1.2. Impaired Speech Detection
   1.3. Research Goals & Outline
2. Background Research
   2.1. Prosodic Impairments
        2.1.1. English
        2.1.2. Korean
   2.2. Machine Learning Approaches
3. Database
   3.1. English-TORGO
   3.2. Korean-QoLT
4. Methods
   4.1. Prosodic Features
        4.1.1. Pitch
        4.1.2. Voice Quality
        4.1.3. Speech Rate
        4.1.4. Rhythm
   4.2. Feature Selection
   4.3. Classification Models
        4.3.1. Random Forest
        4.3.2. Support Vector Machine
        4.3.3. Feed-Forward Neural Network
   4.4. Mel-Frequency Cepstral Coefficients
5. Experiment
   5.1. Model Parameters
   5.2. Training Procedure
        5.2.1. Dysarthria Detection
        5.2.2. Severity Assessment
        5.2.3. Cross-Language
6. Results
   6.1. TORGO
        6.1.1. Dysarthria Detection
        6.1.2. Severity Assessment
   6.2. QoLT
        6.2.1. Dysarthria Detection
        6.2.2. Severity Assessment
   6.3. Cross-Language
7. Discussion
   7.1. Linguistic Implications
   7.2. Clinical Applications
8. Conclusion
References
Appendix
Abstract in Korean
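    To make the detection setup concrete, the following is a minimal sketch in Python of the general workflow: a handful of per-utterance prosodic measurements, univariate feature selection, and a random-forest classifier scored with accuracy, precision, recall, and F1. The functions prosodic_features and train_and_evaluate, the specific measurements, and the hyperparameters are assumptions for illustration; they do not reproduce the thesis's exact feature set, the TORGO/QoLT data handling, or the cross-language training splits.

# Illustrative sketch of the detection setup described in the abstract.
# Assumes librosa, NumPy and scikit-learn; feature choices and
# hyperparameters are assumptions, not the thesis configuration.
import numpy as np
import librosa
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def prosodic_features(wav_path, sr=16000):
    # A few simple prosody-related measurements for one utterance.
    y, sr = librosa.load(wav_path, sr=sr)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    rms = librosa.feature.rms(y=y)[0]
    return np.array([
        np.mean(f0) if f0.size else 0.0,   # pitch level
        np.std(f0) if f0.size else 0.0,    # pitch variability
        np.mean(voiced_flag),              # voicing ratio (rough rate/rhythm proxy)
        np.mean(rms),                      # intensity level
        np.std(rms),                       # intensity variability
    ])

def train_and_evaluate(X, y, k_best=4, seed=0):
    # X: (n_utterances, n_features) prosodic measurements,
    # y: labels (e.g. 0 = control, 1 = dysarthric).
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              stratify=y, random_state=seed)
    clf = make_pipeline(
        SelectKBest(f_classif, k=k_best),                       # feature selection
        RandomForestClassifier(n_estimators=200, random_state=seed))
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    acc = accuracy_score(y_te, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_te, y_pred,
                                                       average="binary")
    return acc, prec, rec, f1

    The same skeleton extends to severity assessment by using multi-class labels, and to the cross-language condition by filling X and y with utterances from both languages during training while testing on a single target language.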