8 research outputs found

    Acoustic speech markers for tracking changes in hypokinetic dysarthria associated with Parkinson's Disease

    Joan Ma - ORCID: 0000-0003-2051-8360 (https://orcid.org/0000-0003-2051-8360) - https://icpla2023.at

    Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech

    The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used in computing cepstral coefficients smooths out this fine-level information in the higher frequency regions because of the larger bandwidth of its filters there. To capture the sub-band information, in this paper a four-level discrete wavelet transform (DWT) decomposition is first performed to decompose the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing the different sub-bands are then reconstructed using the inverse DWT (IDWT). Log filterbank energies are computed by analysing the short-term discrete Fourier transform magnitude spectra of each reconstructed signal with a 30-channel Mel filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and a discrete cosine transform is applied to produce the cepstral feature, termed here the discrete wavelet transform reconstructed Mel-frequency cepstral coefficient (DWTR-MFCC). An i-vector based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms conventional MFCCs and several other cepstral features reported for a similar task. Using DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case to 60.094%, from the 56.646% MFCC baseline. Further analysis of the confusion matrices shows that the confusion among the different dysarthric classes is quite different for the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing the discriminating power of both kinds of features is proposed to improve the overall performance of the dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
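    The feature extraction described above (four-level DWT, per-sub-band IDWT reconstruction, 30-channel Mel log filterbank energies, frame-wise pooling, DCT) can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation: the wavelet family, FFT size, hop length and number of retained coefficients are assumptions, and PyWavelets, librosa, NumPy and SciPy are assumed to be available.

```python
import numpy as np
import pywt
import librosa
from scipy.fftpack import dct

def dwtr_mfcc(signal, sr, wavelet="db4", levels=4, n_mels=30, n_ceps=13,
              n_fft=512, hop_length=160):
    """Cepstral features from DWT-reconstructed sub-band signals (illustrative sketch)."""
    # 1) Four-level DWT: coefficient sets [cA4, cD4, cD3, cD2, cD1]
    coeffs = pywt.wavedec(signal, wavelet, level=levels)

    # 2) Reconstruct one time-domain signal per sub-band by zeroing the other bands
    recons = []
    for i in range(len(coeffs)):
        masked = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        recons.append(pywt.waverec(masked, wavelet)[: len(signal)])

    # 3) 30-channel Mel log filterbank energies of each reconstructed signal,
    #    computed from short-term DFT magnitude spectra
    feats = []
    for x in recons:
        mel = librosa.feature.melspectrogram(y=x, sr=sr, n_fft=n_fft,
                                             hop_length=hop_length,
                                             n_mels=n_mels, power=1.0)
        feats.append(np.log(mel + 1e-10))

    # 4) Pool the log energies of all sub-bands frame by frame, then apply a DCT
    pooled = np.vstack(feats)                     # shape: (5 * n_mels, n_frames)
    return dct(pooled, type=2, axis=0, norm="ortho")[:n_ceps]

# Hypothetical usage:
# ceps = dwtr_mfcc(librosa.load("utterance.wav", sr=16000)[0], 16000)
```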

    Automatic detection and assessment of dysarthric speech using prosodic information

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ธ๋ฌธ๋Œ€ํ•™ ์–ธ์–ดํ•™๊ณผ, 2020. 8. Minhwa Chung.๋ง์žฅ์• ๋Š” ์‹ ๊ฒฝ๊ณ„ ๋˜๋Š” ํ‡ดํ–‰์„ฑ ์งˆํ™˜์—์„œ ๊ฐ€์žฅ ๋นจ๋ฆฌ ๋‚˜ํƒ€๋‚˜๋Š” ์ฆ ์ƒ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ๋งˆ๋น„๋ง์žฅ์• ๋Š” ํŒŒํ‚จ์Šจ๋ณ‘, ๋‡Œ์„ฑ ๋งˆ๋น„, ๊ทผ์œ„์ถ•์„ฑ ์ธก์‚ญ ๊ฒฝํ™”์ฆ, ๋‹ค๋ฐœ์„ฑ ๊ฒฝํ™”์ฆ ํ™˜์ž ๋“ฑ ๋‹ค์–‘ํ•œ ํ™˜์ž๊ตฐ์—์„œ ๋‚˜ํƒ€๋‚œ๋‹ค. ๋งˆ๋น„๋ง์žฅ์• ๋Š” ์กฐ์Œ๊ธฐ๊ด€ ์‹ ๊ฒฝ์˜ ์†์ƒ์œผ๋กœ ๋ถ€์ •ํ™•ํ•œ ์กฐ์Œ์„ ์ฃผ์š” ํŠน์ง•์œผ๋กœ ๊ฐ€์ง€๊ณ , ์šด์œจ์—๋„ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ๊ฒƒ์œผ๋กœ ๋ณด๊ณ ๋œ๋‹ค. ์„ ํ–‰ ์—ฐ๊ตฌ์—์„œ๋Š” ์šด์œจ ๊ธฐ๋ฐ˜ ์ธก์ •์น˜๋ฅผ ๋น„์žฅ์•  ๋ฐœํ™”์™€ ๋งˆ๋น„๋ง์žฅ์•  ๋ฐœํ™”๋ฅผ ๊ตฌ๋ณ„ํ•˜๋Š” ๊ฒƒ์— ์‚ฌ์šฉํ–ˆ๋‹ค. ์ž„์ƒ ํ˜„์žฅ์—์„œ๋Š” ๋งˆ๋น„๋ง์žฅ์• ์— ๋Œ€ํ•œ ์šด์œจ ๊ธฐ๋ฐ˜ ๋ถ„์„์ด ๋งˆ๋น„๋ง์žฅ์• ๋ฅผ ์ง„๋‹จํ•˜๊ฑฐ๋‚˜ ์žฅ์•  ์–‘์ƒ์— ๋”ฐ๋ฅธ ์•Œ๋งž์€ ์น˜๋ฃŒ๋ฒ•์„ ์ค€๋น„ํ•˜๋Š” ๊ฒƒ์— ๋„์›€์ด ๋  ๊ฒƒ์ด๋‹ค. ๋”ฐ๋ผ์„œ ๋งˆ๋น„๋ง์žฅ์• ๊ฐ€ ์šด์œจ์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ์–‘์ƒ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋งˆ๋น„๋ง์žฅ์• ์˜ ์šด์œจ ํŠน์ง•์„ ๊ธด๋ฐ€ํ•˜๊ฒŒ ์‚ดํŽด๋ณด๋Š” ๊ฒƒ์ด ํ•„์š”ํ•˜๋‹ค. ๊ตฌ์ฒด ์ ์œผ๋กœ, ์šด์œจ์ด ์–ด๋–ค ์ธก๋ฉด์—์„œ ๋งˆ๋น„๋ง์žฅ์• ์— ์˜ํ–ฅ์„ ๋ฐ›๋Š”์ง€, ๊ทธ๋ฆฌ๊ณ  ์šด์œจ ์• ๊ฐ€ ์žฅ์•  ์ •๋„์— ๋”ฐ๋ผ ์–ด๋–ป๊ฒŒ ๋‹ค๋ฅด๊ฒŒ ๋‚˜ํƒ€๋‚˜๋Š”์ง€์— ๋Œ€ํ•œ ๋ถ„์„์ด ํ•„์š”ํ•˜๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ์Œ๋†’์ด, ์Œ์งˆ, ๋ง์†๋„, ๋ฆฌ๋“ฌ ๋“ฑ ์šด์œจ์„ ๋‹ค์–‘ํ•œ ์ธก๋ฉด์— ์„œ ์‚ดํŽด๋ณด๊ณ , ๋งˆ๋น„๋ง์žฅ์•  ๊ฒ€์ถœ ๋ฐ ํ‰๊ฐ€์— ์‚ฌ์šฉํ•˜์˜€๋‹ค. ์ถ”์ถœ๋œ ์šด์œจ ํŠน์ง•๋“ค์€ ๋ช‡ ๊ฐ€์ง€ ํŠน์ง• ์„ ํƒ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ์ตœ์ ํ™”๋˜์–ด ๋จธ์‹ ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜ ๋ถ„๋ฅ˜๊ธฐ์˜ ์ž…๋ ฅ๊ฐ’์œผ๋กœ ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. ๋ถ„๋ฅ˜๊ธฐ์˜ ์„ฑ๋Šฅ์€ ์ •ํ™•๋„, ์ •๋ฐ€๋„, ์žฌํ˜„์œจ, F1-์ ์ˆ˜๋กœ ํ‰๊ฐ€๋˜์—ˆ๋‹ค. ๋˜ํ•œ, ๋ณธ ๋…ผ๋ฌธ์€ ์žฅ์•  ์ค‘์ฆ๋„(๊ฒฝ๋„, ์ค‘๋“ฑ๋„, ์‹ฌ๋„)์— ๋”ฐ๋ผ ์šด์œจ ์ •๋ณด ์‚ฌ์šฉ์˜ ์œ ์šฉ์„ฑ์„ ๋ถ„์„ํ•˜์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ์žฅ์•  ๋ฐœํ™” ์ˆ˜์ง‘์ด ์–ด๋ ค์šด ๋งŒํผ, ๋ณธ ์—ฐ๊ตฌ๋Š” ๊ต์ฐจ ์–ธ์–ด ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค. ํ•œ๊ตญ์–ด์™€ ์˜์–ด ์žฅ์•  ๋ฐœํ™”๊ฐ€ ํ›ˆ๋ จ ์…‹์œผ๋กœ ์‚ฌ์šฉ๋˜์—ˆ์œผ๋ฉฐ, ํ…Œ์ŠคํŠธ์…‹์œผ๋กœ๋Š” ๊ฐ ๋ชฉํ‘œ ์–ธ์–ด๋งŒ์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ธ ๊ฐ€์ง€๋ฅผ ์‹œ์‚ฌํ•œ๋‹ค. ์ฒซ์งธ, ์šด์œจ ์ •๋ณด ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์€ ๋งˆ๋น„๋ง์žฅ์•  ๊ฒ€์ถœ ๋ฐ ํ‰๊ฐ€์— ๋„์›€์ด ๋œ๋‹ค. MFCC ๋งŒ์„ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ์™€ ๋น„๊ตํ–ˆ์„ ๋•Œ, ์šด์œจ ์ •๋ณด๋ฅผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ํ•œ๊ตญ์–ด์™€ ์˜์–ด ๋ฐ์ดํ„ฐ์…‹ ๋ชจ๋‘์—์„œ ๋„์›€์ด ๋˜์—ˆ๋‹ค. ๋‘˜์งธ, ์šด์œจ ์ •๋ณด๋Š” ํ‰๊ฐ€์— ํŠนํžˆ ์œ ์šฉํ•˜๋‹ค. ์˜์–ด์˜ ๊ฒฝ์šฐ ๊ฒ€์ถœ๊ณผ ํ‰๊ฐ€์—์„œ ๊ฐ๊ฐ 1.82%์™€ 20.6%์˜ ์ƒ๋Œ€์  ์ •ํ™•๋„ ํ–ฅ์ƒ์„ ๋ณด์˜€๋‹ค. ํ•œ๊ตญ์–ด์˜ ๊ฒฝ์šฐ ๊ฒ€์ถœ์—์„œ๋Š” ํ–ฅ์ƒ์„ ๋ณด์ด์ง€ ์•Š์•˜์ง€๋งŒ, ํ‰๊ฐ€์—์„œ๋Š” 13.6%์˜ ์ƒ๋Œ€์  ํ–ฅ์ƒ์ด ๋‚˜ํƒ€๋‚ฌ๋‹ค. ์…‹์งธ, ๊ต์ฐจ ์–ธ์–ด ๋ถ„๋ฅ˜๊ธฐ๋Š” ๋‹จ์ผ ์–ธ์–ด ๋ถ„๋ฅ˜๊ธฐ๋ณด๋‹ค ํ–ฅ์ƒ๋œ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์ธ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ ๊ต์ฐจ์–ธ์–ด ๋ถ„๋ฅ˜๊ธฐ๋Š” ๋‹จ์ผ ์–ธ์–ด ๋ถ„๋ฅ˜๊ธฐ์™€ ๋น„๊ตํ–ˆ์„ ๋•Œ ์ƒ๋Œ€์ ์œผ๋กœ 4.12% ๋†’์€ ์ •ํ™•๋„๋ฅผ ๋ณด์˜€๋‹ค. ์ด๊ฒƒ์€ ํŠน์ • ์šด์œจ ์žฅ์• ๋Š” ๋ฒ”์–ธ์–ด์  ํŠน์ง•์„ ๊ฐ€์ง€๋ฉฐ, ๋‹ค๋ฅธ ์–ธ์–ด ๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จ์‹œ์ผœ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•œ ํ›ˆ๋ จ ์…‹์„ ๋ณด์™„ํ•  ์ˆ˜ ์žˆ ์Œ์„ ์‹œ์‚ฌํ•œ๋‹ค.One of the earliest cues for neurological or degenerative disorders are speech impairments. Individuals with Parkinsons Disease, Cerebral Palsy, Amyotrophic lateral Sclerosis, Multiple Sclerosis among others are often diagnosed with dysarthria. Dysarthria is a group of speech disorders mainly affecting the articulatory muscles which eventually leads to severe misarticulation. 
However, impairments in the suprasegmental domain are also present and previous studies have shown that the prosodic patterns of speakers with dysarthria differ from the prosody of healthy speakers. In a clinical setting, a prosodic-based analysis of dysarthric speech can be helpful for diagnosing the presence of dysarthria. Therefore, there is a need to not only determine how the prosody of speech is affected by dysarthria, but also what aspects of prosody are more affected and how prosodic impairments change by the severity of dysarthria. In the current study, several prosodic features related to pitch, voice quality, rhythm and speech rate are used as features for detecting dysarthria in a given speech signal. A variety of feature selection methods are utilized to determine which set of features are optimal for accurate detection. After selecting an optimal set of prosodic features we use them as input to machine learning-based classifiers and assess the performance using the evaluation metrics: accuracy, precision, recall and F1-score. Furthermore, we examine the usefulness of prosodic measures for assessing different levels of severity (e.g. mild, moderate, severe). Finally, as collecting impaired speech data can be difficult, we also implement cross-language classifiers where both Korean and English data are used for training but only one language used for testing. Results suggest that in comparison to solely using Mel-frequency cepstral coefficients, including prosodic measurements can improve the accuracy of classifiers for both Korean and English datasets. In particular, large improvements were seen when assessing different severity levels. For English a relative accuracy improvement of 1.82% for detection and 20.6% for assessment was seen. The Korean dataset saw no improvements for detection but a relative improvement of 13.6% for assessment. The results from cross-language experiments showed a relative improvement of up to 4.12% in comparison to only using a single language during training. It was found that certain prosodic impairments such as pitch and duration may be language independent. Therefore, when training sets of individual languages are limited, they may be supplemented by including data from other languages.1. Introduction 1 1.1. Dysarthria 1 1.2. Impaired Speech Detection 3 1.3. Research Goals & Outline 6 2. Background Research 8 2.1. Prosodic Impairments 8 2.1.1. English 8 2.1.2. Korean 10 2.2. Machine Learning Approaches 12 3. Database 18 3.1. English-TORGO 20 3.2. Korean-QoLT 21 4. Methods 23 4.1. Prosodic Features 23 4.1.1. Pitch 23 4.1.2. Voice Quality 26 4.1.3. Speech Rate 29 4.1.3. Rhythm 30 4.2. Feature Selection 34 4.3. Classification Models 38 4.3.1. Random Forest 38 4.3.1. Support Vector Machine 40 4.3.1 Feed-Forward Neural Network 42 4.4. Mel-Frequency Cepstral Coefficients 43 5. Experiment 46 5.1. Model Parameters 47 5.2. Training Procedure 48 5.2.1. Dysarthria Detection 48 5.2.2. Severity Assessment 50 5.2.3. Cross-Language 51 6. Results 52 6.1. TORGO 52 6.1.1. Dysarthria Detection 52 6.1.2. Severity Assessment 56 6.2. QoLT 57 6.2.1. Dysarthria Detection 57 6.2.2. Severity Assessment 58 6.1. Cross-Language 59 7. Discussion 62 7.1. Linguistic Implications 62 7.2. Clinical Applications 65 8. Conclusion 67 References 69 Appendix 76 Abstract in Korean 79Maste
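    A rough sketch of the kind of prosody-based detection pipeline the abstract describes (prosodic feature extraction, feature selection, a machine-learning classifier, accuracy/precision/recall/F1 evaluation) might look like the following. It is not the thesis code: the handful of features, the pYIN pitch tracker, and the SelectKBest/RandomForest choices are stand-ins for the thesis's larger feature set and classifier comparison, and the corpus paths and labels are placeholders.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def prosodic_features(path):
    """A handful of placeholder prosodic measures for one utterance."""
    y, sr = librosa.load(path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)   # pitch track (NaN when unvoiced)
    f0 = f0[~np.isnan(f0)]
    rms = librosa.feature.rms(y=y)[0]                      # crude intensity contour
    return np.array([
        f0.mean() if f0.size else 0.0,   # mean F0
        f0.std() if f0.size else 0.0,    # F0 variability
        rms.mean(), rms.std(),           # energy statistics (rough voice-quality proxy)
        len(y) / sr,                     # utterance duration (crude speech-rate proxy)
    ])

# Hypothetical corpus lists (e.g. TORGO or QoLT paths with 0/1 dysarthria labels):
# X = np.vstack([prosodic_features(p) for p in paths]); y = np.array(labels)
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# clf = make_pipeline(SelectKBest(f_classif, k=4),            # feature selection
#                     RandomForestClassifier(random_state=0)) # one candidate classifier
# clf.fit(X_tr, y_tr)
# pred = clf.predict(X_te)
# print(accuracy_score(y_te, pred),
#       precision_recall_fscore_support(y_te, pred, average="binary"))
```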

    Improving Dysarthric Speech Recognition by Enriching Training Datasets

    Dysarthria is a motor speech disorder that results from disruptions in the neuro-motor interface and is characterised by poor articulation of phonemes and hyper-nasality; it is characteristically different from normal speech. Many modern automatic speech recognition (ASR) systems focus on a narrow range of speech diversity and, as a consequence, exclude groups of speakers who deviate in gender, race, age or speech impairment when building training datasets. This study attempts to develop an ASR system that deals with dysarthric speech using only limited dysarthric speech data. Speech utterances from the TORGO database are used to conduct experiments on a wav2vec 2.0 model trained only on the Librispeech 960h dataset to obtain a baseline word error rate (WER) for recognising dysarthric speech. A version of the Librispeech model fine-tuned on multi-language datasets was also tested to see whether it would improve accuracy, and it achieved a top reduction of 24.15% in WER for one of the male dysarthric speakers in the dataset. Transfer learning with speech recognition models and preprocessing dysarthric speech to improve its intelligibility using generative adversarial networks were limited in their potential due to the lack of a dysarthric speech dataset of adequate size. The main conclusion drawn from this study is that a large, diverse dysarthric speech dataset, comparable in size to the datasets used to train machine-learning ASR systems such as Librispeech and containing different types of speech, scripted and unscripted, is required to improve performance.
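    The baseline evaluation described above (a wav2vec 2.0 model trained on Librispeech 960h, scored by word error rate on TORGO utterances) can be sketched along these lines. This is an illustrative setup, not the study's code; the Hugging Face model ID, file path and reference transcript are assumptions for illustration.

```python
import torch
import torchaudio
from jiwer import wer
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "facebook/wav2vec2-base-960h"          # trained on Librispeech 960h only
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

def transcribe(wav_path):
    """Greedy CTC decoding of one utterance."""
    speech, sr = torchaudio.load(wav_path)
    speech = torchaudio.functional.resample(speech.mean(dim=0), sr, 16000)
    inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]

# Hypothetical TORGO utterance and reference transcript:
# hyp = transcribe("torgo/M01/session1/0001.wav")
# print("WER:", wer("THE QUICK BROWN FOX", hyp))
```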

    Speech assessment and speech characteristics of adults with dysarthria: a video-based learning material

    https://www.ester.ee/record=b5255669*es

    A description of the rhythm of Barunga Kriol using rhythm metrics and an analysis of vowel reduction

    Kriol is an English-lexifier creole language spoken by over 20,000 children and adults in the northern parts of Australia, yet much about the prosody of this language remains unknown. This thesis provides a preliminary description of the rhythm and patterns of vowel reduction of Barunga Kriol, a variety of Kriol local to Barunga Community, NT, and compares it to a relatively standard variety of Australian English. The thesis is divided into two studies. Study 1, the Rhythm Metric Study, describes the rhythm of Barunga Kriol and Australian English using rhythm metrics. Study 2, the Vowel Reduction Study, compares patterns of vowel reduction in Barunga Kriol and Australian English. This thesis contributes the first in-depth studies of vowel reduction patterns and of rhythm using rhythm metrics in any variety of Kriol or Australian English. The research also sets an adult baseline for rhythm-metric results and patterns of vowel reduction in Barunga Kriol and Australian English, useful for future studies of child speech in these varieties. As rhythm is a major contributor to intelligibility, the findings of this thesis have the potential to inform teaching practice in English as a Second Language.
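    For readers unfamiliar with rhythm metrics, the sketch below shows two commonly used measures, %V and the normalised pairwise variability index (nPVI), computed from segment durations. The thesis does not publish code; the duration values here are made up for illustration only.

```python
def percent_v(vowel_durs, consonant_durs):
    """%V: proportion of total utterance duration that is vocalic."""
    total = sum(vowel_durs) + sum(consonant_durs)
    return 100.0 * sum(vowel_durs) / total

def npvi(durs):
    """Normalised pairwise variability index over successive interval durations."""
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durs[:-1], durs[1:])]
    return 100.0 * sum(terms) / (len(durs) - 1)

# Made-up vowel and consonant durations (seconds) for illustration:
# print(npvi([0.12, 0.08, 0.15, 0.09]), percent_v([0.12, 0.08], [0.06, 0.07, 0.05]))
```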

    ACOUSTIC SPEECH MARKERS FOR TRACKING CHANGES IN HYPOKINETIC DYSARTHRIA ASSOCIATED WITH PARKINSON'S DISEASE

    Previous research has identified certain overarching features of hypokinetic dysarthria associated with Parkinson's Disease and found that it manifests differently between individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive enough to track speech changes. Previous longitudinal studies have had limited sample sizes or variable intervals between data collection. This study focused on using acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson's Disease (PwPD) over six months. The thesis presents how this study has addressed the limitations of previous studies to make a novel contribution to current knowledge. Speech data were collected from 63 PwPD and 47 control speakers using online podcast software at two time points six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Perceptual severity ratings were given by two speech and language therapists at T1 and T2, and acoustic parameters of voice, articulation and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time, and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional attempt was made to identify whether these parameters showed group differences for differential diagnosis between PwPD and control speakers at T1 and T2. Results showed that specific acoustic parameters in voice quality, articulation and prosody could either differentiate between PwPD and controls or detect speech changes between T1 and T2, but not both. However, specific acoustic parameters within articulation could detect significant group and speech-change differences across T1 and T2. The thesis discusses these results, their implications, and the potential for future studies.
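    As an illustration of the kind of voice-quality parameters such studies typically extract from sustained phonation (mean F0, jitter, shimmer, harmonics-to-noise ratio), the sketch below uses Praat via the Parselmouth library. It is not this thesis's analysis pipeline; the file name and analysis settings are assumptions chosen from common Praat defaults.

```python
import parselmouth
from parselmouth.praat import call

# Hypothetical sustained /a/ recording from one speaker
snd = parselmouth.Sound("sustained_a.wav")

pitch = snd.to_pitch()
f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")                      # mean F0

point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)  # glottal pulses
jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)

harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)                             # harmonics-to-noise ratio

print(f"F0 {f0_mean:.1f} Hz, jitter {jitter:.4f}, shimmer {shimmer:.4f}, HNR {hnr:.1f} dB")
```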