
    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people suffering from a variety of etiologies. One prominent clinical application is a computer-assisted speech training system, which enables personalized speech therapy in the home environment for patients with communicative disorders. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be deployed in a clinical context without prior speaker information, we compare the ASR performance on dysarthric speech of speaker-independent bottleneck and articulatory features, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements are reported on both datasets using speaker-independent ASR architectures. Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

    Brittany Bernal - Sensorimotor Adaptation of Speech Through a Virtually Shortened Vocal Tract

    The broad objective of this line of research is to understand how auditory feedback manipulations may be used to elicit involuntary changes in speech articulation. We examine speech sensorimotor adaptation to support the development of speech rehabilitation applications that benefit from this learning phenomenon. Specifically, we seek to understand how virtually manipulating participants’ perception of their vowel space affects their speech movements, assessed through acoustic variables such as formant frequency changes. Participants speak through a digital audio processing device that virtually alters the perceived size of their vocal tract. We hypothesize that this modification to auditory feedback will induce adaptive changes in motor behavior, as indicated by acoustic changes resulting from speech articulation. This study will determine how modifying the perceived vocal tract size affects articulatory behavior, as indicated by changes in formant frequencies and in vowel space area. It will also determine whether and how the size of the virtual vowel space affects the magnitude and direction of sensorimotor adaptation for speech. The ultimate aim is to determine how important it is for the virtual vowel space to mimic the talker’s real vowel space, and whether perturbing the size of the perceived vowel space facilitates or impedes involuntary adaptive learning for speech. Sensorimotor Adaptation of Speech Through a Virtually Shortened Vocal Tract by Brittany Bernal is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
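
    Vowel space area (VSA) is one of the outcome measures named above. As an illustration only, the sketch below assumes VSA is the area of the polygon spanned by corner vowels in the F1-F2 plane, computed with the shoelace formula; the study's exact definition and choice of corner vowels are not given here, and the formant values are hypothetical. It also checks one simple consequence of a virtually shortened vocal tract: under the uniform-tube approximation F_n = (2n - 1)c / (4L), every formant scales with 1/L, so raising all perceived formants by a factor k (a tract shortened to 1/k of its length) scales the area by k^2.

```python
from typing import List, Sequence, Tuple

# (F1, F2) in Hz for one vowel
Formants = Tuple[float, float]


def vowel_space_area(corners: Sequence[Formants]) -> float:
    """Polygon area in Hz^2 via the shoelace formula.

    Assumption: corners are listed in perimeter order (e.g. /i/, /a/, /u/).
    """
    n = len(corners)
    if n < 3:
        raise ValueError("need at least three corner vowels")
    twice_signed_area = 0.0
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]  # wrap around to close the polygon
        twice_signed_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_signed_area) / 2.0


def scale_formants(corners: Sequence[Formants], factor: float) -> List[Formants]:
    """Multiply every formant by the same factor (crude model of a tract-length change)."""
    return [(f1 * factor, f2 * factor) for f1, f2 in corners]


if __name__ == "__main__":
    # Hypothetical corner-vowel formants for /i/, /a/, /u/ (illustration only).
    corners = [(300.0, 2300.0), (750.0, 1200.0), (320.0, 900.0)]
    baseline = vowel_space_area(corners)
    shifted = vowel_space_area(scale_formants(corners, 1.1))
    print(f"baseline VSA: {baseline:.0f} Hz^2")  # 304000 Hz^2
    print(f"ratio after a uniform 10% formant increase: {shifted / baseline:.2f}")  # 1.21
```

    Listing the corners in perimeter order matters, because the shoelace formula assumes a non-self-intersecting polygon.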

    Dysprosody in Cantonese Parkinson's disease

    Thesis (B.Sc.), University of Hong Kong, 2008. Includes bibliographical references (leaves 26-30). A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2008. Also available in print.

    SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility resulting from slow, uncoordinated control of the speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand the differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, we conducted a comparative study of typical and dysarthric speech. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking are proposed. For dysarthric speech synthesis, this dissertation introduces a modified neural multi-talker TTS system that adds a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech at varying severity levels. We further extend this work with a label propagation technique that creates more meaningful control variables, such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that provide only discrete dysarthria severity levels. This increases the controllability of the system, allowing dysarthric speech to be generated across a broader range. To evaluate the effectiveness of these methods for synthesizing training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained with additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that adding the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of these parameters. Overall, results on the TORGO database demonstrate that using synthetic dysarthric speech to increase the amount of dysarthric-patterned training speech has a significant impact on dysarthric ASR systems.
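
    The results above are reported as word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of its standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the best alignment and N is the number of reference words. It is not the scoring tool used in the dissertation, and the example transcripts are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j - 1] + (ref_word != hyp_word),  # match (cost 0) or substitution (cost 1)
                prev[j] + 1,                           # deletion of ref_word
                curr[j - 1] + 1,                       # insertion of hyp_word
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution (quick -> quack) and one deletion (fox).
    print(word_error_rate("the quick brown fox", "the quack brown"))  # 0.5
```

    A WER improvement can be expressed either in absolute percentage points or relative to the baseline WER, so a figure such as 12.2% is easiest to interpret alongside the baseline value.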

    ACOUSTIC SPEECH MARKERS FOR TRACKING CHANGES IN HYPOKINETIC DYSARTHRIA ASSOCIATED WITH PARKINSON’S DISEASE

    Previous research has identified certain overarching features of hypokinetic dysarthria associated with Parkinson’s Disease and found that it manifests differently between individuals. Acoustic analysis has often been used to find correlates of perceptual features for differential diagnosis. However, acoustic parameters that are robust for differential diagnosis may not be sensitive to speech changes over time. Previous longitudinal studies have had limited sample sizes or variable intervals between data collection points. This study used acoustic correlates of perceptual features to identify acoustic markers able to track speech changes in people with Parkinson’s Disease (PwPD) over six months. The thesis presents how this study addressed limitations of previous studies to make a novel contribution to current knowledge. Speech data were collected from 63 PwPD and 47 control speakers using online podcast recording software at two time points, six months apart (T1 and T2). Recordings of a standard reading passage, minimal pairs, sustained phonation, and spontaneous speech were collected. Two speech and language therapists gave perceptual severity ratings at T1 and T2, and acoustic parameters of voice, articulation and prosody were investigated. Two analyses were conducted: a) to identify which acoustic parameters can track perceptual speech changes over time, and b) to identify which acoustic parameters can track changes in speech intelligibility over time. An additional analysis examined whether these parameters showed group differences between PwPD and control speakers at T1 and T2, for differential diagnosis. Results showed that specific acoustic parameters of voice quality, articulation and prosody could either differentiate between PwPD and controls or detect speech changes between T1 and T2, but not both. However, specific articulation parameters could detect both significant group differences and speech changes across T1 and T2. The thesis discusses these results, their implications, and the potential for future studies.

    The effectiveness of traditional methods and altered auditory feedback in improving speech rate and intelligibility in speakers with Parkinson's disease

    Communication problems are a frequent symptom for people with Parkinson's disease (PD) and can have a significant impact on their quality of life. Deciding on the right management approach can be problematic, however, because with the exception of LSVT®, very few published studies have demonstrated the effectiveness of treatment techniques. The aim of this study was to compare traditional rate reduction methods with altered auditory feedback (AAF) in terms of their effectiveness in reducing speech rate and improving intelligibility in speakers with PD. Ten participants underwent both types of treatment in once-weekly sessions for 6 weeks. Outcome measures were speech rate for passage reading, as well as intelligibility on both a passage reading and a monologue task. The results showed that, as a group, there was no significant change in either speech rate or intelligibility from either treatment type. However, individual speakers showed improvements in speech performance as a result of each therapy technique. In most cases, these benefits persisted for at least 6 months post-treatment. Possible reasons for the variable response to treatment, as well as issues to consider when planning to use AAF devices in treatment, are discussed.

    Developmental dysarthria in a young adult with cerebral palsy : a speech subsystems analysis

    The speech of children with cerebral palsy (CP) and dysarthria is associated with limited breath control, voice quality changes and imprecise articulation. These problems can reduce speech intelligibility, which can act as a barrier to successful interactions. Whilst the impact of these speech problems is well recognised, research on the nature of the speech impairment is relatively limited. This study aims to provide a detailed description of the speech production abilities of a 16-year-old boy with CP using a speech subsystems approach, examining which affected subsystems might impact his intelligibility. To achieve this, various speech samples were analysed for a range of acoustic and linguistic parameters and compared with the performance of his typically developing twin brother. Results showed that changes in respiration, phonation and articulation may contribute to the intelligibility issues experienced by the speaker with CP.

    Rhythmic performance in hypokinetic dysarthria : relationship between reading, spontaneous speech and diadochokinetic tasks

    Purpose: This study aimed to investigate whether rhythm metrics are sensitive to change in speakers with mild hypokinetic dysarthria, whether such changes can be detected in reading and spontaneous speech, and whether diadochokinetic (DDK) performance relates to the rhythmic properties of speech tasks. Method: Ten people with Parkinson’s Disease (PwPD) with mild hypokinetic dysarthria and ten healthy control speakers produced DDK repetitions, a reading passage and a spontaneous monologue. Articulation rate and ten rhythm metrics were applied to the speech data. DDK performance was captured by the mean, standard deviation (SD) and coefficient of variation (CoV) of syllable duration. Results: Group differences were apparent in both speech tasks, but mainly in spontaneous speech. The control speakers changed their rhythm performance between the two tasks, whereas the PwPD showed more constant behaviour. The correlation analysis of speech and DDK tasks revealed few meaningful relationships. Conclusions: Rhythm metrics appeared to be sensitive to mild levels of impairment in PwPD and are thus suitable for use as diagnostic or outcome measures. In addition, we demonstrated that conversational data can be used in the investigation of rhythm. Finally, the value of DDK tasks in predicting rhythm performance during speech could not be demonstrated.
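
    The DDK measures named above (mean, SD and CoV of syllable duration) are straightforward to compute once syllable durations have been segmented. The sketch below assumes durations in milliseconds and uses the sample SD (n - 1 denominator); the study's exact choice is not stated. It also includes the normalized pairwise variability index (nPVI), a widely used rhythm metric, purely as an example: the abstract does not list the ten metrics, so nPVI being among them is an assumption, and the durations in the usage block are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def ddk_summary(durations_ms: Sequence[float]) -> Dict[str, float]:
    """Mean, SD and coefficient of variation (SD / mean) of DDK syllable durations."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two syllable durations")
    mean = statistics.mean(durations_ms)
    sd = statistics.stdev(durations_ms)  # sample SD (n - 1); an assumption, not from the study
    return {"mean": float(mean), "sd": sd, "cov": sd / mean}


def npvi(durations_ms: Sequence[float]) -> float:
    """Normalized pairwise variability index over successive interval durations."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two intervals")
    pair_terms = [
        abs(a - b) / ((a + b) / 2.0)  # difference normalized by the local mean duration
        for a, b in zip(durations_ms, durations_ms[1:])
    ]
    return 100.0 * sum(pair_terms) / len(pair_terms)


if __name__ == "__main__":
    # Hypothetical syllable durations (ms) from a /pa/ repetition task.
    durations = [180.0, 210.0, 195.0, 240.0, 205.0]
    print(ddk_summary(durations))
    print(f"nPVI: {npvi(durations):.1f}")
```

    CoV divides the SD by the mean, which makes timing variability comparable between speakers whose overall DDK rates differ.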

    Speech Prosody Across Stimulus Types for Individuals with Parkinson's Disease

    Up to 89% of individuals with Parkinson's disease (PD) experience speech problems over the course of the disease. Speech prosody and intelligibility are two of the most affected areas in hypokinetic dysarthria. However, assessing these areas can be problematic, as speech prosody and intelligibility may be affected by the type of speech material employed. Objective: To comparatively explore the effects of different types of speech stimuli on speech prosody and intelligibility in PD speakers. Methods: The speech prosody and intelligibility of two groups of individuals with varying degrees of dysarthria resulting from PD were compared with those of a group of control speakers, using sentence reading, passage reading and monologue. Acoustic analysis, including measures of fundamental frequency (F0), intensity and speech rate, was used to form a prosodic profile for each individual. Speech intelligibility was measured for the speakers with dysarthria using direct magnitude estimation. Results: A difference in F0 variability between the speakers with dysarthria and the control speakers was observed only in the sentence reading task. The average intensity level of speakers with mild dysarthria differed from that of the control speakers. Additionally, there were stimulus effects on both intelligibility and prosodic profile. Conclusions: The prosodic profile of PD speakers differed from that of the control speakers in the more structured task, and lower intelligibility was found in the less structured task. This highlights the value of using both structured and natural stimuli to evaluate speech production in PD speakers. © 2015 IOS Press and the authors.
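
    The results above compare F0 variability across groups and tasks. One common way to express F0 variability is the standard deviation of F0 on a semitone scale, which normalizes for speakers' different baseline pitch. Whether this study reported F0 variability in Hz or semitones is not stated, so the sketch below is an assumption; the 100 Hz reference is arbitrary (changing it shifts every value by a constant and leaves the SD unchanged), and treating F0 values of 0 as unvoiced frames follows a common pitch-tracker convention rather than anything in the study.

```python
import math
import statistics
from typing import Sequence


def hz_to_semitones(f0_hz: float, reference_hz: float = 100.0) -> float:
    """Convert F0 to semitones re reference_hz; 12 semitones = one octave (doubling)."""
    return 12.0 * math.log2(f0_hz / reference_hz)


def f0_sd_semitones(f0_track_hz: Sequence[float]) -> float:
    """Sample SD of voiced F0 values on the semitone scale.

    Frames with F0 <= 0 are treated as unvoiced and skipped (assumed convention).
    """
    voiced = [f for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.stdev([hz_to_semitones(f) for f in voiced])


if __name__ == "__main__":
    # Hypothetical contours with the same shape, one octave apart.
    lower = [110.0, 0.0, 120.0, 130.0, 0.0, 115.0]
    higher = [2 * f for f in lower]
    print(f"{f0_sd_semitones(lower):.3f} st vs {f0_sd_semitones(higher):.3f} st")
```

    Because the two contours differ only by a constant frequency ratio, they yield the same semitone SD, whereas their SDs in Hz would differ by a factor of two.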