90 research outputs found

    The use of speech recognition technology by people living with Amyotrophic Lateral Sclerosis: a scoping review

    More than 80% of people living with Amyotrophic Lateral Sclerosis (plwALS) develop difficulties with their speech, affecting communication, self-identity and quality of life. Automatic speech recognition technology (ASR) is becoming a common way to interact with a broad range of devices, to find information and to control the environment. ASR can be problematic for people with acquired neurogenic motor speech difficulties (dysarthria). Given that the field is rapidly developing, a scoping review is warranted.

    Evaluating Camera Mouse as a computer access system for augmentative and alternative communication in cerebral palsy: a case study

    PURPOSE: Individuals with disabilities who do not have reliable motor control to manipulate a standard computer mouse require alternative access methods for both computer access and communication. The Camera Mouse system visually tracks the movement of selected facial features using a camera to directly control the mouse pointer of a computer. Current research suggests that this system can successfully provide a means of computer access and communication for individuals with motor impairments. However, there are no existing data on the efficacy of the software’s communication output capabilities. The goal of this case study is to provide a comprehensive evaluation of Camera Mouse as a computer access method for Augmentative and Alternative Communication (AAC) for an individual with cerebral palsy who prefers to use her unintelligible dysarthric speech to communicate her desires and thoughts despite having access to a traditional AAC system. METHOD: The current study compared the Camera Mouse system, the Tobii PCEye Mini (a popular commercially available eye tracking device) paired with speech-generating technology, and natural speech using a variety of tasks in a single dysarthric speaker. Tasks consisted of two questionnaires designed to measure psychosocial impact and satisfaction with assistive technology, two sentence intelligibility tasks judged by four unfamiliar listeners, and two language samples designed to measure expressive language. Each task was completed three times, once for each communication modality in question: natural speech, the Camera Mouse-to-speech system, and the Tobii eye tracker-to-speech system. Participant responses were recorded and transcribed. RESULTS: Data were analyzed in terms of psychosocial effects, user satisfaction, communication efficiency (using intelligibility and rate), and various measures of expressive output ability, to determine which modality offered the highest communicative effectiveness. Measures showed that when paired with an orthographic selection interface and speech-generating device, the Camera Mouse and Tobii eye tracker resulted in greatly increased intelligibility. However, natural speech was superior to the assistive technology options in all other measures, including psychosocial impact, satisfaction, communication efficiency, and several expressive language components. Though results indicate that use of the Tobii eye tracker resulted in slightly higher rate and intelligibility, the participant reported greater satisfaction and psychosocial impact when using the novel Camera Mouse access system. CONCLUSION: This study is the first to provide quantitative information regarding the efficiency, psychosocial impact, user satisfaction, and expressive language capabilities of Camera Mouse as a computer access system for AAC. It shows promising results for Camera Mouse as a functional access system for individuals with disabilities and for future AAC applications.
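
    The communication efficiency measures used in this study, intelligibility and speaking rate, reduce to simple scoring functions. The sketch below is a minimal illustration of that kind of computation, assuming word-level listener transcriptions; it is not the study's scoring protocol.

    import difflib

    def intelligibility(reference: str, transcription: str) -> float:
        """Percent of reference words the listener transcribed correctly (order-aware)."""
        ref, hyp = reference.lower().split(), transcription.lower().split()
        matcher = difflib.SequenceMatcher(a=ref, b=hyp)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return 100.0 * matched / len(ref)

    def speaking_rate(word_count: int, duration_seconds: float) -> float:
        """Words per minute, a standard rate measure for communication efficiency."""
        return word_count * 60.0 / duration_seconds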

    The effect of multitalker background noise on speech intelligibility in Parkinson's disease and controls

    This study investigated the effect of multi-talker background noise on speech intelligibility in participants with hypophonia due to Parkinson’s disease (PD). Ten individuals with PD and ten geriatric controls were tested on four speech intelligibility tasks at the single word, sentence, and conversation level in various conditions of background noise. Listeners assessed speech intelligibility using word identification or orthographic transcription procedures. Results revealed non-significant differences between groups when intelligibility was assessed in the absence of background noise. PD speech intelligibility decreased significantly relative to controls in the presence of background noise. A phonetic error analysis revealed a distinct error profile for PD speech in background noise. The four most frequent phonetic errors were glottal-null, consonant-null in final position, stop place of articulation, and initial position cluster-singleton. The results demonstrate that individuals with PD have significant and distinctive deficits in speech intelligibility and phonetic errors in the presence of background noise.
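
    Background-noise conditions like these are typically constructed by mixing clean speech with multi-talker babble at a fixed signal-to-noise ratio. The following is a minimal sketch of that mixing step; the study's exact levels and babble recordings are not specified here.

    import numpy as np

    def mix_at_snr(speech: np.ndarray, babble: np.ndarray, snr_db: float) -> np.ndarray:
        """Mix speech with babble at a target SNR in dB (babble must be at least as long)."""
        babble = babble[:len(speech)]
        speech_rms = np.sqrt(np.mean(speech ** 2))
        babble_rms = np.sqrt(np.mean(babble ** 2))
        # Scale babble so that 20*log10(speech_rms / scaled_babble_rms) == snr_db
        target_rms = speech_rms / (10 ** (snr_db / 20))
        return speech + babble * (target_rms / babble_rms)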

    Improving Dysarthric Speech Recognition by Enriching Training Datasets

    Dysarthria is a motor speech disorder that results from disruptions in the neuro-motor interface; it is characterised by poor articulation of phonemes and hypernasality, and is characteristically different from typical speech. Many modern automatic speech recognition systems focus on a narrow range of speech diversity and, as a consequence, exclude groups of speakers who deviate in gender, race, age or speech impairment when building training datasets. This study attempts to develop an automatic speech recognition system that handles dysarthric speech using limited dysarthric speech data. Speech utterances collected from the TORGO database are used to evaluate a wav2vec 2.0 model trained only on the LibriSpeech 960h dataset, giving a baseline word error rate (WER) for recognising dysarthric speech. A version of the LibriSpeech model fine-tuned on multi-language datasets was then tested to see if it would improve accuracy, achieving a top reduction of 24.15% in WER for one of the male dysarthric speakers in the dataset. Transfer learning with speech recognition models and preprocessing dysarthric speech to improve its intelligibility using generative adversarial networks were limited in their potential by the lack of a dysarthric speech dataset of adequate size. The main conclusion drawn from this study is that a large, diverse dysarthric speech dataset, comparable in size to the datasets used to train machine-learning ASR systems such as LibriSpeech and containing different types of speech, scripted and unscripted, is required to improve performance.
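
    A baseline of this kind can be reproduced by decoding utterances with an off-the-shelf wav2vec 2.0 checkpoint and scoring the word error rate. The sketch below uses the Hugging Face transformers and jiwer packages; the checkpoint name is an assumption, not necessarily the study's exact model.

    import torch
    import jiwer
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    def transcribe(waveform, sample_rate=16000):
        """Decode a 1-D float waveform with the LibriSpeech-only baseline model."""
        inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        return processor.batch_decode(torch.argmax(logits, dim=-1))[0]

    def corpus_wer(references, hypotheses):
        """WER over parallel lists of reference prompts and decoded strings."""
        return jiwer.wer(references, hypotheses)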

    SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation has introduced a modified neural multi-talker TTS by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech at varying severity levels. In addition, we have extended this work by using a label propagation technique to create more meaningful control variables, such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, allowing us to generate dysarthric speech across a broader range of severities. To evaluate the effectiveness of the synthesized speech as training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on dysarthric ASR systems.
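
    One way to picture the severity-level conditioning described above is as an extra scalar control fused with the speaker embedding before decoding. The PyTorch sketch below illustrates that idea under that assumption only; it is not the dissertation's architecture, and all names are hypothetical.

    import torch
    import torch.nn as nn

    class SeverityConditionedEmbedding(nn.Module):
        """Illustrative only: fuse a speaker embedding with a scalar dysarthria
        severity coefficient (0 = typical, 1 = severe) for a multi-talker TTS decoder."""

        def __init__(self, speaker_dim: int = 256, out_dim: int = 256):
            super().__init__()
            self.project = nn.Linear(speaker_dim + 1, out_dim)

        def forward(self, speaker_emb: torch.Tensor, severity: torch.Tensor) -> torch.Tensor:
            # speaker_emb: (batch, speaker_dim); severity: (batch, 1), values in [0, 1]
            return torch.tanh(self.project(torch.cat([speaker_emb, severity], dim=-1)))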

    The Effect of Formant Measurement Methods on Vowel Space in Patients with Parkinson's Disease Before and After Voice Treatment

    LSVT-LOUD® has been shown to improve phonatory quality in patients with PD. Previous studies have shown an increase in vowel space area following treatment, but questions remain regarding possible methodological issues in interaction with phonatory factors. This study addresses these questions by comparing multiple formant measurement methods and vowel space metrics. Ten participants were recorded on two separate days before and after treatment. Formants were measured using a human-guided reference (dubbed 'HGIM'), LPC, and two forms of cepstrally-liftered spectrum. Multiple vowel space metrics, including the vowel articulation index, the F2i/F2u ratio, the area of the vowel quadrilateral, and vowel formant dispersion, utilized both lax and corner vowels to explore vowel space changes. Analysis revealed no significant change in vowel space following LSVT. High variability in LPC with a fixed number of coefficients was noted. These results do not support previous claims of increased vowel space but suggest that formant measurement methods may influence results.
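
    Two of the metrics named above can be computed compactly from corner-vowel formant values. This is an illustrative sketch of the standard definitions (shoelace area for the vowel quadrilateral, and Sapir's vowel articulation index), not the study's measurement pipeline.

    def vowel_space_area(corners):
        """Shoelace area of the vowel quadrilateral.
        corners: (F1, F2) pairs in Hz, ordered around the polygon, e.g. /i/, /ae/, /a/, /u/."""
        area = 0.0
        n = len(corners)
        for k in range(n):
            f1a, f2a = corners[k]
            f1b, f2b = corners[(k + 1) % n]
            area += f1a * f2b - f1b * f2a
        return abs(area) / 2.0

    def vowel_articulation_index(f1i, f2i, f1u, f2u, f1a, f2a):
        """VAI: higher values indicate a less centralized vowel space."""
        return (f2i + f1a) / (f1i + f1u + f2u + f2a)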

    The Ability of Persons with Parkinson's Disease to Manipulate Vocal Intensity and Articulatory Precision in an Intra-Operative Setting

    Parkinson’s disease is a degenerative neurological disease associated with decreased basal ganglia control circuit output, leading to decreased facilitation of cortical motor areas and subsequent motor impairments (Wichmann & DeLong, 1996). Motor impairments, including rigidity, bradykinesia, reduced range of motion and difficulty initiating movement, impact both respiratory function and speech in persons with Parkinson’s disease (PWPD), often leading to hypophonia and hypokinetic dysarthria (Darling & Huber, 2011). Hypokinetic dysarthria includes, among other characteristics, reduced loudness and imprecise articulation, and therefore reduced speech clarity. The purpose of this study was to determine if PWPD were able to manipulate speech intensity and articulatory precision in soft versus loud stimulus presentation conditions in an intra-operative environment. Articulatory precision was measured using the F2 ratio, based on the second formant values of the vowels /i/ and /u/ (Sapir, 2007). As /i/ is produced anteriorly in the oral cavity and /u/ is produced posteriorly, an increase in this ratio is anticipated to accompany greater articulatory precision. It was hypothesized that PWPD would be able to increase vocal intensity, which would result in larger F2 ratios. Participants consisted of 16 PWPD undergoing surgery for deep brain stimulation with simultaneous recording in the subthalamic nucleus and cortex. Participants repeated CVCVCV utterances presented auditorily at soft and loud levels. Acoustic signals were recorded, and average vowel intensities and second formant values for /i/ and /u/ productions within each utterance were extracted. Second formant values were then used to calculate the F2 ratio for each utterance. Wilcoxon Signed-Rank Tests revealed that, while intensity significantly increased in the loud compared to the soft condition, the F2 ratio did not demonstrate this increase. Of particular interest, examination of individual participants revealed that three patients did not increase intensity in the loud stimulus condition. When only participants who increased intensity were included in subsequent analyses, the F2 ratio did demonstrate a significant increase in the loud stimulus condition. The current study demonstrates that, even with methodological differences as a result of the intra-operative environment, when patients are able to increase speech intensity, they also increase articulatory precision.
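
    The F2 ratio described here (Sapir, 2007) is a one-line computation; the values below are made-up illustrations, not study data.

    def f2_ratio(f2_i_hz: float, f2_u_hz: float) -> float:
        """F2(/i/) / F2(/u/): rises as /i/ is fronted and /u/ is backed,
        i.e. with greater articulatory precision."""
        return f2_i_hz / f2_u_hz

    print(f2_ratio(2200.0, 900.0))   # relatively precise articulation: ~2.44
    print(f2_ratio(1800.0, 1200.0))  # more centralized vowels: 1.5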

    Dysarthric speech analysis and automatic recognition using phase based representations

    Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of articulators. Dysarthric speech is more difficult to model with machine learning algorithms, due to inconsistencies in the acoustic signal and to limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance. The Zeros of the Z-Transform (ZZT) are investigated for dysarthric vowel segments, revealing a phase-based acoustic phenomenon that governs how the distribution of zero patterns relates to speech intelligibility. It is then investigated whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility. A metric, the phase slope deviation (PSD), is introduced based on deviations observed in the unwrapped phase spectrum of dysarthric vowel segments; it compares the slopes of dysarthric vowels with those of typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and this is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria. In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum, whose properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison to conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features surpassed all previously reported performance on the UASPEECH database of dysarthric speech.
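
    A sketch of the two phase representations discussed above: the unwrapped phase spectrum, whose slope underlies the PSD metric, and the group delay spectrum computed with the standard product formulation. This illustrates the representations only, not the thesis's correction procedure or cepstral features.

    import numpy as np

    def unwrapped_phase(frame: np.ndarray) -> np.ndarray:
        """Unwrapped FFT phase of a windowed speech frame (e.g. a vowel segment)."""
        return np.unwrap(np.angle(np.fft.rfft(frame)))

    def group_delay(frame: np.ndarray) -> np.ndarray:
        """Group delay tau(w) = -d(phase)/dw via Re{Y(w)/X(w)}, with Y = FFT of n*x[n]."""
        n = len(frame)
        X = np.fft.rfft(frame)
        Y = np.fft.rfft(np.arange(n) * frame)
        return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)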

    Speech perception under adverse conditions: Insights from behavioral, computational, and neuroscience research

    Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. To this end, we review behavioral studies, computational accounts, and neuroimaging findings related to adaptive plasticity in speech perception. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Furthermore, we consider research topics in neuroscience that offer insight into how perception can be adaptively tuned to short-term deviations while balancing the need to maintain stability in the perception of learned long-term regularities. Consideration of the application and limitations of these algorithms in characterizing flexible speech perception under adverse conditions promises to inform theoretical models of speech. © 2014 Guediche, Blumstein, Fiez and Holt
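
    As a minimal illustration of the prediction-error-driven learning the review highlights, a delta-rule update retunes a cue-to-category mapping toward recent evidence (for example, lexically supervised feedback). This is schematic and not a model from the article.

    def delta_update(weight: float, cue: float, target: float, lr: float = 0.1) -> float:
        """One error-driven step: shift the mapping to reduce future prediction error."""
        prediction = weight * cue
        error = target - prediction      # the prediction error signal
        return weight + lr * error * cue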