
    Simulating dysarthric speech for training data augmentation in clinical speech applications

    Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that by using the simulated speech samples to balance an existing dataset, the classification accuracy improves by about 10% after data augmentation.
    Comment: Will appear in Proc. of ICASSP 201

    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people in need with conditions of various etiologies. One prominent clinical application is a computer-assisted speech training system which enables personalized speech therapy for patients with communicative disorders in the patient's home environment. Such a system relies on robust automatic speech recognition (ASR) technology to be able to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report ASR performance of these systems on two dysarthric speech datasets of different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements have been reported on both datasets using speaker-independent ASR architectures.
    Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

    Non-speech oro-motor exercise use in acquired dysarthria management: regimes and rationales

    Non-speech oro-motor exercises (NSOMExs) are described in speech and language therapy (SLT) manuals, and are thought to be much used in acquired dysarthria intervention, though there is no robust evidence of an influence on speech outcome. Opinions differ as to whether, and for which dysarthria presentations, NSOMExs are appropriate. The investigation sought to collect development phase data, in accordance with the Medical Research Council evaluation of complex interventions. The aims were to establish the extent of NSOMExs use in acquired disorders, the exercise regimes in use for dysarthria, with which dysarthric populations, and the anticipated clinical outcomes. A further aim was to determine the influencing rationales where NSOMExs were or were not used in dysarthria intervention. SLTs throughout Scotland, Wales and Northern Ireland, working with adult acquired dysarthria, were identified by their service heads. They received postal questionnaires comprising 21 closed and two open questions, covering respondent biographics, use of NSOMExs, anticipated clinical outcomes, and practice influencing rationales. One hundred and ninety one (56% response) completed questionnaires were returned. Eighty-one per cent of respondents used NSOMExs in dysarthria. There was no association with years of SLT experience. Those who used and those who did not use NSOMExs provided similar influencing rationales, including evidence from their own practice, and Higher Education Institute (HEI) teaching. More experienced SLTs were more likely than those more recently qualified to be guided by results from their own practice. Input from the attended HEI was more influential for those less experienced than for those more experienced. Clinical outcome aims were not confined to speech, but included also improvements in movement, sensory awareness, appearance, emotional status, dysphagia and drooling. 
NSOMExs were used with many neurological disorders, especially stroke, all dysarthria classes, especially flaccid, and all severity levels. Tongue and lip exercises were more frequent than face, jaw and soft palate. The most common regimes were 4-6 repetitions of each exercise, during three practice periods daily, each of 6-10 minutes. Conclusions & Implications: NSOMExs are a frequent component of dysarthria management in the UK devolved government countries. This confirmation, along with the details of SLT practice, provides a foundation for clinical research which will compare outcomes for people with dysarthria, whose management includes and does not include NSOMExs. SLT practice may be guided by evidence that speech outcome is or is not affected by NSOMExs

    Investigating the success factors of expert users to inform device development

    Objective: Expert user testing is a well recognised tool within user experience and human computer interaction design. Within the domain of assistive technology device design, however, this technique seems to be little used. It is suggested that studying the success factors of expert assistive technology device users may provide a valuable source of data to inform development of assistive technology devices. This paper presents an example of this technique, within the context of a number of studies carried out by the authors, using the example of preliminary data from a study informing the development of an innovative Augmentative and Alternative Communication (AAC) device. Main Content: The paper presents a qualitative study whose objective was to influence the design and further development of an innovative voice-input voice-output communication aid (Vivoca) which has previously reached proof-of-concept stage. The Vivoca device is designed for people with dysarthria and this dictates a number of specific constraints and considerations. In order to understand how Vivoca could be designed to be used successfully by people with dysarthria, this study aimed to identify the factors associated with expert users' successful use of current AAC devices. In order to allow comparison, the study included users with some understandable speech and also those with no understandable speech. The study procedure was designed to provide a profile of participants' communication methods and to identify the factors that participants felt made their communication successful. Results: Preliminary results from the study (currently underway) are presented, including a qualitative analysis of interview data, and data profiling participants' communication methods and context. Initial data has highlighted the very specific requirements for a communication aid design for people with some understandable speech. 
Conclusion: Study of expert users may provide an effective tool to help inform assistive technology device development

    Consideration of the listener in the assessment and treatment of dysarthria

    Traditionally, speech production deficits have been the focus of clinical practice and research in dysarthria. However, recent research has begun to examine the role of the listener in communication interaction. This article provides an overview of perceptual processing theory relevant to dysarthria. In addition, it discusses the relationship of current theoretical models of speech perception to the assessment and treatment of dysarthria. Finally, it provides insight into how this information may inform current clinical practices and future research in the field

    Improving the intelligibility of dysarthric speech using a time domain pitch synchronous-based approach

    Dysarthria is a motor speech impairment that reduces the intelligibility of speech. Observations indicate that for different types of dysarthria, the fundamental frequency, intensity, and speech rate are distinct from those of unimpaired speakers. Therefore, the proposed enhancement technique modifies these parameters so that they fall in the range for unimpaired speakers. The fundamental frequency and speech rate of dysarthric speech are modified using the time domain pitch synchronous overlap and add (TD-PSOLA) algorithm. Then its intensity is modified using a fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT)-based approach. This technique is applied to impaired speech samples of ten dysarthric speakers. After enhancement, the change in intelligibility between impaired and enhanced dysarthric speech is evaluated using the rating scale and word count methods. The improvement in intelligibility is significant for speakers whose original intelligibility was poor. In contrast, the improvement was minimal for speakers whose intelligibility was already high. According to the rating scale method, the change in intelligibility across speakers ranges from 9% to 53%, whereas according to the word count method it ranges from 0% to 53%.
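The FFT/IFFT intensity step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a target RMS level as the intensity measure, and the single uniform gain are all assumptions made for illustration, and a naive DFT stands in for an FFT so the example needs only the standard library.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def rms(samples):
    """Root-mean-square level of a frame."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def scale_intensity(frame, target_rms):
    """Scale a frame's spectrum so its RMS matches target_rms (hypothetical helper).

    Assumption: one uniform gain is applied to every frequency bin.
    """
    current = rms(frame)
    if current == 0.0:
        return list(frame)  # silent frame: nothing to scale
    gain = target_rms / current
    scaled_spectrum = [gain * c for c in dft(frame)]
    return idft(scaled_spectrum)


# Example: a quiet 5-cycle sinusoid raised to a hypothetical target level.
frame = [0.1 * math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
louder = scale_intensity(frame, target_rms=0.2)
```

With a uniform gain the result equals plain time-domain scaling. Working on the spectrum only becomes useful if different gains are applied per frequency band, which the abstract does not specify.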

    The effectiveness of traditional methods and altered auditory feedback in improving speech rate and intelligibility in speakers with Parkinson's disease

    Communication problems are a frequent symptom for people with Parkinson's disease (PD) which can have a significant impact on their quality of life. Deciding on the right management approach can be problematic though, as, with the exception of LSVT®, very few studies have been published demonstrating the effectiveness of treatment techniques. The aim of this study was to compare traditional rate reduction methods with altered auditory feedback (AAF) with respect to their effectiveness in reducing speech rate and improving intelligibility in speakers with PD. Ten participants underwent both types of treatment in once-weekly sessions for 6 weeks. Outcome measures were speech rate for passage reading as well as intelligibility on both a passage reading and a monologue task. The results showed that, as a group, there was no significant change in either speech rate or intelligibility resulting from either treatment type. However, individual speakers showed improvements in speech performance as a result of each therapy technique. In most cases, these benefits persisted for at least 6 months post-treatment. Possible reasons for the variable response to treatment, as well as issues to consider when planning to use AAF devices in treatment, are discussed

    SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation has introduced a modified neural multi-talker TTS by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. In addition, we have extended this work by using a label propagation technique to create more meaningful control variables such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, so we are able to generate more dysarthric speech across a broader range. To evaluate their effectiveness for synthesis of training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. 
Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on the dysarthric ASR systems
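For readers unfamiliar with the word error rate (WER) metric reported in this abstract, the following minimal sketch computes it as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and example sentences are illustrative assumptions and are not taken from the dissertation or the TORGO database.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deleted "the".
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Note that the abstract does not say whether its 12.2% and 6.5% figures are absolute or relative WER changes, so this sketch only shows how a single WER value is obtained.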