28 research outputs found

    Current Topics in Technology-Enabled Stroke Rehabilitation and Reintegration: A Scoping Review and Content Analysis

    Get PDF
    Background. There is a worldwide health crisis stemming from the rising incidence of debilitating chronic diseases, with stroke as a leading contributor. Chronic stroke management encompasses rehabilitation and reintegration, and can require decades of personalized medicine and care. Information technology (IT) tools have the potential to support individuals managing chronic stroke symptoms. Objectives. This scoping review identifies prevalent topics and concepts in the research literature on IT for stroke rehabilitation and reintegration, using content analysis based on topic modelling techniques from natural language processing, and uses the resulting topics to identify gaps in this literature. Eligibility Criteria. Our methodological search initially identified over 14,000 publications from the last two decades in the Web of Science and Scopus databases, which we filter, using keywords and a qualitative review, to a core corpus of 1062 documents. Results. We generate 3-topic, 4-topic and 5-topic models and interpret the resulting topics as four distinct thematics in the literature, which we label Robotics, Software, Functional and Cognitive. We analyze the prevalence and distinctiveness of each thematic and identify some areas relatively neglected by the field. These are mainly in the Cognitive thematic, especially systems and devices for sensory loss rehabilitation, performance of tasks of daily living, and social participation. Conclusion. The results indicate that the IT-enabled stroke literature has focused on Functional outcomes and Robotic technologies, with lesser emphasis on Cognitive outcomes and combined interventions. We hope this review broadens awareness, usage and mainstream acceptance of novel technologies in rehabilitation and reintegration among clinicians, carers and patients.
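
    As a rough illustration of the content-analysis step described in this abstract, the sketch below fits a small topic model with scikit-learn's LatentDirichletAllocation. The toy documents, topic count and printed term lists are placeholder assumptions, not the review's actual corpus or pipeline.

```python
# Minimal sketch of topic modelling for content analysis (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "robot assisted upper limb rehabilitation after stroke",
    "virtual reality software for cognitive training and memory",
    "wearable sensors for gait and functional outcome assessment",
    "mobile app supporting social participation and daily living tasks",
]  # placeholder documents standing in for the ~1062-document core corpus

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=4, random_state=0)  # e.g. a 4-topic model
lda.fit(counts)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")  # topics are then inspected and labelled manually
```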

    Treatment of non-fluent aphasia through melody, rhythm and formulaic language

    No full text
    Left-hemisphere stroke patients often suffer a profound loss of spontaneous speech — known as non-fluent aphasia. Yet, many patients are still able to sing entire pieces of text fluently. This striking finding has inspired two main research questions. If the experimental design focuses on one point in time (cross section), one may ask whether or not singing facilitates speech production in aphasic patients. If the design focuses on changes over several points in time (longitudinal section), one may ask whether or not singing qualifies as a therapy to aid recovery from aphasia. The present work addresses both of these questions based on two separate experiments. A cross-sectional experiment investigated the relative effects of melody, rhythm, and lyric type on speech production in seventeen patients with non-fluent aphasia. The experiment controlled for vocal frequency variability, pitch accuracy, rhythmicity, syllable duration, phonetic complexity and other influences, such as learning effects and the acoustic setting. Contrary to earlier reports, the cross-sectional results suggest that singing may not benefit speech production in non-fluent aphasic patients over and above rhythmic speech. Previous divergent findings could very likely be due to effects of the acoustic setting, insufficient control for syllable duration, and language-specific stress patterns. However, the data reported here indicate that rhythmic pacing may be crucial, particularly for patients with lesions including the basal ganglia. Overall, basal ganglia lesions accounted for more than fifty percent of the variance related to rhythmicity. The findings suggest that benefits typically attributed to singing in the past may actually have their roots in rhythm. Moreover, the results demonstrate that lyric type may have a profound impact on speech production in non-fluent aphasic patients. Among the studied patients, lyric familiarity and formulaic language appeared to strongly mediate speech production, regardless of whether patients were singing or speaking rhythmically. Lyric familiarity and formulaic language may therefore help to explain effects that have, up until now, been presumed to result from singing. A longitudinal experiment investigated the relative long-term effects of melody and rhythm on the recovery of formulaic and non-formulaic speech. Fifteen patients with chronic non-fluent aphasia underwent either singing therapy, rhythmic therapy, or standard speech therapy. The experiment controlled for vocal frequency variability, phonatory quality, pitch accuracy, syllable duration, phonetic complexity and other influences, such as the acoustic setting and learning effects induced by the testing itself. The longitudinal results suggest that singing and rhythmic speech may be similarly effective in the treatment of non-fluent aphasia. Both singing and rhythmic therapy patients made good progress in the production of common, formulaic phrases — known to be supported by right corticostriatal brain areas. This progress occurred at an early stage of both therapies and was stable over time. Moreover, relatives of the patients reported that they were using a fixed number of formulaic phrases successfully in communicative contexts. Independent of whether patients had received singing or rhythmic therapy, they were able to easily switch between singing and rhythmic speech at any time. Conversely, patients receiving standard speech therapy made less progress in the production of formulaic phrases.
They did, however, improve their production of unrehearsed, non-formulaic utterances, in contrast to singing and rhythmic therapy patients, who did not. In light of these results, it may be worth considering the combined use of standard speech therapy and the training of formulaic phrases, whether sung or rhythmically spoken. This combination may yield better results for speech recovery than either therapy alone. Overall, treatment and lyric type accounted for about ninety percent of the variance related to speech recovery in the data reported here. The present work delivers three main results. First, it may not be singing itself that aids speech production and speech recovery in non-fluent aphasic patients, but rhythm and lyric type. Second, the findings may challenge the view that singing causes a transfer of language function from the left to the right hemisphere. Moving beyond this left-right hemisphere dichotomy, the current results are consistent with the idea that rhythmic pacing may partly bypass corticostriatal damage. Third, the data support the claim that non-formulaic utterances and formulaic phrases rely on different neural mechanisms, suggesting a two-path model of speech recovery. Standard speech therapy focusing on non-formulaic, propositional utterances may engage, in particular, left perilesional brain regions, while training of formulaic phrases may open new ways of tapping into right-hemisphere language resources — even without singing.
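
    For readers unfamiliar with the variance-explained figures quoted above (over fifty percent for rhythmicity, about ninety percent for treatment and lyric type), the sketch below shows how such a proportion can be estimated as the R-squared of a linear model. The data frame and variable names are synthetic placeholders, not the study's actual dataset or analysis.

```python
# Hedged illustration of "variance accounted for" as R-squared from a linear model.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "recovery":  [0.2, 0.5, 0.6, 0.1, 0.7, 0.4, 0.3, 0.8],   # made-up outcome scores
    "treatment": ["speech", "singing", "rhythm", "speech",
                  "singing", "rhythm", "speech", "singing"],
    "lyric":     ["formulaic", "formulaic", "formulaic", "nonformulaic",
                  "nonformulaic", "formulaic", "nonformulaic", "formulaic"],
})

model = smf.ols("recovery ~ C(treatment) + C(lyric)", data=df).fit()
print(model.rsquared)  # proportion of outcome variance explained by the predictors
```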

    SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

    Get PDF
    Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available from dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation introduces a modified neural multi-talker TTS that adds a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech at varying severity levels. In addition, we have extended this work by using a label propagation technique to create more meaningful control variables, such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, so we are able to generate more dysarthric speech spanning a broader range of severity characteristics. To evaluate the effectiveness of the synthesized training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on dysarthric ASR systems.
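
    The reported gains are expressed as word error rate (WER). A minimal sketch of how WER and a relative improvement figure are computed is given below; the reference/hypothesis strings are invented examples, not output of the dissertation's DNN-HMM systems.

```python
# Word error rate via Levenshtein edit distance over words (illustrative only).
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits (substitutions, insertions, deletions) to turn
    # the first i reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

baseline = wer("the quick brown fox", "the quick brown box")   # 0.25
augmented = wer("the quick brown fox", "the quick brown fox")  # 0.0
relative_improvement = (baseline - augmented) / baseline       # how a relative "WER improvement" is reported
print(relative_improvement)
```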

    THE ROLE OF “FOCUS OF ATTENTION” ON THE LEARNING OF NON-NATIVE SPEECH SOUNDS: ENGLISH SPEAKERS LEARNING OF MANDARIN CHINESE TONES

    Get PDF
    Focus of attention (FOA) has been demonstrated to affect motor learning and performance of many motor skills. FOA refers to the performer’s focus while performing the task. The purpose of this dissertation was to assess the role of FOA in the speech domain. The research asked whether external or internal FOA would individually or differentially facilitate the learning of Mandarin Chinese tones by native English speakers. As a secondary question and experimental control, this study also examined whether the four tones were produced with the same accuracy. Forty-two females between the ages of 18 and 24 were randomly assigned to one of three groups: external FOA (EFOA), internal FOA (IFOA) and control (C). During the acquisition phase, the groups were instructed to focus on the sound produced (EFOA) or on the vibration in the voice box (IFOA), or received no FOA-related instructions (control). Participants were required to repeat the Mandarin words after an auditory model. To assess learning, the participants repeated the practiced words in a retention test, and repeated similar but unpracticed words during a transfer test. The data were collected in two sessions. The dependent variables were the root mean squared error (acoustic measure) and the percentage of correctly perceived tones (perceptual measure). Before the acquisition phase, there was a significant difference among the four Mandarin Chinese tones for the three groups: Tones 1 and 4 were produced with significantly higher accuracy than Tones 2 and 3. There was, however, no significant difference among the three FOA groups on the dependent variables. The results contradict the FOA effects in the literature derived from limb motor learning and oral-nonspeech learning experiments. This study represents the first attempt to test FOA in the speech domain. As such, it is premature to draw firm conclusions about the role of FOA in speech motor learning based on these results. The discussion focuses on factors that might have led to the current results. Because FOA represents a potential factor that might affect speech motor learning, future research is warranted to study the effect of FOA in the speech domain.
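
    One of the dependent variables above is a root-mean-squared error between the produced tone contour and the auditory model. The sketch below shows the general form of such a measure; the F0 values are invented and the study's exact acoustic normalization is not reproduced.

```python
# RMSE between a produced pitch contour and a model contour (illustrative values).
import numpy as np

model_f0 = np.array([220.0, 230.0, 245.0, 260.0, 250.0])     # model speaker's F0 track (Hz)
produced_f0 = np.array([218.0, 236.0, 240.0, 270.0, 244.0])  # participant's F0 track (Hz)

rmse = np.sqrt(np.mean((produced_f0 - model_f0) ** 2))
print(rmse)  # lower values = closer match to the target tone contour
```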

    Subspace Gaussian Mixture Models for Language Identification and Dysarthric Speech Intelligibility Assessment

    Get PDF
    In this Thesis, we investigated how to efficiently apply subspace Gaussian mixture modeling techniques to two speech technology problems: automatic spoken language identification (LID) and automatic intelligibility assessment of dysarthric speech. One of the most important of these techniques was joint factor analysis (JFA). JFA is essentially a Gaussian mixture model in which the mean of each component is expressed as a sum of low-dimension factors that represent different contributions to the speech signal. This factorization makes it possible to compensate for undesired sources of variability, like the channel. JFA was investigated both as a final classifier and as a feature extractor. In the latter approach, a single subspace including all sources of variability is trained, and points in this subspace are known as i-Vectors. Thus, an i-Vector is a low-dimension representation of a single utterance, and i-Vectors are a very powerful feature for different machine learning problems. We investigated two different LID systems according to the type of features extracted from speech. First, we extracted acoustic features representing short-time spectral information. In this case, we observed relative improvements with i-Vectors with respect to JFA of up to 50%. We found that the channel subspace in a JFA model also contains language information, whereas i-Vectors do not discard any language information; moreover, they help to reduce mismatches between training and testing data. For classification, we modeled the i-Vectors of each language with a Gaussian distribution with a covariance matrix shared among languages. This method is simple and fast, and it worked well without any post-processing. Second, we introduced the use of prosodic and formant information in the i-Vector system. Its performance was below that of the acoustic system, but the two were found to be complementary, and we obtained up to a 20% relative improvement from their fusion with respect to the acoustic system alone. Given the success in LID and the fact that i-Vectors capture all the information present in the data, we decided to use i-Vectors for other tasks, specifically the assessment of speech intelligibility in speakers with different types of dysarthria. Speech therapists are very interested in this technology because it would allow them to rate the intelligibility of their patients objectively and consistently. In this case, the input features were extracted from short-term spectral information, and intelligibility was assessed from the i-Vectors calculated from a set of words uttered by the tested speaker. We found that performance was clearly much better if training data from the person who would use the application were available. We think that this limitation could be relaxed with larger training databases. However, the recording process is not easy for people with disabilities, and it is difficult to obtain large datasets of dysarthric speakers open to the research community. Finally, the same i-Vector-based system architecture for intelligibility assessment was used to predict the accuracy that an automatic speech recognizer (ASR) would obtain with dysarthric speakers. The only difference between the two was the ground-truth label set used for training.
Predicting the performance of an ASR system in advance would increase the confidence of speech therapists in these systems and would diminish health-related costs. The results were not as satisfactory as in the previous case, probably because an ASR system is complex and its accuracy can be very difficult to predict from acoustic information alone. Nonetheless, we think this opens the door to an interesting research direction for both problems.
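
    The classification step described for LID, modelling the i-Vectors of each language with a Gaussian whose covariance matrix is shared across languages, is equivalent to linear discriminant analysis. The sketch below illustrates that scoring step with scikit-learn; the i-Vectors are random placeholders and the dimensionality is an assumption, not the thesis's actual front-end or data.

```python
# Gaussian classifier with a shared covariance matrix over i-Vectors,
# i.e. linear discriminant analysis (illustrative placeholders only).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
ivector_dim = 100                                 # reduced dimensionality for this toy example
train_ivectors = rng.normal(size=(300, ivector_dim))
train_languages = rng.integers(0, 3, size=300)    # 3 language classes, random labels

classifier = LinearDiscriminantAnalysis()         # shared within-class covariance by construction
classifier.fit(train_ivectors, train_languages)

test_ivector = rng.normal(size=(1, ivector_dim))  # one i-Vector = one test utterance
print(classifier.predict(test_ivector))           # most likely language for this utterance
```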

    Cross-Generic Dimension Of The Production Of Phonological Paraphasias And Neologisms By People With Aphasia

    Get PDF
    The question of whether the phonological deficit manifests uniformly or heterogeneously across aphasia syndromes remains unresolved, and cross-generic explorations of the quantitative and qualitative production patterns of phonological and neologistic paraphasias are virtually absent. We therefore set out to enrich the present body of aphasiological and neurolinguistic knowledge by answering several questions: whether phonological errors are syndrome-universal or syndrome-specific, how discourse elicitation genre affects the number of erroneous productions and the diversity of their categories, and how within-genre task complexity affects the phonological output of Russian-speaking individuals diagnosed with five different types of aphasia. To this end, we conducted a quantitative and qualitative hierarchical cluster analysis of the phonological errors detected in interview samples from 18 participants, whose oral performance on tasks from four distinct discourse genres was recorded on a high-quality sound-recording device and transcribed using a combination of the Jefferson Transcription System and the International Phonetic Alphabet. The results demonstrate that phonological error production patterns cannot be relied on to distinguish aphasia types. They also show that each discourse genre carries its own degree of mental processing complexity and is therefore associated with a numerically distinct picture of errors. Moreover, the degree of task complexity was found to be a matter of individual perception. Finally, previous findings on paraphasias were compared with our data, and some earlier structural hypotheses were not supported. The study aims to further theoretical knowledge about phonological breakdown in aphasia, particularly from the perspective of aphasics’ engagement in everyday discourse situations, to help refine existing speech production models or develop new, more realistic ones, and to generate ideas for practical solutions in speech-language pathology.
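
    As a rough illustration of the hierarchical cluster analysis mentioned above, the sketch below groups participants by their profiles of phonological error counts using SciPy's agglomerative clustering. The error categories and counts are invented for illustration and do not come from the study.

```python
# Hierarchical clustering of participants by error-category profiles (illustrative data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# rows = participants, columns = counts of assumed error categories
# (e.g. substitutions, omissions, additions, neologisms)
error_counts = np.array([
    [12,  3,  1,  0],
    [10,  4,  2,  1],
    [ 2, 11,  7,  5],
    [ 1, 12,  6,  6],
    [ 5,  5,  5,  5],
])

Z = linkage(error_counts, method="ward")          # agglomerative clustering
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
print(labels)                                     # cluster membership per participant
```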

    Models and Analysis of Vocal Emissions for Biomedical Applications

    Get PDF
    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 out of the strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.

    Three-dimensional point-cloud room model in room acoustics simulations

    Get PDF

    Physical mechanisms may be as important as brain mechanisms in evolution of speech [Commentary on Ackermann, Hage, & Ziegler. Brain mechanisms of acoustic communication in humans and nonhuman primates: an evolutionary perspective]

    No full text
    We present two arguments why physical adaptations for vocalization may be as important as neural adaptations. First, fine control over vocalization is not easy for physical reasons, and modern humans may be exceptional. Second, we present an example of a gorilla that shows rudimentary voluntary control over vocalization, indicating that some neural control is already shared with great apes.