7 research outputs found

    Estimating tremor in Vocal Fold Biomechanics for Neurological Disease characterisation

    Neurological Diseases (ND) affect larger segments of the aging population every year. Treatment depends on expensive, accurate, and frequent monitoring. It is well known that ND leave correlates in speech and phonation. The present work shows a method to detect alterations in vocal fold tension during phonation, which may appear either as hypertension or as cyclical tremor. Estimates of tremor may be produced by auto-regressive modeling of the vocal fold tension series in sustained phonation. The correlates obtained are a set of cyclicality coefficients, the frequency of the tremor, and its root mean square amplitude. Statistical distributions of these correlates, obtained from a set of male and female subjects, are presented. Results from five study cases of female voice are also given.
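
    The following is a minimal sketch, in Python, of the kind of auto-regressive tremor analysis the abstract describes: an AR model is fitted to a hypothetical vocal fold tension series, the complex AR poles supply the cyclicality information and the tremor frequency, and the RMS amplitude is taken from the detrended series. The sampling rate, model order, and synthetic tension signal are assumptions for illustration, not the authors' settings.

import numpy as np
from scipy.linalg import solve_toeplitz

def ar_tremor_correlates(tension, fs, order=6):
    """Estimate tremor correlates from a vocal fold tension series sampled at fs (Hz)."""
    x = tension - np.mean(tension)          # remove the sustained (DC) component
    # Biased autocorrelation up to the model order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) / len(x) for k in range(order + 1)])
    # Yule-Walker equations, solved via the Toeplitz structure of the autocorrelation matrix
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    # Poles of the AR model 1 / (1 - a1 z^-1 - ... - ap z^-p)
    poles = np.roots(np.concatenate(([1.0], -a)))
    # Complex poles in the upper half plane carry the cyclicality information
    cyclical = poles[np.imag(poles) > 1e-6]
    freqs = np.angle(cyclical) * fs / (2 * np.pi)
    # Dominant tremor frequency: the cyclical pole closest to the unit circle
    f_tremor = freqs[np.argmax(np.abs(cyclical))] if cyclical.size else 0.0
    rms = np.sqrt(np.mean(x ** 2))          # RMS amplitude of the tremor component
    return a, f_tremor, rms

# Hypothetical usage: a synthetic 5 Hz tremor on a tension series sampled at 100 Hz
fs = 100.0
t = np.arange(0, 3.0, 1.0 / fs)
tension = 1.0 + 0.05 * np.sin(2 * np.pi * 5.0 * t) + 0.005 * np.random.randn(t.size)
coeffs, f_tremor, rms = ar_tremor_correlates(tension, fs)
print(f"tremor frequency ~ {f_tremor:.1f} Hz, RMS amplitude ~ {rms:.3f}")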

    Automatic speech intelligibility detection for speakers with speech impairments: the identification of significant speech features

    Selection of relevant features is important for discriminating speech in a detection-based ASR system and contributes to improved detector performance. In the context of speech impairments, speech errors can be discriminated from regular speech by adopting speech features with high discriminative ability between the impaired and control groups. However, the identification of suitable discriminative speech features for error detection in impaired speech has not been well investigated in the literature. Characteristics of impaired speech differ grossly from those of regular speech, making existing speech features less effective at recognizing impaired speech. To close this gap, speech features based on prosody, pronunciation, and voice quality are analyzed to identify the significant features related to intelligibility deficits. In this research, we investigate how speech impairments due to cerebral palsy and hearing impairment relate to prosody, pronunciation, and voice quality. We then identify the relationship of these features with speech intelligibility classification and the significant features for improving the discriminative ability of an automatic speech intelligibility detection system. The findings show that prosody, pronunciation, and voice quality features are statistically significant for improving the detection of impaired speech. Voice quality features are identified as having the greatest discriminative power in detecting the speech intelligibility of impaired speech.
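
    A minimal sketch of this kind of feature-screening pipeline is given below, using a toy feature matrix: hypothetical prosody, pronunciation, and voice-quality measures are tested for control-versus-impaired group differences, and the significant ones feed a simple intelligibility detector. The feature names, data, and classifier are placeholders, not the study's actual setup.

import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical prosody / pronunciation / voice-quality summaries per speaker
feature_names = ["f0_std", "speech_rate", "vowel_space_area", "jitter", "shimmer", "hnr"]
X_control = rng.normal(0.0, 1.0, size=(40, len(feature_names)))
X_impaired = rng.normal(1.0, 1.2, size=(40, len(feature_names)))    # toy group shift
X = np.vstack([X_control, X_impaired])
y = np.array([0] * 40 + [1] * 40)                                   # 0 = control, 1 = impaired

# Screen features: keep those with a significant control-vs-impaired difference
significant = []
for j, name in enumerate(feature_names):
    _, p = mannwhitneyu(X_control[:, j], X_impaired[:, j])
    if p < 0.05:
        significant.append(j)
        print(f"{name}: p = {p:.4f} (kept)")
significant = significant or list(range(len(feature_names)))        # fall back to all features

# Detector trained only on the significant features
detector = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(detector, X[:, significant], y, cv=5)
print("cross-validated detection accuracy:", scores.mean())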

    Detección automática de la enfermedad de Parkinson usando componentes moduladoras de señales de voz

    Parkinson's Disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease. It mainly affects older adults, at a rate of about 2%, and about 89% of people diagnosed with PD also develop speech disorders. This has led the scientific community to investigate the information embedded in the speech signals of Parkinson's patients, which allows not only diagnosis of the pathology but also follow-up of its evolution. In recent years, a large number of studies have focused on the automatic detection of voice-related pathologies in order to evaluate the voice objectively and non-invasively. In cases where the pathology primarily affects the vibratory patterns of the vocal folds, as in Parkinson's, the analyses are typically performed on sustained vowel recordings. In this article, we propose to use information from the slow and rapid variations in speech signals, also known as modulating components, combined with an effective dimensionality reduction approach whose output is used as input to the classification system. The proposed approach achieves classification rates higher than 88%, surpassing the classical approach based on Mel-Frequency Cepstral Coefficients (MFCC). The results show that the information extracted from the slowly varying components is highly discriminative for the task at hand and could support assisted diagnosis systems for PD.
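
    A minimal sketch of extracting slow modulating components, assuming the Hilbert envelope followed by low-pass filtering as the modulation representation (one plausible reading of the approach, not necessarily the authors' exact pipeline): the slow envelope is summarised frame by frame and compressed with PCA before classification. The cutoff frequency, frame sizes, and toy vowel are illustrative assumptions.

import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt
from sklearn.decomposition import PCA

def slow_modulation_features(signal, fs, frame_len=2048, hop=512, cutoff_hz=20.0):
    """Frame-wise statistics of the low-pass-filtered Hilbert envelope."""
    envelope = np.abs(hilbert(signal))                    # amplitude modulation component
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    slow = sosfiltfilt(sos, envelope)                     # keep only the slow variations
    frames = [
        slow[i : i + frame_len]
        for i in range(0, len(slow) - frame_len + 1, hop)
    ]
    # Per-frame summary statistics form the feature matrix of one recording
    return np.array([[f.mean(), f.std(), f.max() - f.min()] for f in frames])

# Hypothetical usage: a toy sustained vowel with a 4 Hz amplitude modulation
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
vowel = np.sin(2 * np.pi * 120 * t) * (1 + 0.1 * np.sin(2 * np.pi * 4 * t))
feats = slow_modulation_features(vowel, fs)
reduced = PCA(n_components=2).fit_transform(feats)        # dimensionality reduction step
print(reduced.shape)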

    Automatic detection of Parkinson's disease from components of modulators in speech signals

    Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease. It mainly affects older adults, at a rate of about 2%, and about 89% of people diagnosed with PD also develop speech disorders. This has led the scientific community to investigate the information embedded in the speech signals of Parkinson's patients, which allows not only diagnosis of the pathology but also follow-up of its evolution. In recent years, a large number of studies have focused on the automatic detection of voice-related pathologies in order to evaluate the voice objectively and non-invasively. In cases where the pathology primarily affects the vibratory patterns of the vocal folds, as in Parkinson's, the analyses are typically performed on sustained vowel recordings. In this article, we propose to use information from the slow and rapid variations in speech signals, also known as modulating components, combined with an effective dimensionality reduction approach whose output is used as input to the classification system. The proposed approach achieves classification rates higher than 88%, surpassing the classical approach based on Mel-Frequency Cepstral Coefficients (MFCC). The results show that the information extracted from the slowly varying components is highly discriminative for the task at hand and could support assisted diagnosis systems for PD.
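
    For comparison, a minimal sketch of the classical MFCC baseline mentioned above, assuming librosa for feature extraction and an SVM classifier; the synthetic vowels, labels, and classifier choice are hypothetical placeholders rather than the paper's experimental setup.

import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SR = 16000

def mfcc_summary(y, sr=SR, n_mfcc=13):
    """Mean and standard deviation of MFCCs over one sustained-vowel recording."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def toy_vowel(f0, jitter, seconds=1.0, sr=SR, rng=np.random.default_rng(4)):
    """Synthetic stand-in for a sustained vowel; real data would be loaded from audio files."""
    t = np.arange(0, seconds, 1 / sr)
    phase = 2 * np.pi * (f0 + jitter * rng.standard_normal(t.size)).cumsum() / sr
    return np.sin(phase).astype(np.float32)

# Hypothetical cohorts: higher pitch instability stands in for the PD recordings
pd_signals = [toy_vowel(120, jitter=6.0) for _ in range(10)]
hc_signals = [toy_vowel(120, jitter=1.0) for _ in range(10)]
X = np.array([mfcc_summary(y) for y in pd_signals + hc_signals])
labels = np.array([1] * 10 + [0] * 10)

baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("MFCC baseline accuracy:", cross_val_score(baseline, X, labels, cv=5).mean())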

    A computational model of the relationship between speech intelligibility and speech acoustics

    Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset spanning various severity degrees. Multiple regression analysis is employed to show that the developed measures can predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes are detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in the articulation-related acoustic features are important in predicting changes in listening phoneme errors, while changes in both the articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, significant correlation is achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from the acoustic signal. (Doctoral Dissertation, Speech and Hearing Science, 201)
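
    A minimal sketch of the multiple-regression analysis described above, using synthetic data: hypothetical articulation, prosody, and voice-quality measures are used to predict a listener error rate. The variable names, coefficients, and data are illustrative assumptions, not the dissertation's measures or results.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60
articulation = rng.normal(size=n)      # e.g. vowel space area, spectral contrasts
prosody = rng.normal(size=n)           # e.g. F0 range, pause and rhythm metrics
voice_quality = rng.normal(size=n)     # e.g. jitter, shimmer, HNR summaries
# Synthetic listener error rate, driven mostly by articulation in this toy example
phoneme_errors = 0.6 * articulation + 0.2 * prosody + rng.normal(scale=0.5, size=n)

X = sm.add_constant(np.column_stack([articulation, prosody, voice_quality]))
model = sm.OLS(phoneme_errors, X).fit()
print(model.summary())                 # coefficients show which measures predict listener errors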

    Subspace Gaussian Mixture Models for Language Identification and Dysarthric Speech Intelligibility Assessment

    In this thesis, we investigate how to efficiently apply subspace Gaussian mixture modeling techniques to two speech technology problems: automatic spoken language identification (LID) and automatic intelligibility assessment of dysarthric speech. One of the most important such techniques in this thesis is joint factor analysis (JFA). JFA is essentially a Gaussian mixture model in which the mean of each component is expressed as a sum of low-dimensional factors that represent different contributions to the speech signal. This factorization makes it possible to compensate for undesired sources of variability, such as the channel. JFA was investigated both as a final classifier and as a feature extractor. In the latter approach, a single subspace including all sources of variability is trained, and points in this subspace are known as i-Vectors. An i-Vector is thus a low-dimensional representation of a single utterance, and i-Vectors are a very powerful feature for many machine learning problems. We investigated two different LID systems according to the type of features extracted from speech. First, we extracted acoustic features representing short-time spectral information. In this case, we observed relative improvements of up to 50% with i-Vectors with respect to JFA. We found that the channel subspace of a JFA model also contains language information, whereas i-Vectors do not discard any information and, moreover, help to reduce mismatches between training and testing data. For classification, we modeled the i-Vectors of each language with a Gaussian distribution whose covariance matrix was shared among languages. This method is simple and fast, and it worked well without any post-processing. Second, we introduced the use of prosodic and formant information in the i-Vector system. Its performance was below that of the acoustic system, but the two were found to be complementary, and their fusion yielded up to a 20% relative improvement over the acoustic system alone.
Given the success in LID, and the fact that i-Vectors capture all the information present in the data, we decided to use i-Vectors for other tasks, specifically the assessment of speech intelligibility in speakers with different types of dysarthria. Speech therapists are very interested in this technology because it would allow them to rate the intelligibility of their patients objectively and consistently. In this case, the input features were extracted from short-term spectral information, and intelligibility was assessed from the i-Vectors calculated for a set of words uttered by the tested speaker. We found that performance was clearly better when data from the person to be evaluated were available for training. This limitation could be relaxed with larger training databases; however, the recording process is not easy for people with disabilities, and it is difficult to obtain large datasets of dysarthric speech open to the research community. Finally, the same i-Vector-based architecture for intelligibility assessment was used to predict the accuracy that an automatic speech recognition (ASR) system would obtain with dysarthric speakers; the only difference between the two was the ground-truth label set used for training. Predicting the performance of an ASR system would increase the confidence of speech therapists in these systems and would reduce health-related costs. The results were not as satisfactory as in the previous case, probably because an ASR system is complex and its accuracy is very difficult to predict from acoustic information alone. Nonetheless, we believe this opens the door to an interesting research direction for both problems.
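
    A minimal sketch of the i-Vector scoring scheme described above: each language is modelled by a Gaussian whose covariance matrix is pooled (shared) across languages, and a test i-Vector is assigned to the class with the highest log-likelihood. The i-Vectors here are random placeholders; a real system would extract them from a trained total-variability subspace.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
dim, n_per_class = 100, 200                       # e.g. 100-dimensional i-Vectors
classes = ["english", "spanish", "mandarin"]
# Placeholder training i-Vectors, one toy cluster per language
train = {c: rng.normal(loc=i, scale=1.0, size=(n_per_class, dim)) for i, c in enumerate(classes)}

means = {c: X.mean(axis=0) for c, X in train.items()}
# Pooled within-class covariance, shared by all language models
pooled = sum(np.cov((X - means[c]).T, bias=True) for c, X in train.items()) / len(classes)

def score(ivector):
    """Log-likelihood of one i-Vector under each language Gaussian (shared covariance)."""
    return {c: multivariate_normal.logpdf(ivector, mean=means[c], cov=pooled) for c in classes}

test_ivector = rng.normal(loc=1, size=dim)        # lies near the "spanish" cluster
scores = score(test_ivector)
print(max(scores, key=scores.get))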

    Characteristics of speech and voice as predictors of the quality of communication in adults with dysarthria

    Dysarthria is a motor speech disorder that involves respiration, phonation, resonance, articulation, and prosody, and may manifest as impairment of all or only some components of the speech production process. People with dysarthria have impaired speech intelligibility and quality of communication. The main goals of the research were to determine the characteristics of speech and voice in adults with hypokinetic, spastic, flaccid, and ataxic dysarthria using acoustic and spectral analysis, and to determine the quality of communication achieved by these individuals using a voice handicap self-assessment instrument. The ultimate goal of the research was to determine the characteristics of speech and voice that are predictors of the quality of communication in individuals with various types of dysarthria. The sample consisted of 129 respondents with dysarthria: 33 with hypokinetic, 36 with spastic, and 30 each with flaccid and ataxic dysarthria. The respondents were of both sexes, aged 21 to 94 (M = 66.07).
The following instruments were used in the research: the Multi-Dimensional Voice Program (MDVP), to determine the values of acoustic parameters that indicate the variability of voice frequency, the variability of voice intensity, the presence of tremor and noise in the voice, the presence of subharmonics and voice breaks, as well as unvoiced periods, and to determine the values of the spectral parameters of all vowels and certain consonants; a balanced text on which spectral analysis of speech was performed; and the Voice Handicap Index (VHI) scale, used to assess the quality of communication in individuals with dysarthria. The results showed that the characteristics of speech and voice in adults with dysarthria deviate significantly from norms and differ among respondents with hypokinetic, flaccid, spastic, and ataxic dysarthria. They also showed that the quality of communication is impaired in individuals with dysarthria and that there are similarities among respondents with different types of dysarthria in the degree of handicap experienced due to voice disorders. Based on the results, it was determined that the values of the acoustic and spectral parameters characterising speech and voice in adults with dysarthria are predictors of the quality of communication and that they differ among subgroups of respondents. It was also shown that the sociodemographic characteristics of respondents and the type of dysarthria can be predictors of the quality of communication.
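
    A minimal sketch of the predictor analysis described above, with synthetic data: MDVP-style acoustic parameters and dysarthria type are regressed on a self-reported Voice Handicap Index score. The parameter values, coefficients, and group labels are illustrative assumptions, not the study's data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 129
df = pd.DataFrame({
    "jitter": rng.normal(1.5, 0.5, n),        # %
    "shimmer": rng.normal(4.0, 1.0, n),       # %
    "nhr": rng.normal(0.15, 0.05, n),         # noise-to-harmonics ratio
    "f0_std": rng.normal(3.0, 1.0, n),        # Hz
    "dysarthria": rng.choice(["hypokinetic", "spastic", "flaccid", "ataxic"], n),
})
# Synthetic VHI score, driven here by jitter and shimmer for illustration only
df["vhi"] = 20 + 8 * df["jitter"] + 3 * df["shimmer"] + rng.normal(0, 5, n)

# Which acoustic measures (and which dysarthria type) predict the VHI score?
model = smf.ols("vhi ~ jitter + shimmer + nhr + f0_std + C(dysarthria)", data=df).fit()
print(model.summary())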