23 research outputs found

    Quantifying perturbations in temporal dynamics for automated assessment of spastic dysarthric speech intelligibility

    On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification

    Speech intelligibility can be affected by multiple factors, such as noisy environments, channel distortions or physiological issues. In this work, we deal with the problem of automatically predicting the speech intelligibility level in the latter case. Starting from our previous work, a non-intrusive system based on LSTM networks with an attention mechanism designed for this task, we present two main contributions. First, we propose the use of per-frame modulation spectrograms as input features, instead of compact representations derived from them that discard important temporal information. Second, we explore two strategies for combining per-frame acoustic log-mel and modulation spectrograms within the LSTM framework: at the decision level (late fusion) and at the utterance level (Weighted-Pooling, or WP, fusion). The proposed models are evaluated on the UA-Speech database, which contains dysarthric speech with different degrees of severity. On the one hand, results show that attentional LSTM networks can adequately model modulation-spectrogram sequences, producing classification rates similar to those obtained with log-mel spectrograms. On the other hand, both combination strategies, late and WP fusion, outperform the single-feature systems, suggesting that per-frame log-mel and modulation spectrograms carry complementary information for speech intelligibility prediction that can be effectively exploited by the LSTM-based architectures; the system with the WP fusion strategy and Attention-Pooling achieves the best results.
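The Weighted-Pooling fusion described above can be sketched in a few lines: attention scores collapse each per-frame feature stream into a single utterance-level vector, and the two pooled vectors are concatenated before classification. A minimal NumPy sketch, where the feature dimensions are hypothetical and random vectors stand in for the learned attention parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(frames, w):
    """Weighted pooling: softmax-normalised scores collapse a (T, d)
    frame sequence into a single utterance-level vector."""
    scores = frames @ w                      # (T,) raw attention scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # softmax over time
    return alpha @ frames                    # (d,) weighted average

T = 120                                      # frames in the utterance
logmel = rng.standard_normal((T, 40))        # per-frame log-mel features
modspec = rng.standard_normal((T, 60))       # per-frame modulation features

# WP (utterance-level) fusion: pool each stream, then concatenate
# the two summaries into one embedding for the classifier head.
u = np.concatenate([attention_pool(logmel, rng.standard_normal(40)),
                    attention_pool(modspec, rng.standard_normal(60))])
print(u.shape)
```

In late fusion, by contrast, each stream would be classified separately and only the per-class scores would be combined.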

    Dysarthric speech analysis and automatic recognition using phase based representations

    Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of the articulators. Dysarthric speech is more difficult to model with machine learning algorithms, due to inconsistencies in the acoustic signal and to the limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance. The Zeros of the Z-Transform (ZZT) are investigated for dysarthric vowel segments, providing evidence of a phase-based acoustic phenomenon that governs how the distribution of zero patterns relates to speech intelligibility. It is investigated whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility. A metric based on the phase slope deviation (PSD) observed in the unwrapped phase spectrum of dysarthric vowel segments is introduced; it compares the slopes of dysarthric vowels with those of typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and this is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria. In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum, whose properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison to conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features was found to surpass all previous performance on the UASPEECH database of dysarthric speech.
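The phase slope deviation idea can be illustrated numerically: fit a line to the unwrapped phase spectrum of a vowel frame and compare its slope against a reference slope for typical speech. A minimal sketch, where the 220 Hz sinusoidal "vowel" and the reference slope are illustrative assumptions, not values from the study:

```python
import numpy as np

fs = 16000
t = np.arange(0, 0.03, 1 / fs)              # a 30 ms analysis frame
frame = np.sin(2 * np.pi * 220 * t)         # stand-in for a vowel segment

spec = np.fft.rfft(frame * np.hanning(len(frame)))
phase = np.unwrap(np.angle(spec))           # unwrapped phase spectrum
freqs = np.fft.rfftfreq(len(frame), 1 / fs)

# Slope of the unwrapped phase via a least-squares line fit; the PSD
# metric compares this slope with that of typical (healthy) vowels.
slope, _ = np.polyfit(freqs, phase, 1)
ref_slope = -0.0009                          # hypothetical typical-speech slope
psd = abs(slope - ref_slope)
print(psd)
```

In the study itself the reference slopes come from typical speakers' vowels; here the constant merely shows where that comparison plugs in.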

    An auditory saliency pooling-based LSTM model for speech intelligibility classification

    Speech intelligibility is a crucial element in oral communication that can be influenced by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in this last circumstance. Taking our previous work, a SIC system based on an attentional long short-term memory (LSTM) network, as a starting point, we deal with the problem of the inadequate learning of the attention weights due to training-data scarcity. To overcome this issue, the main contribution of this paper is a novel type of weighted-pooling (WP) mechanism, called saliency pooling, where the WP weights are not automatically learned during the training process of the network but are obtained from an external source of information, Kalinli's auditory saliency model. In this way, we intend to take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-Speech dataset, which comprises speech uttered by subjects with several dysarthria levels. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli's saliency can be successfully incorporated into the LSTM architecture as an external cue for the estimation of the speech intelligibility level. The work leading to these results has been supported by the Spanish Ministry of Economy, Industry and Competitiveness through the TEC2017-84395-P (MINECO) and TEC2017-84593-C2-1-R (MINECO) projects (AEI/FEDER, UE), and by the Universidad Carlos III de Madrid under Strategic Action 2018/00071/001.
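The key difference between saliency pooling and learned attention pooling is where the weights come from: they are supplied externally rather than trained. A minimal sketch, using a random curve as a stand-in for Kalinli's auditory saliency (the real model derives this curve from the audio itself):

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 80, 32
h = rng.standard_normal((T, d))      # per-frame LSTM outputs (hypothetical)

# Saliency pooling: the pooling weights are NOT learned by the network;
# they come from an external auditory saliency model. A random positive
# curve, normalised to sum to one, stands in for that model here.
saliency = np.abs(rng.standard_normal(T))
weights = saliency / saliency.sum()

utterance_vec = weights @ h          # (d,) saliency-weighted utterance summary
print(utterance_vec.shape)
```

Because the weights are fixed inputs, no attention parameters need to be estimated from the scarce training data, which is the motivation given in the abstract.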

    Medidas de inteligibilidad para predicción del grado de Parkinson

    Communication is a basic instinct in human development: people tend to interact with their environment and, therefore, with their peers, which makes a communicative process grounded in mutual understanding essential. One of the factors for achieving correct understanding between interlocutors in oral communication is speech intelligibility, which can be affected by so-called dysarthria. Throughout this report, we discuss dysarthria and its implications for people with Parkinson's disease. It is the second most widespread disease after Alzheimer's and affects more than 300,000 people in Spain alone, a figure that will grow as the population ages. This Final Degree Project aims to build a predictor capable of estimating the intelligibility level of speech signals. We have used the "Universal Access" database, which contains audio from several speakers with dysarthria together with intelligibility labels obtained subjectively from a set of evaluators. Dysarthria is a common symptom in people with Parkinson's disease, which is why this database was chosen for the development and evaluation of the system. The intelligibility prediction system consists of several stages, including acoustic feature extraction, feature selection, regression, and evaluation of the results. After feeding the signals to the predictor, an output with the predicted intelligibility level of the patient is obtained, which is evaluated using the Pearson correlation and the root mean square error. Different types of tests have been carried out, both independently and in comparison with related papers. In all of them, the results show a high degree of accuracy, achieving the objectives set out in the project.
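The evaluation step described above, Pearson correlation and root mean square error between subjective intelligibility labels and predicted scores, can be sketched directly. The score values below are invented for illustration:

```python
import numpy as np

# Hypothetical subjective intelligibility labels (%) and model predictions
y_true = np.array([86.0, 43.0, 95.0, 29.0, 62.0, 58.0])
y_pred = np.array([81.0, 50.0, 90.0, 35.0, 60.0, 64.0])

# Pearson correlation: how well the ranking/trend of predictions
# follows the subjective labels (1.0 is a perfect linear fit).
pearson = np.corrcoef(y_true, y_pred)[0, 1]

# RMSE: average magnitude of the prediction error, in the same
# units as the labels (percentage points here).
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(round(pearson, 3), round(rmse, 3))
```

A good predictor drives the Pearson correlation toward 1 while keeping the RMSE small; the two metrics are complementary, since a biased predictor can correlate well yet still have a large error.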

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held biennially, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to the clinical diagnosis and classification of vocal pathologies.

    A survey on perceived speaker traits: personality, likability, pathology, and the first challenge

    The INTERSPEECH 2012 Speaker Trait Challenge aimed at providing a unified test-bed for perceived speaker traits, the first challenge of its kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathological speakers. In the present article, we give a brief overview of the state of the art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, held biennially, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to the clinical diagnosis and classification of vocal pathologies. The Workshop has the sponsorship of Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control Journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.

    Effects of deep brain stimulation on speech in patients with Parkinson’s disease and dystonia

    Disorders affecting the basal ganglia can have a severe effect on speech motor control. The effect can vary depending on the pathophysiology of the basal ganglia disease but in general terms it can be classified as hypokinetic or hyperkinetic dysarthria. Despite the role of basal ganglia on speech, there is a marked discrepancy between the effect of medical and surgical treatments on limb and speech motor control. This is compounded by the complex nature of speech and communication in general, and the lack of animal models of speech motor control. The emergence of deep brain stimulation of basal ganglia structures gives us the opportunity to record systematically the effects on speech and attempt some assumptions on the role of basal ganglia on speech motor control. The aim of the present work was to examine the impact of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) for Parkinson’s disease (PD) and globus pallidus internus (GPi-DBS) for dystonia on speech motor control. A consecutive series of PD and dystonia patients who underwent DBS was evaluated. Patients were studied in a prospective longitudinal manner with both clinical assessment of their speech intelligibility and acoustical analysis of their speech. The role of pre-operative clinical factors and electrical parameters of stimulation, mainly electrode positioning and voltage amplitude was systematically examined. In addition, for selected patients, tongue movements were studied using electropalatography. Aerodynamic aspects of speech were also studied. The impact of speech therapy was assessed in a subgroup of patients. The clinical evaluation of speech intelligibility one and three years post STN-DBS in PD patients showed a deterioration of speech, partly related to medially placed electrodes and high amplitude of stimulation. Pre-operative predictive factors included low speech intelligibility before surgery and longer disease duration. 
Articulation rather than voice was most frequently affected, with a distinct dysarthria type emerging, mainly hyperkinetic-dystonic rather than hypokinetic. Traditionally effective therapy for PD dysarthria had little to no benefit following STN-DBS. Speech following GPi-DBS for dystonia did not significantly change after one year of stimulation. A subgroup of patients showed hypokinetic features, mainly reduced voice volume and a fast rate of speech more typical of Parkinsonian speech. Speech changes in both STN-DBS and GPi-DBS were apparent after six months of stimulation. This progressive deterioration of speech and the critical role of the electrical parameters of stimulation suggest a long-term effect of electrical stimulation of the basal ganglia on speech motor control.