
    Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech

    The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used to compute cepstral coefficients smooths out this fine-level information in the higher frequency regions because the filters there have larger bandwidths. To capture the sub-band information, this paper first performs a four-level discrete wavelet transform (DWT) decomposition, which splits the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). Log filterbank energies are computed from the short-term discrete Fourier transform magnitude spectra of each reconstructed signal using a 30-channel Mel filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed signals are pooled together, and a discrete cosine transform is applied to produce the cepstral feature, here termed the discrete wavelet transform reconstructed (DWTR) Mel-frequency cepstral coefficient (MFCC). An i-vector-based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. Using DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case to 60.094% from the 56.646% MFCC baseline. Further analysis of the confusion matrices shows that the confusion among dysarthric classes differs considerably between the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach that exploits the discriminating power of both kinds of features is proposed to improve the overall performance of the dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
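The feature pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet, so a Haar wavelet is assumed here, and the sampling rate, frame length, hop size, FFT size, and number of cepstral coefficients are illustrative choices. The function names (`subband_signals`, `dwtr_mfcc`, and the helpers) are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of Haar analysis: returns (approximation, detail)."""
    pairs = x.reshape(-1, 2)
    return ((pairs[:, 0] + pairs[:, 1]) / np.sqrt(2),
            (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def subband_signals(x, levels=4):
    """Rebuild levels + 1 time-domain signals, one per coefficient set."""
    n = len(x) - len(x) % (2 ** levels)      # trim to a multiple of 2**levels
    a = np.asarray(x[:n], dtype=float)
    details = []
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    sets = [a] + details[::-1]               # [A4, D4, D3, D2, D1] for levels=4
    bands = []
    for keep in range(levels + 1):
        # Keep one coefficient set, zero the others, then invert the DWT.
        cur = sets[0] if keep == 0 else np.zeros_like(sets[0])
        for j in range(1, levels + 1):
            d = sets[j] if keep == j else np.zeros_like(sets[j])
            cur = haar_idwt(cur, d)
        bands.append(cur)
    return bands

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(0.0, hz2mel(sr / 2.0), n_filters + 2))
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        fb[i] = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                   (hi - freqs) / (hi - mid)), 0.0, None)
    return fb

def log_mel_energies(x, fb, frame_len=400, hop=160, n_fft=512):
    """Log Mel filterbank energies from short-term DFT magnitude spectra."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    mag = np.abs(np.fft.rfft(x[idx] * np.hamming(frame_len), n=n_fft, axis=1))
    return np.log(mag @ fb.T + 1e-10)        # small floor avoids log(0)

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping n_out coefficients."""
    n = x.shape[1]
    k = np.arange(n_out)[:, None]
    return x @ np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n)).T

def dwtr_mfcc(x, sr=16000, levels=4, n_filters=30, n_ceps=13):
    """Pool per-band log Mel energies frame by frame, then apply a DCT."""
    fb = mel_filterbank(n_filters, 512, sr)
    pooled = np.hstack([log_mel_energies(b, fb)
                        for b in subband_signals(x, levels)])
    return dct2(pooled, n_ceps)
```

Because the DWT is linear, the five reconstructed signals sum back to the (trimmed) input, which is a convenient sanity check on the decomposition. With four levels and 30 Mel channels, each frame pools 150 log energies before the DCT.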

    Using Synchronized Audio Mapping to Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences

    Automatic tongue, velum (i.e., soft palate), and pharyngeal movement tracking systems provide a significant benefit for the analysis of dynamic speech movements. Studies have used ultrasound, x-ray, and magnetic resonance imaging (MRI) to examine the dynamic nature of the articulators during speech. Simulating the movement of the tongue, velum, and pharynx is often limited by image segmentation obstacles, where movements of the velar structures are segmented through manual tracking. These methods are extremely time-consuming, and inherent noise, motion artifacts, air interfaces, and refractions often complicate computer-based automatic tracking. Furthermore, image segmentation and processing techniques for velopharyngeal structures often suffer from leakage issues related to the poor image quality of the MRI and the lack of recognizable boundaries between the velum and pharynx during moments of contact. Computer-based tracking algorithms are developed to overcome these disadvantages by using machine learning techniques together with the corresponding speech signals, which may be treated as prior information. The purpose of this study is to illustrate a methodology for tracking the velum and pharynx in an MRI sequence using a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) computed from the corresponding audio signals. Auditory models such as MFCC have been widely used in Automatic Speech Recognition (ASR) systems. Our method uses a customized version of the traditional approach to audio feature extraction in order to extract visual features from the outer boundaries of the velum and the pharynx, which are marked (as selected pixels) by a novel method. The reduced audio features help to shrink the search space of the HMM and improve system performance. Three hundred consecutive images were tagged by the researcher. Two hundred of these images and the corresponding audio features (5 seconds) were used to train the HMM, and a 2.5-second audio file was used to test the model. The error rate was measured by calculating the minimum distance between predicted and actual markers. Our model was able to track and animate dynamic articulators during the speech process in real time with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) indicated the segmented structures, even though the contours of the contact areas were fuzzy and unrecognizable. M.S.
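The evaluation described in this abstract (minimum distance between predicted and actual markers, scored against a one-pixel threshold) could be computed along the following lines. This is only a sketch under assumptions: the abstract does not say which distance is used or how markers are paired, so Euclidean distance to the nearest actual marker is assumed, and the function name `marker_accuracy` is hypothetical.

```python
import numpy as np

def marker_accuracy(predicted, actual, threshold=1.0):
    """Fraction of predicted markers within `threshold` pixels of the
    nearest actual marker (Euclidean distance is an assumption)."""
    predicted = np.asarray(predicted, dtype=float)   # shape (P, 2)
    actual = np.asarray(actual, dtype=float)         # shape (A, 2)
    # Pairwise differences between every predicted and every actual marker.
    diffs = predicted[:, None, :] - actual[None, :, :]
    min_dist = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return float((min_dist <= threshold).mean())

# Made-up pixel coordinates for illustration only.
predicted = [(10, 10), (20, 21), (30, 35)]
actual = [(10, 10), (20, 20), (30, 30)]
accuracy = marker_accuracy(predicted, actual, threshold=1.0)
```

In this toy case the nearest-marker distances are 0, 1, and 5 pixels, so two of the three predictions fall within the threshold.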

    Towards Automatic Speech-Language Assessment for Aphasia Rehabilitation

    Speech-based technology has the potential to reinforce traditional aphasia therapy through the development of automatic speech-language assessment systems. Such systems can provide clinicians with supplementary information to assist with progress monitoring and treatment planning, and can provide support for on-demand auxiliary treatment. However, current technology cannot support this type of application due to the difficulties associated with aphasic speech processing. The focus of this dissertation is on the development of computational methods that can accurately assess aphasic speech across a range of clinically relevant dimensions. The first part of the dissertation focuses on novel techniques for assessing aphasic speech intelligibility in constrained contexts. The second part investigates acoustic modeling methods that lead to significant improvement in aphasic speech recognition and allow the system to work with unconstrained speech samples. The final part demonstrates the efficacy of speech recognition-based analysis in automatic paraphasia detection, extraction of clinically motivated quantitative measures, and estimation of aphasia severity. The methods and results presented in this work will enable robust technologies for accurately recognizing and assessing aphasic speech, and will provide insights into the link between computational methods and clinical understanding of aphasia. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/140840/1/ducle_1.pd