30 research outputs found

    Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

    Rapid population ageing has stimulated the development of assistive devices that provide personalized medical support to people suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system that enables personalized speech therapy for patients with communicative disorders in their own home environment. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements are reported on both datasets using speaker-independent ASR architectures.
    Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

    A computational model for studying L1’s effect on L2 speech learning

    Much evidence has shown that the first language (L1) plays an important role in the formation of the L2 phonological system during second language (L2) learning. Since different L1s have distinct phonological patterns, L2 speech learning outcomes differ for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are correlated with perceived accentedness, and that the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between L1s and the L2 will manifest themselves in the accented speech produced by speakers from these L1s. To test these hypotheses, this study develops a computational model to analyze accented speech properties in both the segmental (short-term speech measurements at the short-segment or phoneme level) and suprasegmental (long-term speech measurements at the word, long-segment, or sentence level) feature spaces. The benefit of a computational model is that it enables quantitative analysis of the L1's effect on accent in terms of different phonological properties. The core of the model consists of feature extraction schemes that derive pronunciation and prosody representations of accented speech using existing techniques from the speech processing field. Correlation analysis on both segmental and suprasegmental feature spaces examines the relationship between L1-related acoustic measurements and perceived accentedness across several L1s. Multiple regression analysis is employed to investigate how the L1's effect impacts the perception of foreign accent, and how accented speech produced by speakers from different L1s behaves distinctly in the segmental and suprasegmental feature spaces. The results reveal the potential of this methodology to provide quantitative analysis of accented speech and to extend current studies in L2 speech learning theory to a large scale. Practically, this study further shows that the proposed computational model can benefit automatic accentedness evaluation systems by adding features related to speakers' L1s.
    Doctoral Dissertation, Speech and Hearing Science, 201
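The correlation step described above can be sketched with entirely synthetic data; the distance measure, the ratings and the negative relationship simulated here are hypothetical stand-ins for the dissertation's actual measurements, used only to illustrate the form of the analysis:

```python
# Synthetic sketch of the segmental correlation analysis: relating a
# hypothetical phonological-distance measure to perceived accentedness.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_speakers = 30
# Hypothetical per-speaker distance between accented speech and the
# speaker's L1 speech, and simulated listener accentedness ratings.
distance = rng.uniform(0.0, 1.0, size=n_speakers)
accentedness = 5.0 - 3.0 * distance + rng.normal(0.0, 0.3, size=n_speakers)

r, p = stats.pearsonr(distance, accentedness)
# A significant negative r is the pattern the dissertation's hypothesis
# predicts for some phonological properties.
```

The same feature matrix would feed the multiple regression step, with accentedness as the dependent variable and the segmental and suprasegmental measurements as predictors.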

    The use of deep learning solutions to develop a practice tool to support Lámh language for communication partners

    This study proposes an alternative way to promote the learning and practice of Lámh language by communication partners who support current users: a real-time detection tool that recognises 20 chosen Lámh signs, building on existing studies in the field. The implementation generated primary data composed of MediaPipe landmark NumPy arrays of 40 frames and 45 repetitions per sign. The neural networks were built with the Python library Keras, the SVM models with the library sklearn, and real-time detection was achieved by integrating these elements with OpenCV. Neural networks with different architectures combining Long Short-Term Memory (LSTM) and 1D Convolutional Neural Network (CNN) layers were compared with SVM classifiers, using cross-validation to find the optimal hyperparameters and determine the most appropriate model. After assessing training and testing accuracy and loss, the final chosen model consisted of two 1D CNN layers with 32 and 64 nodes respectively, a dropout of 0.2, two LSTM layers with 32 and 64 nodes respectively, and a dense layer of 32 nodes. The training accuracy was 99.86%, the testing accuracy 93.33%, the training loss 0.0035 and the testing loss 0.1791. This model also performed best in a real-time detection environment, easily detecting 8 of the Lámh signs and detecting another 6 with reservations. For future work, some skeletal-motion signs should be captured again and other data augmentation strategies adopted, such as capturing hip and leg landmarks alongside the signs and augmenting the data by applying offset measures to the landmark coordinates of the skeletons captured by MediaPipe. Once these methodological corrections achieve better real-time results, work toward tool accessibility and user experience should follow, in order to produce a Lámh language real-time detection tool that could promote Lámh and become a learning alternative for communication partners.
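The chosen architecture can be sketched in Keras as follows. The layer sizes, dropout and sequence length come from the abstract; the kernel size and the per-frame landmark feature dimension are assumptions, not values from the study:

```python
# Minimal Keras sketch of the described model: two 1D CNN layers
# (32 and 64 filters), dropout 0.2, two LSTM layers (32 and 64 units)
# and a dense layer of 32 nodes, classifying 20 signs.
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Conv1D, Dropout, LSTM, Dense

N_FRAMES = 40       # frames captured per sign, as in the study
N_FEATURES = 1662   # assumed MediaPipe holistic landmark dimension
N_SIGNS = 20        # Lámh signs to recognise

model = Sequential([
    Input(shape=(N_FRAMES, N_FEATURES)),
    Conv1D(32, kernel_size=3, activation="relu"),
    Conv1D(64, kernel_size=3, activation="relu"),
    Dropout(0.2),
    LSTM(32, return_sequences=True),  # keep the sequence for the next LSTM
    LSTM(64),
    Dense(32, activation="relu"),
    Dense(N_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Training would take batches of shape (batch, 40, 1662) built from the recorded landmark arrays, with one-hot sign labels.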

    Detecting early signs of dementia in conversation

    Dementia can affect a person's speech, language and conversational interaction capabilities. The early diagnosis of dementia is of great clinical importance. Recent studies using the qualitative methodology of Conversation Analysis (CA) demonstrated that communication problems may be picked up during conversations between patients and neurologists, and that this can be used to differentiate between patients with Neurodegenerative Disorders (ND) and those with non-progressive Functional Memory Disorder (FMD). However, conducting manual CA is expensive and difficult to scale up for routine clinical use. This study introduces an automatic approach for processing such conversations which can help identify the early signs of dementia and distinguish them from the other clinical categories (FMD, Mild Cognitive Impairment (MCI), and Healthy Control (HC)). The dementia detection system starts with a speaker diarisation module that segments an input audio file (determining who talks when). The segmented files are then passed to an automatic speech recogniser (ASR) to transcribe the utterances of each speaker. Next, a feature extraction unit extracts a number of features (CA-inspired, acoustic, lexical and word-vector) from the transcripts and audio files. Finally, a classifier is trained on the features to determine the clinical category of the input conversation. Moreover, we investigate replacing the neurologist in the conversation with an Intelligent Virtual Agent (IVA) asking similar questions. We show that, despite differences between the IVA-led and the neurologist-led conversations, the results achieved with the IVA are as good as those achieved with the neurologists. Furthermore, the IVA can administer more standard cognitive tests, such as verbal fluency tests, and produce automatic scores, which can then boost the performance of the classifier. The final blind evaluation of the system shows that the classifier can identify early signs of dementia with an acceptable level of accuracy and robustness (considering both sensitivity and specificity).
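The last stage of the pipeline, training a classifier on per-conversation features, can be sketched with synthetic stand-in data; the feature values, class structure and classifier choice below are illustrative assumptions, not those used in the study:

```python
# Toy sketch of the classification stage only, assuming features have
# already been produced by the diarisation/ASR/feature-extraction steps.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# 40 synthetic conversations x 6 features (e.g. turn length, pause
# rate, lexical richness, ...), 10 per clinical category.
y = np.repeat(np.arange(4), 10)            # 0=ND, 1=FMD, 2=MCI, 3=HC
X = rng.normal(size=(40, 6)) + 0.5 * y[:, None]

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
```

In the real system, each row of X would be the feature vector extracted from one recorded conversation, and held-out evaluation would replace the toy cross-validation.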

    Dysarthric speech analysis and automatic recognition using phase based representations

    Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of the articulators. Dysarthric speech is more difficult to model with machine learning algorithms, owing to inconsistencies in the acoustic signal and to limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance. The Zeros of the Z-Transform (ZZT) are investigated for dysarthric vowel segments, revealing evidence of a phase-based acoustic phenomenon whereby the distribution of zero patterns relates to speech intelligibility. It is then investigated whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility. A metric based on the phase slope deviation (PSD), observed in the unwrapped phase spectrum of dysarthric vowel segments, is introduced; it compares the differences between the slopes of dysarthric vowels and typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and this is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria. In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum, whose properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison with conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features was found to surpass all previous performance on the UASPEECH database of dysarthric speech.
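A phase-slope measure of the kind described above can be sketched in NumPy. The exact PSD definition in the thesis may differ; this sketch simply fits a line to the unwrapped FFT phase of a windowed vowel segment and compares slopes against a reference:

```python
# Minimal sketch of a phase-slope measure on a vowel segment.
import numpy as np

def phase_slope(segment):
    """Slope of the unwrapped phase spectrum of a windowed segment."""
    windowed = segment * np.hanning(len(segment))
    spectrum = np.fft.rfft(windowed)
    phase = np.unwrap(np.angle(spectrum))     # remove 2*pi jumps
    bins = np.arange(len(phase))
    slope, _intercept = np.polyfit(bins, phase, 1)  # least-squares line
    return slope

def phase_slope_deviation(segment, reference_slope):
    """Deviation of a segment's phase slope from a typical-speech slope."""
    return abs(phase_slope(segment) - reference_slope)
```

In this simplified picture, the reference slope would come from typical vowels, and larger deviations would be expected for less intelligible dysarthric vowels.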

    Combining Non-pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech

    Contains full text: 160601pub.pdf (publisher's version, Open Access). INTERSPEECH, 08 September 201

    The role of MRI in diagnosing autism: a machine learning perspective.

    Approximately 1 in every 44 children in the United States suffers from autism spectrum disorder (ASD), a disorder characterized by social and behavioral impairments; communication, interpersonal and behavioral difficulties are the most common symptoms. Even though symptoms can begin as early as infancy, it may take multiple visits to a pediatric specialist before an accurate diagnosis can be made. In addition, the diagnosis can be subjective, and different specialists may give different scores. A growing body of research suggests that differences in brain development and/or environmental and/or genetic factors contribute to the development of autism, but scientists have yet to identify the exact pathology of the disorder. ASD is currently diagnosed through a set of evaluations regarded as the gold standard, such as the Autism Diagnostic Observation Schedule (ADOS) or the Autism Diagnostic Interview-Revised (ADI-R). A team of qualified clinicians is needed to perform the behavioral and communication tests and to collect clinical history information, so a considerable amount of time, effort, and subjective judgment is involved in using these gold-standard instruments. In addition to standard observational assessment, recent advancements in neuroimaging and machine learning suggest a rapid and objective alternative based on brain imaging. This work investigates the use of different imaging modalities, namely Diffusion Tensor Imaging (DTI) and resting-state functional MRI (rs-fMRI), for autism diagnosis. The dissertation presents a detailed study of feature engineering tools for extracting discriminant insights from these brain imaging modalities, including novel feature representations, and a machine learning framework to assist in the accurate classification of autistic individuals.
Based on three large publicly available datasets, this extensive research highlights different decisions along the pipeline and their impact on diagnostic accuracy, and identifies potentially impacted brain regions that contribute to an autism diagnosis. Achieving high, state-of-the-art cross-validated accuracy confirms the benefits of feature representation and feature engineering in extracting useful information, as well as the potential of neuroimaging in the diagnosis of autism. This should enable an early, automated, and more objective personalized diagnosis.

    Combining non-pathological data of different language varieties to improve DNN-HMM performance on pathological speech

    10.21437/Interspeech.2016-109 - Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 08-12 September 2016, 218-22

    Quantifying Quality of Life

    Describes technological methods and tools for objective and quantitative assessment of QoL. Appraises technology-enabled methods for incorporating QoL measurements in medicine. Highlights the success factors for adoption and scaling of technology-enabled methods.
    This open access book presents the rise of technology-enabled methods and tools for objective, quantitative assessment of Quality of Life (QoL), following the WHOQOL model. It is an in-depth resource describing and examining state-of-the-art, minimally obtrusive, ubiquitous technologies. Highlighting the factors required for adoption and scaling of technology-enabled methods and tools for QoL assessment, it also describes how these technologies can be leveraged for behavior change, disease prevention, health management and long-term QoL enhancement in populations at large. Quantifying Quality of Life: Incorporating Daily Life into Medicine fills a gap in the field of QoL by providing assessment methods, techniques and tools that differ from current methods, which are mostly infrequent, subjective, qualitative, memory-based, context-poor and sparse. It is therefore an ideal resource for physicians, physicians in training, software and hardware developers, computer scientists, data scientists, behavioural scientists, entrepreneurs, healthcare leaders and administrators seeking an up-to-date resource on this subject.