292 research outputs found
Model adaptation and adaptive training for the recognition of dysarthric speech
Dysarthria is a neurological speech disorder, which exhibits multi-fold disturbances in the speech production system of an individual and can have a detrimental effect on the speech output. In addition to the data sparseness problems, dysarthric speech is characterised by inconsistencies in the acoustic space making it extremely challenging to model. This paper investigates a variety of baseline speaker independent (SI) systems and its suitability for adaptation. The study also explores the usefulness of speaker adaptive training (SAT) for implicitly annihilating inter-speaker variations in a dysarthric corpus. The paper implements a hybrid MLLR-MAP based approach to adapt the SI and SAT systems. ALL the results reported uses UASPEECH dysarthric data. Our best adapted systems gave a significant absolute gain of 11.05% (20.42% relative) over the last published best result in the literature. A statistical analysis performed across various systems and its specific implementation in modelling different dysarthric severity sub-groups, showed that, SAT-adapted systems were more applicable to handle disfluencies of more severe speech and SI systems prepared from typical speech were more apt for modelling speech with low level of severity
Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
The rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to the needies suffering from
various etiologies. One prominent clinical application is a computer-assisted
speech training system which enables personalized speech therapy to patients
impaired by communicative disorders in the patient's home environment. Such a
system relies on the robust automatic speech recognition (ASR) technology to be
able to provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report ASR
performance of these systems on two dysarthric speech datasets of different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between the dysarthric and normal speech, significant
improvements have been reported on both datasets using speaker-independent ASR
architectures.Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition
Automatic recognition of disordered speech remains a highly challenging task
to date. The underlying neuro-motor conditions, often compounded with
co-occurring physical disabilities, lead to the difficulty in collecting large
quantities of impaired speech required for ASR system development. This paper
presents novel variational auto-encoder generative adversarial network
(VAE-GAN) based personalized disordered speech augmentation approaches that
simultaneously learn to encode, generate and discriminate synthesized impaired
speech. Separate latent features are derived to learn dysarthric speech
characteristics and phoneme context representations. Self-supervised
pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments
conducted on the UASpeech corpus suggest the proposed adversarial data
augmentation approach consistently outperformed the baseline speed perturbation
and non-VAE GAN augmentation methods with trained hybrid TDNN and End-to-end
Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN
based augmentation produced an overall WER of 27.78% on the UASpeech test set
of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset
of speakers with "Very Low" intelligibility.Comment: Submitted to ICASSP 202
Dysarthric Speech Recognition and Offline Handwriting Recognition using Deep Neural Networks
Millions of people around the world are diagnosed with neurological disorders like Parkinson’s, Cerebral Palsy or Amyotrophic Lateral Sclerosis. Due to the neurological damage as the disease progresses, the person suffering from the disease loses control of muscles, along with speech deterioration. Speech deterioration is due to neuro motor condition that limits manipulation of the articulators of the vocal tract, the condition collectively called as dysarthria. Even though dysarthric speech is grammatically and syntactically correct, it is difficult for humans to understand and for Automatic Speech Recognition (ASR) systems to decipher. With the emergence of deep learning, speech recognition systems have improved a lot compared to traditional speech recognition systems, which use sophisticated preprocessing techniques to extract speech features.
In this digital era there are still many documents that are handwritten many of which need to be digitized. Offline handwriting recognition involves recognizing handwritten characters from images of handwritten text (i.e. scanned documents). This is an interesting task as it involves sequence learning with computer vision. The task is more difficult than Optical Character Recognition (OCR), because handwritten letters can be written in virtually infinite different styles. This thesis proposes exploiting deep learning techniques like Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) for offline handwriting recognition. For speech recognition, we compare traditional methods for speech recognition with recent deep learning methods. Also, we apply speaker adaptation methods both at feature level and at parameter level to improve recognition of dysarthric speech
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition
Automatic recognition of disordered and elderly speech remains highly
challenging tasks to date due to data scarcity. Parameter fine-tuning is often
used to exploit the large quantities of non-aged and healthy speech pre-trained
models, while neural architecture hyper-parameters are set using expert
knowledge and remain unchanged. This paper investigates hyper-parameter
adaptation for Conformer ASR systems that are pre-trained on the Librispeech
corpus before being domain adapted to the DementiaBank elderly and UASpeech
dysarthric speech datasets. Experimental results suggest that hyper-parameter
adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over
parameter-only fine-tuning on DBank and UASpeech tasks respectively. An
intuitive correlation is found between the performance improvements by
hyper-parameter domain adaptation and the relative utterance length ratio
between the source and target domain data.Comment: 5 pages, 3 figures, 3 tables, accepted by Interspeech202
- …