69 research outputs found

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, make it difficult to collect the large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized impaired speech. Separate latent features are derived to learn dysarthric speech characteristics and phoneme context representations. Self-supervised pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments conducted on the UASpeech corpus suggest the proposed adversarial data augmentation approach consistently outperformed the baseline speed perturbation and non-VAE GAN augmentation methods with trained hybrid TDNN and End-to-end Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility. Comment: Submitted to ICASSP 202

    Functional specialization and interaction in the amygdala-hippocampus circuit during working memory processing

    Both the hippocampus and amygdala are involved in working memory (WM) processing. However, their specific role in WM is still an open question. Here, we simultaneously recorded intracranial EEG from the amygdala and hippocampus of epilepsy patients while they performed a WM task, and compared their representation patterns during the encoding and maintenance periods. By combining multivariate representational analysis and connectivity analyses with machine learning methods, our results revealed a functional specialization of the amygdala-hippocampal circuit: The mnemonic representations in the amygdala were highly distinct and decreased from encoding to maintenance. The hippocampal representations, however, were more similar across different items but remained stable in the absence of the stimulus. WM encoding and maintenance were associated with bidirectional information flow between the amygdala and the hippocampus in low-frequency bands (1-40 Hz). Furthermore, the decoding accuracy on WM load was higher by using representational features in the amygdala during encoding and in the hippocampus during maintenance, and by using information flow from the amygdala during encoding and that from the hippocampus during maintenance, respectively. Taken together, our study reveals that WM processing is associated with functional specialization and interaction within the amygdala-hippocampus circuit.

    Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

    Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity. Parameter fine-tuning is often used to exploit the large quantities of non-aged and healthy speech pre-trained models, while neural architecture hyper-parameters are set using expert knowledge and remain unchanged. This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus before being domain adapted to the DementiaBank elderly and UASpeech dysarthric speech datasets. Experimental results suggest that hyper-parameter adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over parameter-only fine-tuning on DBank and UASpeech tasks respectively. An intuitive correlation is found between the performance improvements by hyper-parameter domain adaptation and the relative utterance length ratio between the source and target domain data. Comment: 5 pages, 3 figures, 3 tables, accepted by Interspeech202

    Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

    Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data efficient speaker-dependent (SD) parameter representations are used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. The sensitivity to supervision quality is reduced using a confidence score-based selection of the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity issue, which is exacerbated by data selection, is addressed by modelling the SD parameter uncertainty using Bayesian learning. Experiments on the benchmark 300-hour Switchboard and the 233-hour AMI datasets suggest that the proposed confidence score-based adaptation schemes consistently outperformed the baseline speaker-independent (SI) Conformer model and conventional non-Bayesian, point estimate-based adaptation using no speaker data selection. Similar consistent performance improvements were retained after external Transformer and LSTM language model rescoring. In particular, on the 300-hour Switchboard corpus, statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute (9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also obtained on the AMI development and evaluation sets. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is proposed in this paper. The efficacy of the video input is consistently demonstrated in mask-based MVDR speech separation, DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end and Conformer ASR back-end. Audio-visual integrated front-end architectures performing speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and ASR back-end components is minimized by end-to-end joint fine-tuning using either the ASR cost function alone, or its interpolation with the speech enhancement loss. Experiments were conducted on the mixture overlapped and reverberant speech data constructed using simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute (41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech enhancement improvements were also obtained on PESQ, STOI and SRMR scores. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing

    The cortical surface area of the insula mediates the effect of DBH rs7040170 on novelty seeking

    Novelty seeking (NS) is a personality trait important for adaptive functioning, but an excessive level of NS has been linked to psychiatric disorders such as ADHD and substance abuse. Previous research has investigated separately the neural and genetic bases of the NS trait, but results were mixed and neural and genetic bases have yet to be examined within the same study. In this study, we examined the interrelationships among the dopamine beta-hydroxylase (DBH) gene, brain structure, and the NS trait in 359 healthy Han Chinese subjects. We focused on the DBH gene because it encodes a key enzyme for dopamine metabolism, and NS is believed to be related to the dopaminergic system and has been reported to be associated with DBH variation. Results showed a significant positive association between the cortical surface area of the left insula and NS score. Furthermore, the DBH genetic polymorphism at the SNP rs7040170 was strongly associated with both the surface area of the left insula and NS score, with G carriers having a larger left insula surface area and a higher NS score than AA homozygotes. Subsequent path analysis suggested that the insula partially mediated the association between the DBH gene and the NS trait. Our data provided the first evidence for the involvement of the insula in the dopamine-NS relationship. Future studies of molecular mechanisms underlying the NS personality trait and related psychiatric disorders should consider the mediation effect of the neural structure.

    Polygenic risk for Alzheimer's disease influences precuneal volume in two independent general populations

    Alzheimer's disease (AD) is heritable with complex genetic underpinnings. Based on previous results from large-scale genome-wide association studies, recent studies found an association between the polygenic risk score (PGRS) of AD and the structure of some preselected brain regions, but the effects of AD PGRS on all voxels of the brain have not been fully investigated. In the present study, we examined the voxel-wise effect of AD PGRS on the entire brain and the influence of AD PGRS on cognitive function in 2 independent healthy young cohorts. In both cohorts, an elevated AD PGRS was associated with a smaller precuneal volume, and the effect remained after excluding the APOE genotype. No correlation was found between AD PGRS and any cognitive measure in either sample. Finding a negative correlation between the AD PGRS and the precuneal volume could help to elucidate the mechanism of the genetic risk for AD and could provide a potential biomarker for early detection and possible interventions in AD.

    Stable Modality-Specific Activity Flows As Reflected by the Neuroenergetic Approach to the fMRI Weighted Maps

    This article uses the ideas of neuroenergetic and neural field theories to detect stimulation-driven energy flows in the brain during face and auditory word processing. In this analysis, energy flows are thought to create the stable gradients of the fMRI weighted summary images. The sources from which activity spreads in the brain during face processing were detected in the occipital cortex. The following direction of energy flows in the frontal cortex was described: the right inferior frontal => the left inferior frontal => the triangular part of the left inferior frontal cortex => the left operculum. In the left operculum, a localized circuit was described. For auditory word processing, the sources of activity flows were detected bilaterally in the middle superior temporal regions and also in the left posterior superior temporal cortex. Thus, neuroenergetic assumptions may give a novel perspective for the analysis of neuroimaging data.

    Case report: Reversible splenial lesion syndrome caused by diquat poisoning

    Diquat (DQ), chemically known as 1,1'-ethylene-2,2'-bipyridine, is a non-selective herbicide for leaf removal and drying. It has toxic effects on central nervous system cells, and toxic neurological lesions include axonal degeneration and pontine myelolysis. At the same time, DQ can also affect the activity of dopaminergic nerve cells through oxidative stress, causing degeneration and reducing dopamine uptake. With the increasing application of DQ in agricultural production, the clinical reports of neurotoxicity caused by acute DQ poisoning are also increasing. At present, DQ rapid-phase-related toxic encephalopathy mainly involves the pons, midbrain, basal ganglia, thalamus and other brain regions. However, this case is unusual in that the lesion mainly involved the splenium of the corpus callosum. It is also the first such case to be reported.

    Resting-state coupling between core regions within the central-executive and salience networks contributes to working memory performance

    Previous studies investigated the distinct roles played by different cognitive regions and suggested that the patterns of connectivity of these regions are associated with working memory. However, the specific causal mechanism through which the neuronal circuits that involve these brain regions contribute to working memory is still unclear. Here, in a large sample of healthy young adults, we first identified the core working memory regions by linking working memory accuracy to resting-state functional connectivity with the bilateral dorsolateral prefrontal cortex (a principal region in the central-executive network). Then a spectral dynamic causal modeling analysis was performed to quantify the effective connectivity between these regions. Finally, the effective connectivity was correlated with working memory accuracy to characterize the relationship between these connections and working memory performance. We found that the functional connections between the bilateral dorsolateral prefrontal cortex and the dorsal anterior cingulate cortex and between the right dorsolateral prefrontal cortex and the left orbital fronto-insular cortex were correlated with working memory accuracy. Furthermore, the effective connectivity from the dorsal anterior cingulate cortex to the bilateral dorsolateral prefrontal cortex and from the right dorsolateral prefrontal cortex to the left orbital fronto-insular cortex could predict individual differences in working memory. Because the dorsal anterior cingulate cortex and orbital fronto-insular cortex are core regions of the salience network, we inferred that the inter- and causal-connectivity between core regions within the central-executive and salience networks is functionally relevant for working memory performance. In summary, the current study identified the dorsolateral prefrontal cortex-related resting-state effective connectivity underlying working memory and suggests that individual differences in cognitive ability could be characterized by resting-state effective connectivity.