10 research outputs found

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded by co-occurring physical disabilities, make it difficult to collect the large quantities of impaired speech required for ASR system development. This paper presents novel personalized disordered speech augmentation approaches based on variational auto-encoder generative adversarial networks (VAE-GAN) that simultaneously learn to encode, generate and discriminate synthesized impaired speech. Separate latent features are derived to learn dysarthric speech characteristics and phoneme context representations. Self-supervised pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments conducted on the UASpeech corpus suggest the proposed adversarial data augmentation approach consistently outperformed the baseline speed perturbation and non-VAE GAN augmentation methods on both hybrid TDNN and end-to-end Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility. Comment: Submitted to ICASSP 202
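    The "encode" part of a VAE-GAN is a standard VAE encoder: it predicts a mean and log-variance per latent dimension, samples a latent code with the reparameterization trick, and is regularized by a KL term towards a standard normal prior. The minimal sketch below shows only those two generic pieces, not the paper's model. The latent sizes, the split into a hypothetical "speaker" block and "phone" block, and all numeric values are illustrative assumptions; the generator and discriminator networks are omitted.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

# Hypothetical encoder outputs for one frame: one latent block for
# dysarthric (speaker) characteristics and one for phoneme context.
speaker_mu, speaker_log_var = [0.3, -0.1], [-0.5, -0.2]
phone_mu, phone_log_var = [1.0, 0.0, -0.4], [0.0, -1.0, 0.1]

# The two blocks are concatenated into a single latent code for the generator.
mu = speaker_mu + phone_mu
log_var = speaker_log_var + phone_log_var

rng = random.Random(0)
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(len(z), round(kl, 4))
```

    Keeping the two latent blocks separate is what would let a generator vary speaker characteristics while holding phoneme context fixed; how the paper actually conditions generation is not specified in the abstract.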

    Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

    Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity. Parameter fine-tuning is often used to exploit models pre-trained on large quantities of non-aged, healthy speech, while neural architecture hyper-parameters are set using expert knowledge and remain unchanged. This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the LibriSpeech corpus before being domain adapted to the DementiaBank elderly and UASpeech dysarthric speech datasets. Experimental results suggest that hyper-parameter adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over parameter-only fine-tuning on the DBank and UASpeech tasks respectively. An intuitive correlation is found between the performance improvements from hyper-parameter domain adaptation and the relative utterance length ratio between the source and target domain data. Comment: 5 pages, 3 figures, 3 tables, accepted by Interspeech202

    Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

    Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data-intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and by performance sensitivity to transcription errors. To address these issues, a set of compact and data-efficient speaker-dependent (SD) parameter representations are used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. The sensitivity to supervision quality is reduced by using confidence scores to select the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity issue, which is exacerbated by data selection, is addressed by modelling the SD parameter uncertainty using Bayesian learning. Experiments on the benchmark 300-hour Switchboard and the 233-hour AMI datasets suggest that the proposed confidence score-based adaptation schemes consistently outperformed the baseline speaker-independent (SI) Conformer model and conventional non-Bayesian, point estimate-based adaptation without speaker data selection. Similar consistent performance improvements were retained after external Transformer and LSTM language model rescoring. In particular, on the 300-hour Switchboard corpus, statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute (9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also obtained on the AMI development and evaluation sets. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
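    The data selection step can be pictured as filtering each speaker's first-pass hypotheses by an utterance-level confidence score before adaptation. The sketch below assumes the confidence scores are already available (the paper's two estimation modules and the Bayesian SD parameter modelling are not shown). The `Utterance` type, `select_adaptation_data`, the threshold value and the `min_keep` fallback are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    utt_id: str
    hypothesis: str    # first-pass ASR output used as adaptation supervision
    confidence: float  # assumed utterance-level score in [0, 1]

def select_adaptation_data(utts, threshold=0.8, min_keep=1):
    """Keep utterances whose confidence is at least `threshold`.

    If fewer than `min_keep` pass, fall back to the `min_keep` most
    confident utterances so every speaker still has some adaptation data.
    """
    ranked = sorted(utts, key=lambda u: u.confidence, reverse=True)
    selected = [u for u in ranked if u.confidence >= threshold]
    if len(selected) < min_keep:
        selected = ranked[:min_keep]
    return selected

speaker_utts = [
    Utterance("spk1_001", "hello world", 0.93),
    Utterance("spk1_002", "i'm fine thanks", 0.41),
    Utterance("spk1_003", "see you tomorrow", 0.86),
]
chosen = select_adaptation_data(speaker_utts, threshold=0.8)
print([u.utt_id for u in chosen])  # → ['spk1_001', 'spk1_003']
```

    The fallback illustrates why selection worsens sparsity: a strict threshold can leave very little data per speaker, which is the gap the abstract says Bayesian modelling of the SD parameters is meant to address.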

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of the visual modality to acoustic signal corruption, this paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach that fully incorporates visual information into all system components. The efficacy of the video input is consistently demonstrated in the mask-based MVDR speech separation front-end, the DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end, and the Conformer ASR back-end. Audio-visual integrated front-end architectures that perform speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and the ASR back-end is minimized by end-to-end joint fine-tuning using either the ASR cost function alone or its interpolation with the speech enhancement loss. Experiments were conducted on overlapped and reverberant speech mixtures constructed by simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute (41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech enhancement improvements were also obtained on PESQ, STOI and SRMR scores. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
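    For orientation, the equations below give a commonly used reference-channel formulation of mask-based MVDR and one possible form of the interpolated fine-tuning objective. They are a sketch, not the paper's exact equations: here $\mathbf{y}(t,f)$ is the multi-channel STFT of the mixture, $m_S$ and $m_N$ are speech and noise masks (in this work predicted with the help of video), $\mathbf{u}$ is a one-hot vector selecting the reference microphone, and $\lambda$ is an assumed interpolation weight.

```latex
\begin{aligned}
\boldsymbol{\Phi}_{\nu}(f) &= \frac{\sum_{t} m_{\nu}(t,f)\,\mathbf{y}(t,f)\,\mathbf{y}^{\mathsf{H}}(t,f)}{\sum_{t} m_{\nu}(t,f)}, \qquad \nu \in \{S, N\}, \\
\mathbf{w}(f) &= \frac{\boldsymbol{\Phi}_{N}^{-1}(f)\,\boldsymbol{\Phi}_{S}(f)}{\operatorname{tr}\!\left(\boldsymbol{\Phi}_{N}^{-1}(f)\,\boldsymbol{\Phi}_{S}(f)\right)}\,\mathbf{u}, \qquad \hat{s}(t,f) = \mathbf{w}^{\mathsf{H}}(f)\,\mathbf{y}(t,f), \\
\mathcal{L} &= \mathcal{L}_{\mathrm{ASR}} + \lambda\,\mathcal{L}_{\mathrm{SE}}, \qquad \lambda \ge 0 .
\end{aligned}
```

    Setting $\lambda = 0$ recovers fine-tuning with the ASR cost alone; because every step above is differentiable, ASR gradients can flow back through the beamformer into the mask estimator.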

    Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

    Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends and domain adapted wav2vec2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec2.0 features; and c) multi-pass decoding in which the TDNN/Conformer system outputs are rescored using domain adapted wav2vec2.0 models. In addition, domain adapted wav2vec2.0 representations are utilized in acoustic-to-articulatory (A2A) inversion to construct multi-modal dysarthric and elderly speech recognition systems. Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest that TDNN and Conformer ASR systems integrating domain adapted wav2vec2.0 models consistently outperform the standalone wav2vec2.0 models, with statistically significant WER reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two tasks respectively. The lowest published WERs of 22.56% (52.53% on very low intelligibility speakers, 39.09% on unseen words) and 18.17% are obtained on the UASpeech test set of 16 dysarthric speakers and the DementiaBank Pitt test set respectively. Comment: accepted by ICASSP 202
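    Frame-level joint decoding of two hybrid TDNN systems is typically done by combining their acoustic scores frame by frame before the decoder search. The sketch below assumes a log-linear interpolation with a fixed weight, which is a common choice but not confirmed as the paper's combination rule; it also assumes both systems share the same output units and frame rate. The function names and toy posteriors are hypothetical.

```python
import math

def log_softmax(scores):
    """Renormalise a list of log-scores into log-probabilities (stable form)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def joint_frame_scores(logp_a, logp_b, weight=0.5):
    """Log-linearly combine two systems' per-frame log-posteriors.

    logp_a, logp_b: one list per frame, each holding log-posteriors over the
    same output units (e.g. tied states). `weight` applies to system A.
    """
    if len(logp_a) != len(logp_b):
        raise ValueError("systems must produce the same number of frames")
    combined = []
    for fa, fb in zip(logp_a, logp_b):
        if len(fa) != len(fb):
            raise ValueError("systems must share the same output units")
        mixed = [weight * a + (1.0 - weight) * b for a, b in zip(fa, fb)]
        combined.append(log_softmax(mixed))
    return combined

# Toy posteriors: system A (acoustic features only), system B (+ wav2vec2.0).
sys_a = [[math.log(0.7), math.log(0.2), math.log(0.1)],
         [math.log(0.1), math.log(0.6), math.log(0.3)]]
sys_b = [[math.log(0.5), math.log(0.4), math.log(0.1)],
         [math.log(0.2), math.log(0.2), math.log(0.6)]]
fused = joint_frame_scores(sys_a, sys_b, weight=0.5)
best = [max(range(len(f)), key=f.__getitem__) for f in fused]
print(best)  # → [0, 2]
```

    In the second frame the two systems disagree (unit 1 versus unit 2); the fused score follows the more confident system B, which is the kind of complementarity joint decoding aims to exploit.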

    A well‐conditioned integral equation for electromagnetic scattering from composite inhomogeneous bi‐anisotropic material and closed perfect electric conductor objects

    A well‐conditioned volume‐surface integral equation, called the volume integral equation-combined field integral equation, is applied to analyse electromagnetic (EM) scattering from arbitrarily shaped three‐dimensional composite objects comprising both inhomogeneous bi‐anisotropic material and closed perfect electric conductors (PECs). The equivalent surface and volume currents are expanded using the commonly used RWG and SWG basis functions respectively, and a matrix equation is derived by the method of moments. Because the magnetic field integral equation is involved in modelling the surface electric current, and the constitutive parameters are all tensors, some new kinds of singularities are encountered and properly handled when filling the impedance matrix. Several numerical results of EM scattering from composite bi‐anisotropic and closed PEC objects are shown to illustrate the accuracy and efficiency of the proposed scheme. The validity of the continuity condition of electric flux enforced on the bi‐anisotropic-PEC interfaces, which can be used to eliminate the volumetric electric unknowns, is also verified. This is the published version of the following article: Liu, Jinbo, Jin Yuan, Zengrui Li, and Jiming Song. "A well‐conditioned integral equation for electromagnetic scattering from composite inhomogeneous bi‐anisotropic material and closed perfect electric conductor objects." IET Microwaves, Antennas & Propagation (2021). DOI: 10.1049/mia2.12051. Posted with permission.
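    "The constitutive parameters are all tensors" refers to the general bi‐anisotropic constitutive relations, in which both field responses couple to both excitations through four dyadics. The notation below is a standard textbook form, not taken from the article:

```latex
\begin{aligned}
\mathbf{D} &= \bar{\bar{\varepsilon}} \cdot \mathbf{E} + \bar{\bar{\xi}} \cdot \mathbf{H}, \\
\mathbf{B} &= \bar{\bar{\zeta}} \cdot \mathbf{E} + \bar{\bar{\mu}} \cdot \mathbf{H}.
\end{aligned}
```

    With $\bar{\bar{\xi}} = \bar{\bar{\zeta}} = 0$ and scalar $\varepsilon$, $\mu$ these reduce to the isotropic relations $\mathbf{D} = \varepsilon\mathbf{E}$ and $\mathbf{B} = \mu\mathbf{H}$. The cross-coupling dyadics are what make every volume interaction involve both electric and magnetic contributions, which is one reason the impedance-matrix singularities differ from the isotropic case.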

    Multiservice-Based Traffic Scheduling for 5G Access Traffic Steering, Switching and Splitting

    As a key enabler of the access traffic steering, switching and splitting (ATSSS) feature, multipath transport can use several network paths simultaneously and support seamless failover to improve both communication throughput and resilience. A traffic scheduling strategy is therefore needed to determine the network path combination that best improves multipath transport performance. To address this need, we developed a multiservice-type based transmission (MSTT) traffic scheduling optimization strategy, which involves three steps. First, the user equipment (UE) selects the number of data stream transmission paths based on the service utility function, and either transmits all data streams via the 3GPP network or sends two streams, one via the 3GPP network and the other via the non-3GPP network. Second, the transmission path for each data stream is selected based on load balancing. Finally, traffic scheduling is formulated as a convex optimization problem that maximizes the effective network capacity under a Delay Quality of Service (DQoS) constraint. The proposed traffic scheduling strategy is validated through simulation experiments. The results indicate that the user satisfaction and effective capacity it achieves are always better than those of the always-best-connected and fixed-ratio power-allocation algorithms.
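    "Effective capacity under a delay QoS constraint" usually refers to the effective-capacity metric from the wireless literature, $EC(\theta) = -\frac{1}{\theta T}\ln \mathbb{E}\left[e^{-\theta T R}\right]$, where $\theta > 0$ is the delay-QoS exponent (larger means stricter), $T$ is the frame duration and $R$ is the per-frame service rate. The sketch below estimates it from sampled rates, assuming i.i.d. frames; the article's exact DQoS model is not given in the abstract, and the rate samples are hypothetical.

```python
import math

def effective_capacity(rate_samples, theta, frame_duration=1.0):
    """Empirical effective capacity for delay-QoS exponent theta > 0.

    EC(theta) = -1 / (theta * T) * ln E[exp(-theta * T * R)], with the
    expectation replaced by a sample mean over i.i.d. per-frame rates R.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if not rate_samples:
        raise ValueError("need at least one rate sample")
    t = frame_duration
    mgf = sum(math.exp(-theta * t * r) for r in rate_samples) / len(rate_samples)
    return -math.log(mgf) / (theta * t)

# Hypothetical per-frame rates (Mbit/s) observed on one candidate path.
rates = [8.0, 2.0, 6.0, 4.0]
ec_loose = effective_capacity(rates, theta=0.01)  # close to the mean rate
ec_strict = effective_capacity(rates, theta=1.0)  # penalises rate variability
print(round(ec_loose, 3), round(ec_strict, 3))
```

    By Jensen's inequality the result never exceeds the mean rate, and it falls further as $\theta$ grows, so a scheduler maximizing it under a strict delay constraint favours paths with steadier throughput rather than simply higher average throughput.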

    On the Use of Hybrid CFIE-EFIE for Objects Containing Closed-Open Surface Junctions

    To effectively solve for the electromagnetic scattering or radiation properties of perfect electric conductor (PEC) objects containing closed-open surface junctions, this work studies how to establish the hybrid combined field integral equation-electric field integral equation (CFIE-EFIE), which differs from the existing scheme for objects whose closed and open parts are separate. Further, it is found that when the integral equation is solved using the method of moments (MoM) and the widely used RWG basis functions are employed to expand the induced surface current, the CFIE-EFIE may give inaccurate numerical results for objects containing fine structures. The numerical accuracy can be improved by introducing linear-linear (LL) basis functions. Moreover, to achieve high computational efficiency, the LL and RWG basis functions are used simultaneously to expand the current on the fine structures and on the other, relatively smooth surfaces respectively; the validity of this scheme is verified by numerical results. This is a manuscript of an article published as Liu, Jinbo, Jin Yuan, Wen Luo, Zengrui Li, and Jiming Song. "On the Use of Hybrid CFIE-EFIE for Objects Containing Closed-Open Surface Junctions." IEEE Antennas and Wireless Propagation Letters (2021). DOI: 10.1109/LAWP.2021.3077143. Posted with permission.
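    The reason a hybrid formulation is needed at all is that the magnetic field integral equation (MFIE) is only valid on closed surfaces, so the CFIE, a weighted blend of EFIE and MFIE, cannot be applied on open parts. At the operator level the usual hybrid assignment reads as follows, with $\alpha$ the combination parameter and $\eta$ the wave impedance of the background medium; how the two equations are tested at the closed-open junction is the article's contribution and is not reproduced here.

```latex
\begin{aligned}
\text{on closed surfaces } S_c:&\quad \alpha\,\mathrm{EFIE} + (1-\alpha)\,\eta\,\mathrm{MFIE}, \qquad 0 < \alpha < 1, \\
\text{on open surfaces } S_o:&\quad \mathrm{EFIE}.
\end{aligned}
```

    Using the CFIE wherever the MFIE is valid is what suppresses the interior-resonance problem of a pure EFIE on the closed parts, while the open parts, which have no interior, are handled with the EFIE alone.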

    Drug Repurposing of Histone Deacetylase Inhibitors That Alleviate Neutrophilic Inflammation in Acute Lung Injury and Idiopathic Pulmonary Fibrosis via Inhibiting Leukotriene A4 Hydrolase and Blocking LTB4 Biosynthesis

    Acute lung injury (ALI) and idiopathic pulmonary fibrosis (IPF) are both serious public health problems with high incidence and mortality rates in adults, and few drugs are available for their effective clinical treatment. In this study, by screening a panel of 18 HDAC inhibitors using enzymatic assays, thermofluor assays, and X-ray crystallographic investigation, we identified two known histone deacetylase (HDAC) inhibitors, suberanilohydroxamic acid (SAHA, 1) and its analogue 4-(dimethylamino)-N-[7-(hydroxyamino)-7-oxoheptyl]benzamide (2), as effective inhibitors of leukotriene A4 hydrolase (LTA4H), a key enzyme in the biosynthesis of leukotriene B4 (LTB4). Importantly, both 1 and 2 markedly diminish early neutrophilic inflammation in mouse models of ALI and IPF at a clinically safe dose. Detailed mechanisms of the down-regulation of proinflammatory cytokines by 1 or 2 were determined in vivo. Collectively, 1 and 2 could serve as promising agents with well-known clinical safety for the potential treatment of patients with ALI and IPF by pharmacologically inhibiting LTA4H and blocking LTB4 biosynthesis.