10 research outputs found
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition
Automatic recognition of disordered speech remains a highly challenging task
to date. The underlying neuro-motor conditions, often compounded with
co-occurring physical disabilities, lead to the difficulty in collecting large
quantities of impaired speech required for ASR system development. This paper
presents novel variational auto-encoder generative adversarial network
(VAE-GAN) based personalized disordered speech augmentation approaches that
simultaneously learn to encode, generate and discriminate synthesized impaired
speech. Separate latent features are derived to learn dysarthric speech
characteristics and phoneme context representations. Self-supervised
pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments
conducted on the UASpeech corpus suggest the proposed adversarial data
augmentation approach consistently outperformed the baseline speed perturbation
and non-VAE GAN augmentation methods on trained hybrid TDNN and end-to-end
Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN
based augmentation produced an overall WER of 27.78% on the UASpeech test set
of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset
of speakers with "Very Low" intelligibility.
Comment: Submitted to ICASSP 202
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition
Automatic recognition of disordered and elderly speech remains a highly
challenging task to date due to data scarcity. Parameter fine-tuning is often
used to exploit models pre-trained on large quantities of non-aged, healthy
speech, while neural architecture hyper-parameters are set using expert
knowledge and remain unchanged. This paper investigates hyper-parameter
adaptation for Conformer ASR systems that are pre-trained on the Librispeech
corpus before being domain adapted to the DementiaBank elderly and UASpeech
dysarthric speech datasets. Experimental results suggest that hyper-parameter
adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over
parameter-only fine-tuning on DBank and UASpeech tasks respectively. An
intuitive correlation is found between the performance improvements by
hyper-parameter domain adaptation and the relative utterance length ratio
between the source and target domain data.
Comment: 5 pages, 3 figures, 3 tables, accepted by Interspeech202
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Speaker adaptation techniques provide a powerful solution to customise
automatic speech recognition (ASR) systems for individual users. Practical
application of unsupervised model-based speaker adaptation techniques to data
intensive end-to-end ASR systems is hindered by the scarcity of speaker-level
data and performance sensitivity to transcription errors. To address these
issues, a set of compact and data efficient speaker-dependent (SD) parameter
representations are used to facilitate both speaker adaptive training and
test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR
systems. The sensitivity to supervision quality is reduced using a confidence
score-based selection of the less erroneous subset of speaker-level adaptation
data. Two lightweight confidence score estimation modules are proposed to
produce more reliable confidence scores. The data sparsity issue, which is
exacerbated by data selection, is addressed by modelling the SD parameter
uncertainty using Bayesian learning. Experiments on the benchmark 300-hour
Switchboard and the 233-hour AMI datasets suggest that the proposed confidence
score-based adaptation schemes consistently outperformed the baseline
speaker-independent (SI) Conformer model and conventional non-Bayesian, point
estimate-based adaptation using no speaker data selection. Similar consistent
performance improvements were retained after external Transformer and LSTM
language model rescoring. In particular, on the 300-hour Switchboard corpus,
statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute
(9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer
on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER
reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also
obtained on the AMI development and evaluation sets.
Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
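The abstract above quotes each WER reduction both in absolute and in relative terms. As a minimal sketch of how the two relate (the function names are hypothetical, and the baseline WERs are not stated in the abstract, so any value recovered this way is only an approximation limited by the rounding of the quoted figures):

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, given two WERs in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def implied_baseline(absolute_reduction: float, relative_reduction_pct: float) -> float:
    """Baseline WER (percent) implied by an absolute/relative reduction pair.

    Follows from relative = 100 * absolute / baseline. The result is only
    approximate because the quoted reductions are rounded.
    """
    return 100.0 * absolute_reduction / relative_reduction_pct


if __name__ == "__main__":
    # (absolute %, relative %) pairs as quoted in the abstract.
    quoted = {
        "Hub5'00": (1.0, 9.5),
        "RT02": (1.3, 10.9),
        "RT03": (1.4, 11.3),
        "AMI dev": (2.7, 8.9),
        "AMI eval": (3.3, 10.2),
    }
    for name, (abs_red, rel_red) in quoted.items():
        print(f"{name}: implied baseline WER ~ {implied_baseline(abs_red, rel_red):.1f}%")
```

The implied baselines are a back-of-the-envelope consistency check on the quoted numbers, not results reported by the authors.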
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
Accurate recognition of cocktail party speech containing overlapping
speakers, noise and reverberation remains a highly challenging task to date.
Motivated by the invariance of visual modality to acoustic signal corruption,
an audio-visual multi-channel speech separation, dereverberation and
recognition approach featuring a full incorporation of visual information into
all system components is proposed in this paper. The efficacy of the video
input is consistently demonstrated in mask-based MVDR speech separation,
DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end and
Conformer ASR back-end. Audio-visual integrated front-end architectures
performing speech separation and dereverberation in a pipelined or joint
fashion via mask-based WPD are investigated. The error cost mismatch between
the speech enhancement front-end and ASR back-end components is minimized by
end-to-end jointly fine-tuning using either the ASR cost function alone, or its
interpolation with the speech enhancement loss. Experiments were conducted on
the mixture overlapped and reverberant speech data constructed using simulation
or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel
speech separation, dereverberation and recognition systems consistently
outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute
(41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech
enhancement improvements were also obtained on PESQ, STOI and SRMR scores.
Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition
Automatic recognition of disordered and elderly speech remains a highly
challenging task to date due to the difficulty in collecting such data in large
quantities. This paper explores a series of approaches to integrate domain
adapted SSL pre-trained models into TDNN and Conformer ASR systems for
dysarthric and elderly speech recognition: a) input feature fusion between
standard acoustic frontends and domain adapted wav2vec2.0 speech
representations; b) frame-level joint decoding of TDNN systems separately
trained using standard acoustic features alone and with additional wav2vec2.0
features; and c) multi-pass decoding involving the TDNN/Conformer system
outputs to be rescored using domain adapted wav2vec2.0 models. In addition,
domain adapted wav2vec2.0 representations are utilized in
acoustic-to-articulatory (A2A) inversion to construct multi-modal dysarthric
and elderly speech recognition systems. Experiments conducted on the UASpeech
dysarthric and DementiaBank Pitt elderly speech corpora suggest that TDNN and
Conformer ASR systems integrating domain adapted wav2vec2.0 models consistently
outperform the standalone wav2vec2.0 models by statistically significant WER
reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two
tasks respectively. The lowest published WERs of 22.56% (52.53% on very low
intelligibility, 39.09% on unseen words) and 18.17% are obtained on the
UASpeech test set of 16 dysarthric speakers, and the DementiaBank Pitt test set
respectively.
Comment: accepted by ICASSP 202
On the Use of Hybrid CFIE-EFIE for Objects Containing Closed-Open Surface Junctions
To effectively solve electromagnetic scattering or radiation problems for perfect electric conductor (PEC) objects containing closed-open surface junctions, this work studies how to establish the hybrid combined field integral equation-electric field integral equation (CFIE-EFIE), which differs from the existing scheme for objects whose closed and open parts are separate. Further, it is found that when the integral equation is solved using the method of moments (MoM) and the widely used RWG basis functions are employed to expand the induced surface current, the CFIE-EFIE may give inaccurate numerical results for objects containing fine structures. The numerical accuracy can be improved by introducing the linear-linear (LL) basis functions. Moreover, to achieve high computational efficiency, the LL and RWG basis functions are used simultaneously to expand the current on the fine structures and on the other, relatively smooth surfaces respectively; the validity of this approach is verified by numerical results.
A well-conditioned integral equation for electromagnetic scattering from composite inhomogeneous bi-anisotropic material and closed perfect electric conductor objects
A well-conditioned volume-surface integral equation, called the volume integral equation-combined field integral equation, is applied to analyse electromagnetic (EM) scattering from arbitrarily shaped three-dimensional composite objects comprising both inhomogeneous bi-anisotropic material and closed perfect electric conductors (PECs). The equivalent surface and volume currents are respectively expanded using the commonly used RWG and SWG basis functions, while a matrix equation is derived by the method of moments. Because the magnetic field integral equation is involved in modelling the surface electric current, and the constitutive parameters are all tensors, some new kinds of singularities are encountered and properly handled in the filling process of the impedance matrix. Several numerical results of EM scattering from composite bi-anisotropy and closed PEC objects are shown to illustrate the accuracy and efficiency of the proposed scheme. The validity of the continuity condition of electric flux enforced on the bi-anisotropy-PEC interfaces, which can be used to eliminate the volumetric electric unknowns, is also verified.
This is the published version of the following article: Liu, Jinbo, Jin Yuan, Zengrui Li, and Jiming Song. "A well-conditioned integral equation for electromagnetic scattering from composite inhomogeneous bi-anisotropic material and closed perfect electric conductor objects." IET Microwaves, Antennas & Propagation (2021). DOI: 10.1049/mia2.12051. Posted with permission.
Multiservice-Based Traffic Scheduling for 5G Access Traffic Steering, Switching and Splitting
As a key enabler of the access traffic steering, switching and splitting (ATSSS) feature, multipath transport can leverage the simultaneous use of several network paths and support seamless failover to improve both communication throughput and resilience. Therefore, a traffic scheduling strategy is necessary to determine the network path combination that best improves the performance of multipath transport. To address this need, we developed a multiservice-type based transmission (MSTT) traffic scheduling optimization strategy, which involves three steps. First, the user equipment (UE) selects the number of data stream transmission paths, considering the service utility function, and either transmits all data streams via the 3GPP network or sends two streams, one via the 3GPP network and the other via the non-3GPP network. Second, the proposed method is used to select the transmission path for each data stream based on load balancing. Finally, traffic scheduling is formulated as a convex optimization problem that maximizes the effective network capacity under a Delay Quality of Service (DQoS) constraint. The proposed traffic scheduling strategy is validated through simulation experiments. The results indicate that the user satisfaction and effective capacity achieved are consistently better than those of the always-best-connected and fixed-ratio power-allocation algorithms.
On the Use of Hybrid CFIE-EFIE for Objects Containing Closed-Open Surface Junctions
To effectively solve electromagnetic scattering or radiation problems for perfect electric conductor (PEC) objects containing closed-open surface junctions, this work studies how to establish the hybrid combined field integral equation-electric field integral equation (CFIE-EFIE), which differs from the existing scheme for objects whose closed and open parts are separate. Further, it is found that when the integral equation is solved using the method of moments (MoM) and the widely used RWG basis functions are employed to expand the induced surface current, the CFIE-EFIE may give inaccurate numerical results for objects containing fine structures. The numerical accuracy can be improved by introducing the linear-linear (LL) basis functions. Moreover, to achieve high computational efficiency, the LL and RWG basis functions are used simultaneously to expand the current on the fine structures and on the other, relatively smooth surfaces respectively; the validity of this approach is verified by numerical results.
This is a manuscript of an article published as Liu, Jinbo, Jin Yuan, Wen Luo, Zengrui Li, and Jiming Song. "On the Use of Hybrid CFIE-EFIE for Objects Containing Closed-Open Surface Junctions." IEEE Antennas and Wireless Propagation Letters (2021). DOI: 10.1109/LAWP.2021.3077143. Posted with permission.
Drug Repurposing of Histone Deacetylase Inhibitors That Alleviate Neutrophilic Inflammation in Acute Lung Injury and Idiopathic Pulmonary Fibrosis via Inhibiting Leukotriene A4 Hydrolase and Blocking LTB4 Biosynthesis
Acute lung injury (ALI) and idiopathic pulmonary fibrosis (IPF) are both
serious public health problems with high incidence and mortality rates in
adults, and few drugs are available for their effective clinical treatment.
In this study, we found that two known histone deacetylase (HDAC) inhibitors,
suberanilohydroxamic acid (SAHA, 1) and its analogue
4-(dimethylamino)-N-[7-(hydroxyamino)-7-oxoheptyl]benzamide (2), are effective
inhibitors of leukotriene A4 hydrolase (LTA4H), a key enzyme in the
biosynthesis of leukotriene B4 (LTB4), across a panel of 18 HDAC inhibitors,
using enzymatic assay, thermofluor assay, and X-ray crystallographic
investigation. Importantly, both 1 and 2 markedly diminish early neutrophilic
inflammation in mouse models of ALI and IPF at a clinically safe dose.
Detailed mechanisms of down-regulation of proinflammatory cytokines by 1 or 2
were determined in vivo. Collectively, 1 and 2 are promising agents with
well-known clinical safety for the potential treatment of patients with ALI
and IPF via pharmacologically inhibiting LTA4H and blocking LTB4 biosynthesis.