458 research outputs found
Attentive Adversarial Learning for Domain-Invariant Training
Adversarial domain-invariant training (ADIT) proves to be effective in
suppressing the effects of domain variability in acoustic modeling and has led
to improved performance in automatic speech recognition (ASR). In ADIT, an
auxiliary domain classifier takes in equally-weighted deep features from a deep
neural network (DNN) acoustic model and is trained to improve their
domain-invariance by optimizing an adversarial loss function. In this work, we
propose an attentive ADIT (AADIT) in which we advance the domain classifier
with an attention mechanism to automatically weight the input deep features
according to their importance in domain classification. With this attentive
re-weighting, AADIT can focus on the domain normalization of phonetic
components that are more susceptible to domain variability and generates deep
features with improved domain-invariance and senone-discriminativity over ADIT.
Most importantly, the attention block serves only as an external component to
the DNN acoustic model and is not involved in ASR, so AADIT can be used to
improve the acoustic modeling with any DNN architectures. More generally, the
same methodology can improve any adversarial learning system with an auxiliary
discriminator. Evaluated on CHiME-3 dataset, the AADIT achieves 13.6% and 9.3%
relative WER improvements, respectively, over a multi-conditional model and a
strong ADIT baseline.Comment: 5 pages, 1 figure, ICASSP 201
Robust Speech Recognition Using Generative Adversarial Networks
This paper describes a general, scalable, end-to-end framework that uses the
generative adversarial network (GAN) objective to enable robust speech
recognition. Encoders trained with the proposed approach enjoy improved
invariance by learning to map noisy audio to the same embedding space as that
of clean audio. Unlike previous methods, the new framework does not rely on
domain expertise or simplifying assumptions as are often needed in signal
processing, and directly encourages robustness in a data-driven way. We show
the new approach improves simulated far-field speech recognition of vanilla
sequence-to-sequence models without specialized front-ends or preprocessing
- …