Attentive Adversarial Learning for Domain-Invariant Training
Adversarial domain-invariant training (ADIT) proves to be effective in
suppressing the effects of domain variability in acoustic modeling and has led
to improved performance in automatic speech recognition (ASR). In ADIT, an
auxiliary domain classifier takes in equally-weighted deep features from a deep
neural network (DNN) acoustic model and is trained to improve their
domain-invariance by optimizing an adversarial loss function. In this work, we
propose an attentive ADIT (AADIT) in which we advance the domain classifier
with an attention mechanism to automatically weight the input deep features
according to their importance in domain classification. With this attentive
re-weighting, AADIT can focus on the domain normalization of phonetic
components that are more susceptible to domain variability and generate deep
features with improved domain-invariance and senone-discriminativity over ADIT.
Most importantly, the attention block serves only as an external component to
the DNN acoustic model and is not involved in ASR, so AADIT can be used to
improve acoustic modeling with any DNN architecture. More generally, the
same methodology can improve any adversarial learning system with an auxiliary
discriminator. Evaluated on the CHiME-3 dataset, AADIT achieves 13.6% and 9.3%
relative WER improvements, respectively, over a multi-conditional model and a
strong ADIT baseline.
Comment: 5 pages, 1 figure, ICASSP 201
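The attentive re-weighting described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the scaled dot-product scoring, the `query` vector, the pooling over frames, and the linear classifier (`W`, `b`) are all assumptions made for illustration, and the adversarial training step is omitted. The sketch only contrasts equal weighting of deep features with attention-based weighting at the input of a domain classifier.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_weights(features, query):
    """Return one weight per frame, summing to 1.

    features: (T, d) deep features from the acoustic model
    query:    (d,)   hypothetical learned attention query
    """
    scores = features @ query / np.sqrt(features.shape[1])
    return softmax(scores)

def domain_logits(features, weights, W, b):
    # Pool the frames with the given weights, then apply a linear
    # domain classifier (an assumed stand-in for the real classifier).
    pooled = weights @ features          # (d,)
    return pooled @ W + b                # (n_domains,)

rng = np.random.default_rng(0)
T, d, n_domains = 6, 4, 3
features = rng.normal(size=(T, d))       # toy data, not real speech features
query = rng.normal(size=d)
W = rng.normal(size=(d, n_domains))
b = np.zeros(n_domains)

uniform = np.full(T, 1.0 / T)            # equal weighting of all frames
attentive = attention_weights(features, query)

logits_uniform = domain_logits(features, uniform, W, b)
logits_attentive = domain_logits(features, attentive, W, b)
```

With a zero query the attention weights fall back to the uniform weights, so this toy setup reduces to the equally weighted case.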
Multilingual Adaptation of RNN Based ASR Systems
In this work, we focus on multilingual systems based on recurrent neural
networks (RNNs), trained using the Connectionist Temporal Classification (CTC)
loss function. Using a multilingual set of acoustic units poses difficulties.
To address this issue, we proposed Language Feature Vectors (LFVs) to train
language adaptive multilingual systems. Language adaptation, in contrast to
speaker adaptation, needs to be applied not only on the feature level, but also
to deeper layers of the network. In this work, we therefore extended our
previous approach by introducing a novel technique which we call "modulation".
Based on this method, we modulated the hidden layers of RNNs using LFVs. We
evaluated this approach in both full and low resource conditions, as well as
for grapheme- and phone-based systems. Modulation yielded lower error rates
across these conditions.
Comment: 5 pages, 1 figure, to appear in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP 2018)
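The abstract above does not define "modulation" precisely. As a rough sketch, assuming it means scaling each hidden unit element-wise by a value derived from the LFV, the idea could look like the following. The function name `modulate`, the `projection` matrix, and the toy shapes are hypothetical and not taken from the paper.

```python
import numpy as np

def modulate(hidden, lfv, projection):
    """Scale hidden activations element-wise by a projection of the LFV.

    hidden:     (T, h) hidden-layer outputs for T frames
    lfv:        (k,)   language feature vector (assumed fixed per utterance)
    projection: (k, h) hypothetical learned map from LFV space to hidden units
    """
    gate = lfv @ projection      # (h,) one scaling factor per hidden unit
    return hidden * gate         # broadcast over the T frames

rng = np.random.default_rng(0)
T, h, k = 5, 8, 3
hidden = rng.normal(size=(T, h))         # toy activations
lfv = rng.normal(size=k)                 # toy language feature vector
projection = rng.normal(size=(k, h))
modulated = modulate(hidden, lfv, projection)
```

Unlike appending the LFV to the input features, this kind of scaling acts directly on a deeper layer, which matches the abstract's point that language adaptation should reach beyond the feature level.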
Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm
Reverberation, which is generally caused by sound reflections from walls,
ceilings, and floors, can result in severe performance degradation of acoustic
applications. Due to a complicated combination of attenuation and time-delay
effects, the reverberation property is difficult to characterize, and it
remains a challenging task to effectively retrieve the anechoic speech signals
from reverberant ones. In the present study, we propose a novel integrated
deep and ensemble learning algorithm (IDEA) for speech dereverberation. The
IDEA consists of offline and online phases. In the offline phase, we train
multiple dereverberation models, each aiming to precisely dereverberate speech
signals in a particular acoustic environment; then a unified fusion function is
estimated that aims to integrate the information of multiple dereverberation
models. In the online phase, an input utterance is first processed by each of
the dereverberation models. The outputs of all models are integrated
accordingly to generate the final anechoic signal. We evaluated the IDEA on
designed acoustic environments, including both matched and mismatched
conditions of the training and testing data. Experimental results confirm that
the proposed IDEA outperforms a single deep-neural-network-based dereverberation
model with the same model architecture and training data.
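The offline and online phases described above can be sketched as follows. This is a toy illustration under stated assumptions, not the IDEA implementation: the two "models" are simple hypothetical functions standing in for trained environment-specific networks, and the fusion function is assumed to be a linear combination fitted by least squares.

```python
import numpy as np

def stack_outputs(models, x):
    # Run every dereverberation model on x; the last axis indexes the models.
    return np.stack([m(x) for m in models], axis=-1)

def fit_fusion(models, reverberant, clean):
    """Offline phase: estimate linear fusion weights by least squares."""
    A = stack_outputs(models, reverberant).reshape(-1, len(models))
    w, *_ = np.linalg.lstsq(A, clean.reshape(-1), rcond=None)
    return w

def dereverberate(models, w, x):
    """Online phase: apply each model, then fuse their outputs."""
    return stack_outputs(models, x) @ w

# Hypothetical stand-ins for models trained on two acoustic environments.
models = [lambda x: 0.5 * x, lambda x: np.tanh(x)]

rng = np.random.default_rng(0)
reverberant = rng.normal(size=(20, 8))   # toy "spectrogram", frames x bins
# Toy target built as a known mixture so the fit can be checked.
clean = 0.7 * models[0](reverberant) + 0.3 * models[1](reverberant)

w = fit_fusion(models, reverberant, clean)
estimate = dereverberate(models, w, reverberant)
```

Because the toy target is an exact mixture of the two model outputs, the fitted weights recover that mixture. With real data the fusion function would only approximate the clean signal.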