Light Gated Recurrent Units for Speech Recognition
A field that has directly benefited from the recent advances in deep learning
is Automatic Speech Recognition (ASR). Despite the great achievements of the
past decades, however, a natural and robust human-machine speech interaction
still appears to be out of reach, especially in challenging environments
characterized by significant noise and reverberation. To improve robustness,
modern speech recognizers often employ acoustic models based on Recurrent
Neural Networks (RNNs), which are naturally able to exploit large time contexts
and long-term speech modulations. It is thus of great interest to continue the
study of proper techniques for improving the effectiveness of RNNs in
processing speech signals.
In this paper, we revise one of the most popular RNN models, namely Gated
Recurrent Units (GRUs), and propose a simplified architecture that turned out
to be very effective for ASR. The contribution of this work is two-fold: First,
we analyze the role played by the reset gate, showing that a significant
redundancy with the update gate occurs. As a result, we propose to remove the
former from the GRU design, leading to a more efficient and compact single-gate
model. Second, we propose to replace hyperbolic tangent with ReLU activations.
This variation couples well with batch normalization and could help the model
learn long-term dependencies without numerical issues.
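
As a rough illustration of the single-gate design described above, the following sketch computes one recurrent step with an update gate only and a ReLU candidate state. The function name `li_gru_step`, the weight names `Wz`, `Uz`, `Wh`, `Uh`, and the helper `matvec` are assumptions made for this example; batch normalization, which the abstract pairs with ReLU, is omitted for brevity, so this should not be read as the authors' implementation.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def li_gru_step(x, h_prev, Wz, Uz, Wh, Uh):
    """One step of a single-gate GRU variant (no reset gate, ReLU candidate).

    Assumed form, with batch normalization left out:
        z_t  = sigmoid(Wz x_t + Uz h_{t-1})
        c_t  = relu(Wh x_t + Uh h_{t-1})
        h_t  = z_t * h_{t-1} + (1 - z_t) * c_t
    """
    a_z = [a + b for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    a_h = [a + b for a, b in zip(matvec(Wh, x), matvec(Uh, h_prev))]
    z = [1.0 / (1.0 + math.exp(-a)) for a in a_z]  # update gate
    cand = [max(0.0, a) for a in a_h]              # ReLU candidate state
    # Interpolate between the previous state and the candidate state.
    return [zi * hp + (1.0 - zi) * c for zi, hp, c in zip(z, h_prev, cand)]
```

Because the candidate state no longer depends on a reset gate, each step needs two input projections and two recurrent projections instead of three of each, which is consistent with the reduced training time reported in the abstract.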
Results show that the proposed architecture, called Light GRU (Li-GRU), not
only reduces the per-epoch training time by more than 30% over a standard GRU,
but also consistently improves the recognition accuracy across different tasks,
input features, noisy conditions, as well as across different ASR paradigms,
ranging from standard DNN-HMM speech recognizers to end-to-end CTC models.
Comment: Copyright 2018 IEE
Adaptive Hidden Markov Noise Modelling for Speech Enhancement
A robust and reliable noise estimation algorithm is required in many speech enhancement
systems. The aim of this thesis is to propose and evaluate a robust noise estimation
algorithm for highly non-stationary noisy environments. In this work, we model the
non-stationary noise using a set of discrete states with each state representing a distinct
noise power spectrum. In this approach, the state sequence over time is conveniently
represented by a Hidden Markov Model (HMM).
In this thesis, we first present an online HMM re-estimation framework that models
time-varying noise using a Hidden Markov Model and tracks changes in noise characteristics
through a sequential model update procedure applied during the absence of speech.
In addition, the algorithm creates new model states when necessary to represent
novel noise spectra and merges existing states that have similar
characteristics. We then extend our work to robust noise estimation during speech
activity by incorporating a speech model into our existing noise model. The noise characteristics
within each state are updated based on a speech presence probability which
is derived from a modified minima-controlled recursive averaging (MCRA) method.
We have demonstrated the effectiveness of our noise HMM in tracking both stationary
and highly non-stationary noise, and shown that it gives improved performance over
other conventional noise estimation methods when it is incorporated into a standard
speech enhancement algorithm.
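
To make the probability-weighted update concrete, here is a minimal sketch of the recursive averaging step used in standard MCRA-style noise estimation, applied to the noise power spectrum of a single state. The thesis uses a modified variant whose details are not given in the abstract, so the function name `update_noise_psd`, the smoothing constant `alpha`, and the exact update form are assumptions for illustration only.

```python
def update_noise_psd(noise_psd, noisy_psd, speech_prob, alpha=0.95):
    """Update a noise power spectrum estimate, one value per frequency bin.

    noise_psd   : current noise power estimate for each bin
    noisy_psd   : power of the noisy observation |Y|^2 for each bin
    speech_prob : speech presence probability in [0, 1] for each bin
    alpha       : base smoothing constant (hypothetical default)
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_psd, speech_prob):
        # The effective smoothing factor rises toward 1 as speech becomes
        # more likely, which freezes the estimate while speech is present.
        a = alpha + (1.0 - alpha) * p
        updated.append(a * lam + (1.0 - a) * y2)
    return updated
```

With `speech_prob` equal to 1 in a bin the estimate is left unchanged, and with `speech_prob` equal to 0 it reduces to ordinary first-order recursive smoothing of the noisy power.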
A novel and robust parameter training approach for HMMs under noisy and partial access to states
This paper proposes a new estimation algorithm for the parameters of an HMM so as to best account for the observed data. In this model, in addition to the observation sequence, we have partial and noisy access to the hidden state sequence as side information. This access can be seen as "partial labeling" of the hidden states. Furthermore, we model possible mislabeling in the side information in a joint framework and derive the corresponding EM updates. In our simulations, we observe that using this side information considerably improves the state recognition performance, by up to 70%, with respect to the "achievable margin" defined by the baseline algorithms. Moreover, our algorithm is shown to be robust to the training conditions.
(C) 2013 Elsevier B.V. All rights reserved
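
One way to picture how partial, noisy labels can enter HMM inference is to treat each label as a second observation of the hidden state. The sketch below runs the forward recursion for a discrete HMM under that view. The symmetric mislabeling model with error rate `eps`, the use of `None` for unlabeled time steps, and the names `label_likelihood` and `forward_with_labels` are assumptions for illustration; the paper's own joint model and EM updates are not reproduced here.

```python
def label_likelihood(label, state, n_states, eps):
    """P(label | state) under an assumed symmetric mislabeling model."""
    if label is None:          # no side information at this time step
        return 1.0
    if label == state:         # label agrees with the hidden state
        return 1.0 - eps
    return eps / (n_states - 1)  # error mass spread over the other states

def forward_with_labels(pi, A, B, obs, labels, eps=0.1):
    """Joint likelihood of observations and labels via the forward recursion.

    pi[s]    : initial probability of state s
    A[r][s]  : transition probability from state r to state s
    B[s][o]  : probability of emitting symbol o in state s
    obs      : list of observed symbol indices
    labels   : list of noisy state labels, None where unlabeled
    Requires at least two states. No scaling is applied, so this is
    only suitable for short sequences.
    """
    K = len(pi)
    alpha = [pi[s] * B[s][obs[0]] * label_likelihood(labels[0], s, K, eps)
             for s in range(K)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[r] * A[r][s] for r in range(K))
                 * B[s][obs[t]] * label_likelihood(labels[t], s, K, eps)
                 for s in range(K)]
    return sum(alpha)
```

When every entry of `labels` is `None`, the function returns the ordinary forward likelihood, so the labeled case can be seen as a reweighting of the same recursion.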