Small-footprint highway deep neural networks for speech recognition
State-of-the-art speech recognition systems typically employ neural network
acoustic models. However, compared to Gaussian mixture models, deep neural
network (DNN) based acoustic models often have many more model parameters,
making it challenging for them to be deployed on resource-constrained
platforms, such as mobile devices. In this paper, we study the application of
the recently proposed highway deep neural network (HDNN) for training
small-footprint acoustic models. HDNNs are depth-gated feedforward neural
networks that include two types of gate functions to facilitate
information flow through the layers. Our study demonstrates that HDNNs
are more compact than regular DNNs for acoustic modeling, i.e., they can
achieve comparable recognition accuracy with many fewer model parameters.
Furthermore, HDNNs are more controllable than DNNs: the gate functions of an
HDNN can control the behavior of the whole network using a very small number of
model parameters. Finally, we show that HDNNs are more adaptable than DNNs. For
example, simply updating the gate functions using adaptation data can result in
considerable gains in accuracy. We demonstrate these aspects by experiments
using the publicly available AMI corpus, which has around 80 hours of training
data.

Comment: 9 pages, 6 figures. Accepted to IEEE/ACM Transactions on Audio,
Speech and Language Processing, 2017. arXiv admin note: text overlap with
arXiv:1608.00892, arXiv:1607.0196
Small-footprint Deep Neural Networks with Highway Connections for Speech Recognition
For speech recognition, deep neural networks (DNNs) have significantly
improved recognition accuracy on most benchmark datasets and application
domains. However, compared to conventional Gaussian mixture models,
DNN-based acoustic models usually have a much larger number of model
parameters, making them challenging to deploy on resource-constrained
platforms, e.g., mobile devices. In this paper, we study the application
of the recently proposed highway network to train small-footprint DNNs,
which are {\it thinner} and {\it deeper} and have a significantly smaller
number of model parameters compared to conventional DNNs. We investigated
this approach on the AMI meeting speech transcription corpus, which has
around 70 hours of audio data. The highway neural networks consistently
outperformed their plain DNN counterparts, and the number of model
parameters can be reduced significantly without sacrificing recognition
accuracy.

Comment: 5 pages, 3 figures, fixed typo, accepted by Interspeech 201
Semi-tied Units for Efficient Gating in LSTM and Highway Networks
Gating is a key technique used for integrating information from multiple
sources by long short-term memory (LSTM) models and has recently also been
applied to other models such as the highway network. Although gating is
powerful, it is rather expensive in terms of both computation and storage as
each gating unit uses a separate full weight matrix. This issue can be severe
since several gates are used together in, e.g., an LSTM cell. This paper
proposes a semi-tied unit (STU) approach to solve this efficiency issue, which
uses one shared weight matrix to replace those in all the units in the same
layer. The approach is termed "semi-tied" since extra parameters are used to
separately scale each of the shared output values. These extra scaling factors
are associated with the network activation functions and result in the use of
parameterised sigmoid, hyperbolic tangent, and rectified linear unit functions.
Speech recognition experiments using British English multi-genre broadcast data
showed that using STUs can reduce the calculation and storage cost by a factor
of three for highway networks and four for LSTMs, while giving similar word
error rates to the original models.

Comment: To appear in Proc. INTERSPEECH 2018, September 2-6, 2018, Hyderabad,
India
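As a rough illustration of the semi-tied idea applied to a highway layer: one shared weight matrix produces a single pre-activation, and each unit rescales it with its own per-dimension scale and bias, yielding parameterised tanh and sigmoid activations. The class name, the placement of the scaling factors, and the initial values in this sketch are assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SemiTiedHighwayLayer:
    """Highway-style layer where all units share one weight matrix.

    Instead of separate d x d matrices for the hidden transform and
    each gate, a single shared matrix W produces one pre-activation;
    each unit then applies its own per-dimension scale and bias,
    which amounts to parameterised tanh and sigmoid functions.
    """
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d, d)) * 0.1  # the one shared matrix
        # Per-unit scale/bias vectors replace the dropped full matrices.
        self.a_h, self.b_h = np.ones(d), np.zeros(d)
        self.a_t, self.b_t = np.ones(d), np.full(d, -2.0)
        self.a_c, self.b_c = np.ones(d), np.full(d, 2.0)

    def forward(self, x):
        s = x @ self.W                         # computed once, shared by all units
        h = np.tanh(self.a_h * s + self.b_h)   # parameterised tanh
        t = sigmoid(self.a_t * s + self.b_t)   # parameterised sigmoid gate
        c = sigmoid(self.a_c * s + self.b_c)   # parameterised sigmoid gate
        return t * h + c * x
```

For layer width d, the three full matrices (3d² weights) shrink to d² + 6d, roughly the factor-of-three saving the abstract reports for highway networks.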