1,629 research outputs found
Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition
We propose to model the acoustic space of deep neural network (DNN)
class-conditional posterior probabilities as a union of low-dimensional
subspaces. To that end, the training posteriors are used for dictionary
learning and sparse coding. Sparse representation of the test posteriors using
this dictionary enables projection to the space of training data. Relying on
the fact that the intrinsic dimensions of the posterior subspaces are indeed
very small and the matrix of all posteriors belonging to a class has a very low
rank, we demonstrate how low-dimensional structures enable further enhancement
of the posteriors and rectify the spurious errors due to mismatch conditions.
The enhanced acoustic modeling method leads to improvements in continuous
speech recognition task using hybrid DNN-HMM (hidden Markov model) framework in
both clean and noisy conditions, where upto 15.4% relative reduction in word
error rate (WER) is achieved
Porting concepts from DNNs back to GMMs
Deep neural networks (DNNs) have been shown to outperform Gaussian Mixture Models (GMM) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from the DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models) and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Regardless of their similarities, the DNNs and the deep GMMs still show a sufficient amount of complementarity to allow effective system combination
TheanoLM - An Extensible Toolkit for Neural Network Language Modeling
We present a new tool for training neural network language models (NNLMs),
scoring sentences, and generating text. The tool has been written using Python
library Theano, which allows researcher to easily extend it and tune any aspect
of the training process. Regardless of the flexibility, Theano is able to
generate extremely fast native code that can utilize a GPU or multiple CPU
cores in order to parallelize the heavy numerical computations. The tool has
been evaluated in difficult Finnish and English conversational speech
recognition tasks, and significant improvement was obtained over our best
back-off n-gram models. The results that we obtained in the Finnish task were
compared to those from existing RNNLM and RWTHLM toolkits, and found to be as
good or better, while training times were an order of magnitude shorter
- …