19 research outputs found

    Learning a Hybrid Architecture for Sequence Regression and Annotation

    When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is compatible with a rich set of mapping functions. Results show that the availability of additional continuous response variables can simultaneously improve the annotation of the sequential observations and yield good prediction performance in both synthetic data and real-world datasets. Comment: AAAI 201
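The setting described in this abstract, where a real-valued response is generated from the hidden state path, can be illustrated with a small generative sketch. This is not the paper's algorithm: the two-state HMM, the function names `sample_hmm` and `summary_response`, and the linear mapping from state-occupancy fractions to the response are all assumptions chosen for illustration (the abstract only says the framework supports "a rich set of mapping functions").

```python
import random


def sample_hmm(init, trans, emit, length, rng):
    """Sample a hidden state path and discrete observations from an HMM."""
    states = range(len(init))
    path, obs = [], []
    state = rng.choices(states, weights=init)[0]
    for _ in range(length):
        path.append(state)
        # Emit one discrete symbol from the current state's emission row.
        symbols = range(len(emit[state]))
        obs.append(rng.choices(symbols, weights=emit[state])[0])
        # Move to the next hidden state.
        state = rng.choices(states, weights=trans[state])[0]
    return path, obs


def summary_response(path, weights, noise_sd=0.0, rng=None):
    """Assumed mapping: linear in state-occupancy fractions, plus optional noise."""
    counts = [0] * len(weights)
    for s in path:
        counts[s] += 1
    value = sum(w * c / len(path) for w, c in zip(weights, counts))
    if noise_sd > 0.0:
        value += (rng or random).gauss(0.0, noise_sd)
    return value


rng = random.Random(0)
path, obs = sample_hmm(
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    emit=[[0.7, 0.3], [0.1, 0.9]],
    length=20,
    rng=rng,
)
# One real-valued response per sequence, driven by the hidden path.
y = summary_response(path, weights=[0.0, 2.0], noise_sd=0.1, rng=rng)
```

In this toy model the response carries information about how long the chain spent in each state, which is the kind of side information the abstract says can help annotate the observations.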

    Very Deep Convolutional Neural Networks for Robust Speech Recognition

    This paper describes the extension and optimization of our previous work on very deep convolutional neural networks (CNNs) for effective recognition of noisy speech in the Aurora 4 task. The appropriate number of convolutional layers, the sizes of the filters, pooling operations and input feature maps are all modified: the filter and pooling sizes are reduced and the dimensions of the input feature maps are extended to allow adding more convolutional layers. Furthermore, appropriate input padding and input feature map selection strategies are developed. In addition, an adaptation framework using joint training of the very deep CNN with auxiliary i-vector and fMLLR features is developed. These modifications give substantial word error rate (WER) reductions over the standard CNN used as baseline. Finally, the very deep CNN is combined with an LSTM-RNN acoustic model, and it is shown that state-level weighted log likelihood score combination in a joint acoustic model decoding scheme is very effective. On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, improving to 7.99% with auxiliary feature joint training and to 7.09% with LSTM-RNN joint decoding. Comment: accepted by SLT 201
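The state-level weighted log likelihood score combination mentioned in this abstract can be sketched roughly as follows. The abstract does not give the exact formula, so this assumes a convex combination with a single interpolation weight; the function name `combine_state_scores`, the weight value, and the toy scores are hypothetical.

```python
def combine_state_scores(cnn_loglik, lstm_loglik, weight=0.5):
    """Weighted sum of two models' per-frame, per-state log-likelihoods.

    Assumption: combined = weight * cnn + (1 - weight) * lstm.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(cnn_loglik) != len(lstm_loglik):
        raise ValueError("both models must score the same number of frames")
    combined = []
    for cnn_frame, lstm_frame in zip(cnn_loglik, lstm_loglik):
        if len(cnn_frame) != len(lstm_frame):
            raise ValueError("both models must score the same set of states")
        combined.append(
            [weight * a + (1.0 - weight) * b for a, b in zip(cnn_frame, lstm_frame)]
        )
    return combined


# Toy scores: 2 frames, 3 HMM states per frame (made-up numbers).
cnn = [[-1.0, -2.0, -3.0], [-2.5, -0.5, -4.0]]
lstm = [[-3.0, -1.0, -2.0], [-2.0, -1.5, -3.0]]

combined = combine_state_scores(cnn, lstm, weight=0.6)
# Highest-scoring state per frame after combination.
best_states = [max(range(len(f)), key=f.__getitem__) for f in combined]
```

In a real decoder the combined scores would feed the search over state sequences; the per-frame argmax here only shows that combining the two models can change which state scores best (in the first toy frame the CNN alone prefers state 0, while the combined score prefers state 1).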

    Recent Trends in Application of Neural Networks to Speech Recognition

    In this paper, we review research on neural network based speech recognition and the various approaches taken to improve accuracy. Three approaches to speech recognition using neural network learning models are discussed: (1) Deep Neural Network (DNN) - Hidden Markov Model (HMM), (2) Recurrent Neural Networks (RNN) and (3) Long Short Term Memory (LSTM). The paper also discusses how, for a given application, one model may be better suited than another and when one should be preferred. A pre-trained Deep Neural Network - Hidden Markov Model hybrid architecture trains the DNN to produce a distribution over tied triphone states as its output. The DNN pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively, which can aid optimization and reduce generalization error. Combining recurrent neural nets and HMMs results in a highly discriminative system with warping capabilities. To evaluate the impact of recurrent connections, we compare the train and test characteristic error rates of DNN, Recurrent Dynamic Neural Networks (RDNN), and Bi-Directional Deep Neural Network (BRDNN) models while roughly controlling for the total number of free parameters in the model. Both variants of recurrent models show substantial test set characteristic error rate improvements over the non-recurrent DNN model. Inspired by the discussion about how to construct deep RNNs, several alternative architectures were constructed for deep LSTM networks from three points: (1) input-to-hidden function, (2) hidden-to-hidden transition and (3) hidden-to-output function. Furthermore, some deeper variants of LSTMs were also designed by combining different points