
    Recent Trends in Application of Neural Networks to Speech Recognition

    In this paper, we review research work that deals with neural-network-based speech recognition and the various approaches it takes to improve accuracy. Three approaches to speech recognition using neural network learning models are discussed: (1) the Deep Neural Network (DNN) - Hidden Markov Model (HMM) hybrid, (2) Recurrent Neural Networks (RNN), and (3) Long Short-Term Memory (LSTM). The paper also discusses how, for a given application, one model is better suited than another and when one should prefer one model over the other. A pre-trained Deep Neural Network - Hidden Markov Model hybrid architecture trains the DNN to produce a distribution over tied triphone states as its output. The DNN pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively, which can aid optimization and reduce generalization error. Combining recurrent neural nets and HMMs results in a highly discriminative system with warping capabilities. To evaluate the impact of recurrent connections, we compare the train and test character error rates of DNN, recurrent deep neural network (RDNN), and bidirectional recurrent deep neural network (BRDNN) models while roughly controlling for the total number of free parameters in the model. Both variants of recurrent models show substantial test set character error rate improvements over the non-recurrent DNN model. Inspired by the discussion of how to construct deep RNNs, several alternative architectures were constructed for deep LSTM networks from three points: (1) the input-to-hidden function, (2) the hidden-to-hidden transition, and (3) the hidden-to-output function. Furthermore, some deeper variants of LSTMs were also designed by combining these points.
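
    To make the hybrid setup concrete, the following is a minimal sketch, assuming PyTorch (which the paper does not prescribe), of a DNN that maps a spliced window of acoustic frames to a distribution over tied triphone (senone) states; layer sizes and the senone count are illustrative, and the HMM decoder that would consume these posteriors is omitted.

        # Hypothetical DNN acoustic model for a DNN-HMM hybrid: frame window in,
        # senone posteriors out. Sizes are placeholders, not the paper's settings.
        import torch
        import torch.nn as nn

        class SenoneDNN(nn.Module):
            def __init__(self, feat_dim=40, context=11, num_senones=3000,
                         hidden=1024, depth=5):
                super().__init__()
                layers, in_dim = [], feat_dim * context   # spliced frame window
                for _ in range(depth):
                    layers += [nn.Linear(in_dim, hidden), nn.Sigmoid()]
                    in_dim = hidden
                layers += [nn.Linear(in_dim, num_senones)]
                self.net = nn.Sequential(*layers)

            def forward(self, x):
                # x: (batch, feat_dim * context), one spliced window per frame
                return torch.log_softmax(self.net(x), dim=-1)  # senone log-posteriors

        model = SenoneDNN()
        frames = torch.randn(8, 40 * 11)    # a batch of spliced frames
        log_post = model(frames)            # (8, 3000) log-posteriors for the HMM decoder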

    Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

    Neural models have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features, which are given to a classifier trained to classify frames into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices. Comment: NIPS 201
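
    A rough sketch of the probing recipe described above, assuming PyTorch and a hypothetical pretrained.layers module list (real end-to-end models expose their internals differently): frame-level activations from one layer of the frozen CTC model are fed to a small classifier trained to predict phone labels, and its accuracy serves as a proxy for representation quality.

        import torch
        import torch.nn as nn

        def extract_frames(pretrained, utterance, layer_idx):
            """Run the frozen end-to-end model and return one layer's activations.
            `pretrained.layers` is an assumed attribute; real models differ."""
            with torch.no_grad():
                h = utterance
                for i, layer in enumerate(pretrained.layers):
                    h = layer(h)
                    if i == layer_idx:
                        return h                      # (time, feat) frame features
            raise IndexError("layer_idx out of range")

        # Linear probe trained on frame classification into phones.
        num_phones, feat_dim = 48, 512                # illustrative sizes
        probe = nn.Linear(feat_dim, num_phones)
        opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()

        def probe_step(frames, phone_labels):         # frames: (time, feat), labels: (time,)
            opt.zero_grad()
            loss = loss_fn(probe(frames), phone_labels)
            loss.backward()
            opt.step()
            return loss.item()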

    Recurrent Neural Networks for End-to-End Speech Recognition: A Comparative Analysis

    Speech recognition is the task of correctly transcribing spoken utterances by machine. Deep learning is an emerging area for representing sequential data such as speech. Deep learning frameworks such as Recurrent Neural Networks (RNNs) have been successful in replacing traditional speech models such as Hidden Markov Models and Gaussian mixtures, and have boosted recognition performance to a large extent. RNNs, used for sequence-to-sequence modeling, are a powerful tool for sequence labeling. End-to-end methods such as Connectionist Temporal Classification (CTC) are used with RNNs for speech recognition. This paper presents a comparative analysis of RNNs for end-to-end speech recognition. Models trained with different RNN architectures, namely simple RNN cells (SRNN), Long Short-Term Memory (LSTM) units, Gated Recurrent Units (GRU), and bidirectional variants of all of these, are compared on the LibriSpeech corpus.
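
    The comparison amounts to swapping the recurrent cell inside an otherwise fixed CTC recipe. A minimal sketch, assuming PyTorch and placeholder sizes rather than the paper's actual settings:

        import torch
        import torch.nn as nn

        def build_encoder(cell="lstm", feat_dim=80, hidden=256, bidirectional=False):
            # Same recipe, interchangeable recurrent cell (SRNN / LSTM / GRU).
            rnn_cls = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}[cell]
            return rnn_cls(feat_dim, hidden, num_layers=3, batch_first=True,
                           bidirectional=bidirectional)

        class CTCModel(nn.Module):
            def __init__(self, cell, num_labels=29, **kw):   # labels incl. CTC blank
                super().__init__()
                self.enc = build_encoder(cell, **kw)
                out_dim = self.enc.hidden_size * (2 if self.enc.bidirectional else 1)
                self.proj = nn.Linear(out_dim, num_labels)

            def forward(self, x):                    # x: (batch, time, feat)
                h, _ = self.enc(x)
                return self.proj(h).log_softmax(-1)

        ctc = nn.CTCLoss(blank=0)
        model = CTCModel("gru", bidirectional=True)
        x = torch.randn(4, 200, 80)                  # a batch of feature sequences
        log_probs = model(x).transpose(0, 1)         # CTCLoss expects (time, batch, labels)
        targets = torch.randint(1, 29, (4, 30))
        loss = ctc(log_probs, targets,
                   torch.full((4,), 200, dtype=torch.long),   # input lengths
                   torch.full((4,), 30, dtype=torch.long))    # target lengths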

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
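
    As one concrete instance of pairing a deep architecture with the reinforcement learning framework, here is a minimal DQN-style sketch, assuming PyTorch; the shapes and hyperparameters are illustrative, the replay buffer and exploration machinery are omitted, and it is not tied to any particular experiment in the chapter.

        import torch
        import torch.nn as nn

        class QNet(nn.Module):
            """Convolutional network estimating action values from raw pixels."""
            def __init__(self, num_actions=4):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
                    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                    nn.Flatten())
                self.head = nn.Sequential(nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
                                          nn.Linear(256, num_actions))

            def forward(self, obs):                  # obs: (batch, 4, 84, 84) stacked frames
                return self.head(self.conv(obs))     # Q-value per action

        q, q_target = QNet(), QNet()
        opt = torch.optim.Adam(q.parameters(), lr=1e-4)

        def td_step(obs, act, rew, next_obs, done, gamma=0.99):
            # act: (batch,) long, done: (batch,) float in {0, 1}
            with torch.no_grad():
                target = rew + gamma * (1 - done) * q_target(next_obs).max(1).values
            pred = q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
            loss = nn.functional.smooth_l1_loss(pred, target)
            opt.zero_grad(); loss.backward(); opt.step()
            return loss.item()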

    Dysarthric Speech Recognition and Offline Handwriting Recognition using Deep Neural Networks

    Millions of people around the world are diagnosed with neurological disorders like Parkinson’s, Cerebral Palsy, or Amyotrophic Lateral Sclerosis. As these diseases progress and neurological damage accumulates, the person suffering from the disease loses control of muscles and their speech deteriorates. Speech deterioration is due to a neuromotor condition that limits manipulation of the articulators of the vocal tract, a condition collectively called dysarthria. Even though dysarthric speech is grammatically and syntactically correct, it is difficult for humans to understand and for Automatic Speech Recognition (ASR) systems to decipher. With the emergence of deep learning, speech recognition systems have improved a lot compared to traditional speech recognition systems, which use sophisticated preprocessing techniques to extract speech features. In this digital era there are still many handwritten documents, many of which need to be digitized. Offline handwriting recognition involves recognizing handwritten characters from images of handwritten text (i.e., scanned documents). This is an interesting task as it involves sequence learning with computer vision. The task is more difficult than Optical Character Recognition (OCR), because handwritten letters can be written in a virtually infinite variety of styles. This thesis proposes exploiting deep learning techniques like Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) for offline handwriting recognition. For speech recognition, we compare traditional methods with recent deep learning methods. We also apply speaker adaptation methods both at the feature level and at the parameter level to improve recognition of dysarthric speech.
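
    A minimal sketch of the CNN+RNN pattern for offline handwriting recognition, assuming PyTorch: a convolutional stack turns a text-line image into a feature sequence, a bidirectional LSTM models that sequence, and the per-timestep character log-probabilities can be trained with a CTC loss. All sizes are placeholders rather than the thesis's configuration.

        import torch
        import torch.nn as nn

        class CRNN(nn.Module):
            def __init__(self, num_chars=80, hidden=256):   # chars incl. CTC blank
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
                self.rnn = nn.LSTM(128 * 8, hidden, num_layers=2,
                                   batch_first=True, bidirectional=True)
                self.fc = nn.Linear(2 * hidden, num_chars)

            def forward(self, img):                    # img: (batch, 1, 32, width)
                f = self.cnn(img)                      # (batch, 128, 8, width/4)
                f = f.permute(0, 3, 1, 2).flatten(2)   # (batch, width/4, 128*8)
                h, _ = self.rnn(f)
                return self.fc(h).log_softmax(-1)      # per-timestep char log-probs

        model = CRNN()
        imgs = torch.randn(2, 1, 32, 256)              # grayscale text-line images
        log_probs = model(imgs)                        # (2, 64, 80), ready for nn.CTCLoss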