938 research outputs found

    Speech Recognition in noisy environment using Deep Learning Neural Network

    Recent research in the field of automatic speaker recognition has shown that methods based on deep learning neural networks provide better performance than other statistical classifiers. On the other hand, these methods usually require a significant number of parameters to be adjusted. The goal of this thesis is to show that selecting appropriate parameter values can significantly improve the speaker recognition performance of methods based on deep learning neural networks. The reported study introduces an approach to automatic speaker recognition based on deep neural networks and the stochastic gradient descent algorithm. It focuses in particular on three parameters of the stochastic gradient descent algorithm: the learning rate, the hidden layer dropout rate, and the input layer dropout rate. Additional attention was devoted to speaker recognition under noisy conditions. Two experiments were therefore conducted in the scope of this thesis. The first experiment was intended to demonstrate that optimizing the observed parameters of the stochastic gradient descent algorithm can improve speaker recognition performance in the absence of noise. This experiment was conducted in two phases. In the first phase, the recognition rate was observed while the hidden layer dropout rate and the learning rate were varied and the input layer dropout rate was held constant. In the second phase, the recognition rate was observed while the input layer dropout rate and the learning rate were varied and the hidden layer dropout rate was held constant. The second experiment was intended to show that optimizing the observed parameters of the stochastic gradient descent algorithm can improve speaker recognition performance even under noisy conditions. To this end, different noise levels were artificially applied to the original speech signal.
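The abstract does not say how the noise was applied or which levels were used. As a rough illustration only, the sketch below assumes additive white Gaussian noise scaled to a target signal-to-noise ratio in dB; the function names, the 10 dB level, and the synthetic tone standing in for a speech waveform are all hypothetical and are not taken from the thesis.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return the signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Draw noise samples, then measure their actual average power.
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the required noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Recompute the SNR from the clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a speech waveform: a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise_at_snr(clean, snr_db=10)
print(round(measured_snr_db(clean, noisy), 3))
```

Scaling the noise by its measured power, rather than its nominal variance, makes the resulting SNR match the target up to floating-point error, which keeps repeated runs at different noise levels comparable.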

    Progress in Speech Recognition for Romanian Language


    Speaker Identification and Spoken word Recognition in Noisy Environment using Different Techniques

    In this work, an attempt is made to design ASR systems in software that perform speaker identification, spoken word recognition, and a combination of both in a general noisy environment. The automatic speech recognition system is designed for a limited vocabulary of Telugu words and control commands. Experiments are conducted to find the combination of feature extraction technique and classifier model that performs best in a general noisy environment (a home or office environment where noise is around 15-35 dB). A recently proposed feature extraction technique, Gammatone frequency coefficients, which is reported to be the best fit to the human auditory system, is chosen for the experiments along with the more common feature extraction techniques MFCC and PLP as part of the front-end process (i.e. speech feature extraction). Two artificial neural network classifiers, Learning Vector Quantization (LVQ) neural networks and Radial Basis Function (RBF) neural networks, are chosen along with Hidden Markov Models (HMMs) as part of the back-end process (i.e. training and modeling the ASRs). The ASR systems built from the 9 combinations (3 feature extraction techniques and 3 classifier models) are analyzed in terms of spoken word recognition and speaker identification accuracy, ASR design time, and recognition and identification response time. The test speech samples are recorded in general noisy conditions, i.e. in the presence of air conditioning noise, fan noise, computer keyboard noise, and distant cross-talk noise. The ASR systems were designed and analyzed programmatically in the MATLAB R2013a environment.
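The 9 systems described above are the full cross product of the three front ends and the three back ends. A minimal sketch of that enumeration follows; the string labels are illustrative placeholders and do not correspond to any code from the original MATLAB work.

```python
from itertools import product

# Front-end feature extraction techniques named in the abstract.
features = ["Gammatone", "MFCC", "PLP"]
# Back-end classifier models named in the abstract.
classifiers = ["LVQ", "RBF", "HMM"]

# Every (feature, classifier) pairing defines one ASR system to evaluate.
combinations = list(product(features, classifiers))

for feature, classifier in combinations:
    print(f"{feature} + {classifier}")
```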

    A comparative review of dynamic neural networks and hidden Markov model methods for mobile on-device speech recognition

    Adopting high-accuracy speech recognition algorithms without an effective evaluation of their impact on the target computational resource is impractical for mobile and embedded systems. In this paper, techniques are adopted to minimise the computational resource required for an effective mobile-based speech recognition system. A Dynamic Multi-Layer Perceptron speech recognition technique, capable of running in real time on a state-of-the-art mobile device, is introduced. Although a conventional hidden Markov model applied to the same dataset slightly outperformed our approach, its processing time is much higher. The Dynamic Multi-Layer Perceptron presented here has an accuracy level of 96.94% and runs significantly faster than similar techniques.