920 research outputs found

    Digit recognition using neural networks

    This paper investigates the use of feed-forward multi-layer perceptrons trained by back-propagation for speech recognition, and also proposes an automatic technique for both training and recognition. The use of neural networks for speaker-independent isolated-word recognition on small vocabularies is studied, and an automated system covering the training stage through the recognition stage, without the need for manual cropping of speech signals, is developed to evaluate the performance of the automatic speech recognition (ASR) system. Linear predictive coding (LPC) is applied at an early stage to represent the speech signal in frames. Features from the selected frames are used to train multilayer perceptrons (MLP) using back-propagation. The same routine is applied to the speech signal during the recognition stage, and unknown test patterns are classified to the nearest training patterns. In short, the selected frames represent the local features of the speech signal, and all of them contribute to the global similarity for the whole speech signal. The analysis, design and development of the automation system are done in MATLAB, in which a speaker-independent isolated-word digit recogniser is developed
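The pipeline described in this abstract (frame the signal, extract LPC features per frame, classify with an MLP) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the frame length, hop size, LPC order, and toy signal are all assumptions, and the MLP stage is omitted.

```python
import math

def autocorr(frame, lag):
    """Autocorrelation of a frame at a given lag."""
    n = len(frame)
    return sum(frame[i] * frame[i + lag] for i in range(n - lag))

def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion: LPC coefficients of one frame."""
    r = [autocorr(frame, k) for k in range(order + 1)]
    a = [0.0] * (order + 1)   # predictor coefficients; a[0] implicitly 1
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:      # signal already perfectly predicted
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]              # `order` LPC coefficients

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

# Toy waveform standing in for a recorded digit utterance.
signal = [math.sin(2 * math.pi * 0.05 * t) for t in range(400)]
frames = frame_signal(signal, frame_len=80, hop=40)
features = [lpc_coefficients(f, order=8) for f in frames]
print(len(frames), len(features[0]))  # 9 frames, 8 coefficients each
```

In a full system the per-frame feature vectors would be fed to an MLP trained by back-propagation, with unknown utterances classified to the nearest trained pattern.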

    Speaker Independent Speech Recognition Using Neural Network

    In spite of the advances accomplished throughout the last few decades, automatic speech recognition (ASR) is still a challenging and difficult task when systems are applied in the real world. Differing requirements across applications drive researchers to explore more effective approaches for each particular application. Attempts to apply artificial neural networks (ANN) as a classification tool are proposed to increase the reliability of the system. This project studies the approach of using a neural network for speaker-independent isolated-word recognition on small vocabularies and proposes a method for using a simple MLP as the speech recognizer. Our approach is able to overcome the current limitation of MLPs in the selection of input buffer size by proposing a frame-selection method. Linear predictive coding (LPC) is applied at an early stage to represent the speech signal in frames. Features from the selected frames are used to train a multilayer perceptron (MLP) feedforward back-propagation (FFBP) neural network during the training stage. The same routine is applied to the speech signal during the recognition stage, and unknown test patterns are classified to the nearest trained patterns. In short, the selected frames represent the local features of the speech signal, and all of them contribute to the global similarity for the whole speech signal. The analysis, design and the PC-based voice dialling system are developed using MATLAB
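The frame-selection idea mentioned above — picking a fixed number of frames from a variable-length utterance so that the flattened features fit a fixed-size MLP input buffer — might look like the following sketch. The evenly-spaced selection rule and the frame count are illustrative assumptions, not details from the paper:

```python
def select_frames(frames, n_select):
    """Pick n_select evenly spaced frames from a variable-length
    utterance so the flattened features fit a fixed MLP input buffer."""
    if len(frames) <= n_select:
        return list(frames)
    step = (len(frames) - 1) / (n_select - 1)
    return [frames[round(i * step)] for i in range(n_select)]

# Two utterances of different lengths map to the same input size.
short = [[0.1 * i] for i in range(12)]   # 12 frames of 1-d features
long_ = [[0.1 * i] for i in range(57)]   # 57 frames
buf_a = select_frames(short, 10)
buf_b = select_frames(long_, 10)
print(len(buf_a), len(buf_b))  # 10 10
```

This removes the dependence of the MLP input dimension on utterance length, which is the limitation the abstract refers to.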

    An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition

    Traditionally, the performance of OCR algorithms and systems is measured on the recognition of isolated characters. When a system classifies an individual character, its output is typically a character label or a reject marker corresponding to an unrecognized character. By comparing output labels with the correct labels, the numbers of correct recognitions, substitution errors (misrecognized characters), and rejects (unrecognized characters) are determined. Nowadays, although recognition of printed isolated characters is performed with high accuracy, recognition of handwritten characters remains an open research problem. The ability to identify machine-printed characters in an automated or semi-automated manner has obvious applications in numerous fields. Since creating an algorithm with a one-hundred-percent correct recognition rate is quite probably impossible in our world of noise and differing font styles, it is important to design character recognition algorithms with these failures in mind, so that when mistakes are inevitably made they will at least be understandable and predictable to the person working with the system.
    Comment: 6 pages, 5 figures
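The evaluation described here — comparing output labels against ground truth and tallying correct recognitions, substitutions, and rejects — reduces to a simple count. A minimal sketch, where the `None` reject marker and the sample labels are assumptions for illustration:

```python
REJECT = None  # stand-in for the classifier's reject marker

def score_ocr(outputs, truth):
    """Count correct recognitions, substitution errors, and rejects
    by comparing classifier output labels with the correct labels."""
    correct = substitutions = rejects = 0
    for out, ref in zip(outputs, truth):
        if out is REJECT:
            rejects += 1          # unrecognized character
        elif out == ref:
            correct += 1          # correct recognition
        else:
            substitutions += 1    # misrecognized character
    return correct, substitutions, rejects

outputs = ['3', '8', None, '5', '0', None, '7']
truth   = ['3', '6', '2', '5', '0', '4', '1']
print(score_ocr(outputs, truth))  # (3, 2, 2)
```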

    Progress in Speech Recognition for Romanian Language


    Continuous speech phoneme recognition using neural networks and grammar correction.

    by Wai-Tat Fu. Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 104-[109]).
    Chapter 1 - Introduction (p.1)
        1.1 Problem of Speech Recognition (p.1)
        1.2 Why continuous speech recognition? (p.5)
        1.3 Current status of continuous speech recognition (p.6)
        1.4 Research Goal (p.10)
        1.5 Thesis outline (p.10)
    Chapter 2 - Current Approaches to Continuous Speech Recognition (p.12)
        2.1 Basic Steps for Continuous Speech Recognition (p.12)
        2.2 The Hidden Markov Model Approach (p.16)
            2.2.1 Introduction (p.16)
            2.2.2 Segmentation and Pattern Matching (p.18)
            2.2.3 Word Formation and Syntactic Processing (p.22)
            2.2.4 Discussion (p.23)
        2.3 Neural Network Approach (p.24)
            2.3.1 Introduction (p.24)
            2.3.2 Segmentation and Pattern Matching (p.25)
            2.3.3 Discussion (p.27)
        2.4 MLP/HMM Hybrid Approach (p.28)
            2.4.1 Introduction (p.28)
            2.4.2 Architecture of Hybrid MLP/HMM Systems (p.29)
            2.4.3 Discussions (p.30)
        2.5 Syntactic Grammar (p.30)
            2.5.1 Introduction (p.30)
            2.5.2 Word Formation and Syntactic Processing (p.31)
            2.5.3 Discussion (p.32)
        2.6 Summary (p.32)
    Chapter 3 - Neural Network As Pattern Classifier (p.34)
        3.1 Introduction (p.34)
        3.2 Training Algorithms and Topologies (p.35)
            3.2.1 Multilayer Perceptrons (p.35)
            3.2.2 Recurrent Neural Networks (p.39)
            3.2.3 Self-organizing Maps (p.41)
            3.2.4 Learning Vector Quantization (p.43)
        3.3 Experiments (p.44)
            3.3.1 The Data Set (p.44)
            3.3.2 Preprocessing of the Speech Data (p.45)
            3.3.3 The Pattern Classifiers (p.50)
        3.4 Results and Discussions (p.53)
    Chapter 4 - High Level Context Information (p.56)
        4.1 Introduction (p.56)
        4.2 Hidden Markov Model Approach (p.57)
        4.3 The Dynamic Programming Approach (p.59)
        4.4 The Syntactic Grammar Approach (p.60)
    Chapter 5 - Finite State Grammar Network (p.62)
        5.1 Introduction (p.62)
        5.2 The Grammar Compilation (p.63)
            5.2.1 Introduction (p.63)
            5.2.2 K-Tails Clustering Method (p.66)
            5.2.3 Inference of Finite State Grammar (p.67)
            5.2.4 Error Correcting Parsing (p.69)
        5.3 Experiment (p.71)
        5.4 Results and Discussions (p.73)
    Chapter 6 - The Integrated System (p.81)
        6.1 Introduction (p.81)
        6.2 Postprocessing of Neural Network Output (p.82)
            6.2.1 Activation Threshold (p.82)
            6.2.2 Duration Threshold (p.85)
            6.2.3 Merging of Phoneme Boundaries (p.88)
        6.3 The Error Correcting Parser (p.90)
        6.4 Results and Discussions (p.96)
    Chapter 7 - Conclusions (p.101)
    Bibliography

    End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

    Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in ``deep learning'' approaches have called such systems into question, but while some attempts were made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs. This might be viewed as a kind of failure of deep learning approaches, which are often claimed to have the ability to train on raw signals, alleviating the need for hand-crafted features. In this paper, we investigate a convolutional neural network approach for raw speech signals. While convolutional architectures have had tremendous success in computer vision and text processing, they seem to have been left behind in recent years in the speech processing field. We show that it is possible to learn an end-to-end phoneme sequence classifier directly from the raw signal, with performance on the TIMIT and WSJ datasets similar to existing MFCC-based systems, questioning the need for complex hand-crafted features on large datasets.
    Comment: NIPS Deep Learning Workshop, 201
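The core operation behind such an end-to-end system is a learned 1-D convolution applied directly to the raw waveform instead of hand-crafted MFCCs. A minimal sketch of that operation; the filter weights, length, and stride here are illustrative — a real system learns many filters and stacks several convolutional layers with pooling and nonlinearities:

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1-D convolution (cross-correlation) over a raw signal."""
    klen = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(klen))
            for i in range(0, len(signal) - klen + 1, stride)]

# A raw "waveform" and one small filter; in an end-to-end network
# the filter weights would be learned by back-propagation.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
edge_filter = [-1.0, 0.0, 1.0]  # responds to local slope
out = conv1d(signal, edge_filter)
print(out)  # [0.0, -2.0, 0.0, 2.0, 0.0, -2.0]
```

Stacking such layers lets the network discover its own frequency-selective features from the waveform, which is what allows the hand-crafted MFCC front-end to be dropped.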

    CONNECTIONIST SPEECH RECOGNITION - A Hybrid Approach


    Evaluation of preprocessors for neural network speaker verification
