920 research outputs found
Digit recognition using neural networks
This paper investigates the use of feed-forward multi-layer perceptrons trained by back-propagation in speech recognition. It also proposes an automatic technique for both training and recognition. The use of neural networks for speaker-independent isolated word recognition on small vocabularies is studied, and an automated system covering the training stage through the recognition stage, with no need for manual cropping of speech signals, is developed to evaluate the performance of the automatic speech recognition (ASR) system. Linear predictive coding (LPC) is applied at an early stage to represent the speech signal in frames. Features from the selected frames are used to train multilayer perceptrons (MLP) using back-propagation. The same routine is applied to the speech signal during the recognition stage, and unknown test patterns are classified to the nearest patterns. In short, the selected frames represent the local features of the speech signal, and all of them contribute to the global similarity for the whole speech signal. The analysis, design and development of the automation system are done in MATLAB, in which an isolated-word, speaker-independent digit recogniser is developed.
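As a rough illustration of the LPC front end mentioned in this abstract, the sketch below computes LPC coefficients per frame with the autocorrelation method and the Levinson-Durbin recursion. It is written in Python rather than the MATLAB used by the authors, and the frame length, hop size, prediction order, Hamming window and all function names are assumptions made for illustration; the abstract does not state them.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (coeffs, err), where frame[n] is approximated by
    sum_k coeffs[k - 1] * frame[n - k] and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_features(signal, frame_len=240, hop=80, order=10):
    """One LPC coefficient vector per windowed frame (parameter values are assumed)."""
    window = hamming(frame_len)
    feats = []
    for frame in split_frames(signal, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        coeffs, _ = lpc(windowed, order)
        feats.append(coeffs)
    return feats
```

Calling `lpc_features(samples)` on a list of audio samples returns one coefficient vector per frame; a subset of these vectors would then serve as the MLP input described above.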
Speaker Independent Speech Recognition Using Neural Network
In spite of the advances accomplished over the last few decades, automatic
speech recognition (ASR) is still a challenging and difficult task when the systems
are applied in the real world. Different requirements for various applications drive
researchers to explore more effective approaches for each particular application.
Artificial neural networks (ANN) have been proposed as a classification tool
to increase the reliability of the system. This project studies the approach of
using neural networks for speaker-independent isolated word recognition on small
vocabularies and proposes a method for using a simple MLP as the speech recognizer. Our
approach overcomes the current limitation of MLPs in the selection of the input
buffers' size by proposing a frame selection method. Linear predictive coding
(LPC) is applied at an early stage to represent the speech signal in frames. Features
from the selected frames are used to train the multilayer perceptron (MLP) feed-forward
back-propagation (FFBP) neural network during the training stage. The same
routine is applied to the speech signal during the recognition stage, and the
unknown test pattern is classified to the nearest pattern. In short, the
selected frames represent the local features of the speech signal, and all of them
contribute to the global similarity for the whole speech signal. The analysis, design
and PC-based voice dialling system are developed using MATLAB®.
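The abstract does not say how frames are selected, so the following is only a hypothetical sketch of why frame selection removes the input-size problem: utterances of any length are reduced to a fixed number of frames, and their feature vectors are concatenated into one fixed-length MLP input. The evenly spaced selection rule and the names `select_frames` and `to_input_vector` are assumptions, not the authors' method.

```python
def select_frames(frame_features, count):
    """Pick `count` frames spread evenly from the first to the last frame.

    Hypothetical selection rule. Utterances with fewer than `count`
    frames simply repeat some frames.
    """
    total = len(frame_features)
    if total == 0:
        raise ValueError("no frames to select from")
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [frame_features[total // 2]]
    indices = [i * (total - 1) // (count - 1) for i in range(count)]
    return [frame_features[i] for i in indices]

def to_input_vector(frame_features, count):
    """Concatenate the selected frames into one fixed-length input vector."""
    selected = select_frames(frame_features, count)
    return [value for frame in selected for value in frame]
```

With this scheme the MLP input layer always has `count` times the per-frame feature dimension, regardless of how long the spoken word is.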
An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition
Traditionally, the performance of OCR algorithms and systems is based on the
recognition of isolated characters. When a system classifies an individual
character, its output is typically a character label or a reject marker that
corresponds to an unrecognized character. By comparing output labels with the
correct labels, the numbers of correct recognitions, substitution errors
(misrecognized characters), and rejects (unrecognized characters) are determined.
Nowadays, although recognition of printed isolated characters is performed with
high accuracy, recognition of handwritten characters still remains an open
research problem. The ability to identify machine-printed
characters in an automated or semi-automated manner has obvious applications
in numerous fields. Since creating an algorithm with a one hundred percent
correct recognition rate is quite probably impossible in our world of noise and
different font styles, it is important to design character recognition
algorithms with these failures in mind so that when mistakes are inevitably
made, they will at least be understandable and predictable to the person
working with the
Comment: 6pages, 5 figure
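The evaluation scheme described in this abstract (comparing each output label with the correct label and counting correct recognitions, substitution errors and rejects) can be sketched as below. The use of `None` as the reject marker and the function name `score_recognition` are assumptions for illustration.

```python
REJECT = None  # assumed reject marker for an unrecognized character

def score_recognition(outputs, truths):
    """Count correct recognitions, substitution errors and rejects."""
    if len(outputs) != len(truths):
        raise ValueError("outputs and truths must have the same length")
    counts = {"correct": 0, "substitution": 0, "reject": 0}
    for out, truth in zip(outputs, truths):
        if out is REJECT:
            counts["reject"] += 1          # classifier declined to answer
        elif out == truth:
            counts["correct"] += 1         # label matches the ground truth
        else:
            counts["substitution"] += 1    # a wrong label was emitted
    return counts
```

For example, `score_recognition(["3", "7", None, "1"], ["3", "1", "4", "1"])` counts two correct recognitions, one substitution and one reject.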
Continuous speech phoneme recognition using neural networks and grammar correction.
by Wai-Tat Fu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1995.
Includes bibliographical references (leaves 104-[109]).
Chapter 1 --- INTRODUCTION
Chapter 1.1 --- Problem of Speech Recognition
Chapter 1.2 --- Why continuous speech recognition?
Chapter 1.3 --- Current status of continuous speech recognition
Chapter 1.4 --- Research Goal
Chapter 1.5 --- Thesis outline
Chapter 2 --- Current Approaches to Continuous Speech Recognition
Chapter 2.1 --- BASIC STEPS FOR CONTINUOUS SPEECH RECOGNITION
Chapter 2.2 --- THE HIDDEN MARKOV MODEL APPROACH
Chapter 2.2.1 --- Introduction
Chapter 2.2.2 --- Segmentation and Pattern Matching
Chapter 2.2.3 --- Word Formation and Syntactic Processing
Chapter 2.2.4 --- Discussion
Chapter 2.3 --- NEURAL NETWORK APPROACH
Chapter 2.3.1 --- Introduction
Chapter 2.3.2 --- Segmentation and Pattern Matching
Chapter 2.3.3 --- Discussion
Chapter 2.4 --- MLP/HMM HYBRID APPROACH
Chapter 2.4.1 --- Introduction
Chapter 2.4.2 --- Architecture of Hybrid MLP/HMM Systems
Chapter 2.4.3 --- Discussions
Chapter 2.5 --- SYNTACTIC GRAMMAR
Chapter 2.5.1 --- Introduction
Chapter 2.5.2 --- Word formation and Syntactic Processing
Chapter 2.5.3 --- Discussion
Chapter 2.6 --- SUMMARY
Chapter 3 --- Neural Network As Pattern Classifier
Chapter 3.1 --- INTRODUCTION
Chapter 3.2 --- TRAINING ALGORITHMS AND TOPOLOGIES
Chapter 3.2.1 --- Multilayer Perceptrons
Chapter 3.2.2 --- Recurrent Neural Networks
Chapter 3.2.3 --- Self-organizing Maps
Chapter 3.2.4 --- Learning Vector Quantization
Chapter 3.3 --- EXPERIMENTS
Chapter 3.3.1 --- The Data Set
Chapter 3.3.2 --- Preprocessing of the Speech Data
Chapter 3.3.3 --- The Pattern Classifiers
Chapter 3.4 --- RESULTS AND DISCUSSIONS
Chapter 4 --- High Level Context Information
Chapter 4.1 --- INTRODUCTION
Chapter 4.2 --- HIDDEN MARKOV MODEL APPROACH
Chapter 4.3 --- THE DYNAMIC PROGRAMMING APPROACH
Chapter 4.4 --- THE SYNTACTIC GRAMMAR APPROACH
Chapter 5 --- Finite State Grammar Network
Chapter 5.1 --- INTRODUCTION
Chapter 5.2 --- THE GRAMMAR COMPILATION
Chapter 5.2.1 --- Introduction
Chapter 5.2.2 --- K-Tails Clustering Method
Chapter 5.2.3 --- Inference of finite state grammar
Chapter 5.2.4 --- Error Correcting Parsing
Chapter 5.3 --- EXPERIMENT
Chapter 5.4 --- RESULTS AND DISCUSSIONS
Chapter 6 --- The Integrated System
Chapter 6.1 --- INTRODUCTION
Chapter 6.2 --- POSTPROCESSING OF NEURAL NETWORK OUTPUT
Chapter 6.2.1 --- Activation Threshold
Chapter 6.2.2 --- Duration Threshold
Chapter 6.2.3 --- Merging of Phoneme boundaries
Chapter 6.3 --- THE ERROR CORRECTING PARSER
Chapter 6.4 --- RESULTS AND DISCUSSIONS
Chapter 7 --- Conclusions
Bibliography
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks
Most state-of-the-art phoneme recognition systems rely on classical neural
network classifiers, fed with highly tuned features, such as MFCC or PLP
features. Recent advances in "deep learning" approaches have called such
systems into question, but while some attempts were made with simpler features such as
spectrograms, state-of-the-art systems still rely on MFCCs. This might be
viewed as a kind of failure of deep learning approaches, which are often
claimed to have the ability to train on raw signals, alleviating the need for
hand-crafted features. In this paper, we investigate a convolutional neural
network approach for raw speech signals. While convolutional architectures have had
tremendous success in computer vision and text processing, they seem to have
been neglected in recent years in the speech processing field. We show
that it is possible to learn an end-to-end phoneme sequence classifier system
directly from the raw signal, with performance on the TIMIT and WSJ
datasets similar to that of existing systems based on MFCC, questioning the need for complex
hand-crafted features on large datasets.
Comment: NIPS Deep Learning Workshop, 201