Automatic voice recognition using traditional and artificial neural network approaches
The main objective of this research is to develop an algorithm for isolated-word recognition. The research focuses on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study once this isolated-word research is complete. To examine the similarity between the reference and the training sets, two approaches are explored. The first implements traditional pattern recognition techniques: a dynamic time warping algorithm is applied to align the two sets, and the probability of a match is calculated by measuring the Euclidean distance between them. The second implements a three-layer backpropagation artificial neural network as the pattern classifier. The adaptation rule implemented in this network is the generalized least mean square (LMS) rule. The first approach has been completed. A vocabulary of 50 words was selected and tested, and the accuracy of the algorithm was found to be around 85 percent. The second approach is currently in progress.
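The dynamic time warping step described in this abstract can be illustrated with a minimal sketch. It assumes each utterance is already a list of feature vectors (for example, one LPC vector per frame) and uses the textbook three-neighbour recursion; the function names, the step pattern, and the nearest-template decision rule are illustrative assumptions, not details taken from the paper. The sketch returns a raw alignment distance and does not attempt the conversion to a matching probability that the abstract mentions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref, test):
    """Accumulated Euclidean cost of the best time alignment of two sequences.

    ref and test are lists of feature vectors (one vector per frame).
    Assumption: symmetric step pattern (match, insertion, deletion).
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # ref frame repeated
                                 cost[i][j - 1],      # test frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognise(templates, utterance):
    """Return the label of the reference template closest to the utterance.

    templates is a hypothetical dict mapping a word label to its
    reference feature sequence.
    """
    return min(templates, key=lambda w: dtw_distance(templates[w], utterance))
```

A call such as `recognise({"yes": yes_frames, "no": no_frames}, unknown_frames)` would pick the word whose template has the smallest warped distance; the variable names here are placeholders.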
A digital neural network approach to speech recognition
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. This thesis presents two novel methods for isolated word speech recognition based on sub-word components. A digital neural network is the fundamental processing strategy in both methods. The first design is based on the 'Separate Segmentation & Labelling' (SS&L) approach. The spectral data of the input utterance is first segmented into phoneme-like units, which are then time normalised by linear time normalisation. The neural network labels the time-normalised phoneme-like segments; 78.36% recognition accuracy is achieved for the phoneme-like unit. In the second design, no time normalisation is required. After segmentation, recognition is performed by classifying the data in a window as it is slid one frame at a time from the start to the end of each phoneme-like segment in the utterance. 73.97% recognition accuracy for the phoneme-like unit is achieved in this application. The parameters of the neural net have been optimised for maximum recognition performance. A segmentation strategy using the sum of the difference in filterbank channel energy over successive spectra produced 80.27% correct segmentation of isolated utterances into phoneme-like units. A linguistic processor based on that of Kashyap & Mittal [84] enables 93.11% and 93.49% word recognition accuracy to be achieved for the SS&L and 'Sliding Window' recognisers respectively. The linguistic processor has been redesigned to make it portable, so that it can be easily applied to any phoneme-based isolated word speech recogniser. This work is funded by the Ministry of Science & Technology, Government of Pakistan.
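As a rough illustration of the segmentation idea named in this abstract (summing the difference in filterbank channel energy over successive spectra), the sketch below computes a per-frame spectral change measure and marks boundaries at its local peaks. The use of absolute differences, the peak-picking rule, and the `threshold` parameter are assumptions made for the example; the abstract does not specify how the thesis turns the measure into boundaries.

```python
def spectral_change(frames):
    """Sum over channels of the absolute energy difference between
    each frame and the one before it.

    frames is a list of filterbank energy vectors (one per frame).
    Returns len(frames) - 1 values; entry i compares frame i + 1 with frame i.
    """
    return [
        sum(abs(cur - prev) for cur, prev in zip(frames[t], frames[t - 1]))
        for t in range(1, len(frames))
    ]


def segment_boundaries(frames, threshold):
    """Frame indices at which a new phoneme-like segment is assumed to start.

    Assumption: a boundary is a local peak of the change measure that also
    reaches a hypothetical, hand-tuned threshold.
    """
    change = spectral_change(frames)
    boundaries = []
    for i, value in enumerate(change):
        left = change[i - 1] if i > 0 else float("-inf")
        right = change[i + 1] if i + 1 < len(change) else float("-inf")
        if value >= threshold and value >= left and value > right:
            boundaries.append(i + 1)  # the new segment begins at frame i + 1
    return boundaries
```

In practice the threshold would have to be tuned on labelled data, and some smoothing of the change measure is likely needed before peak picking.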
Development of the Feature Extractor for Speech Recognition
Final degree project carried out in collaboration with the University of Maribor. With this diploma work we have attempted to continue the previous work of other researchers on the Voice Operating Intelligent Wheelchair (VOIC) [1]. This work presents the development of a voice-controlled wheelchair designed for physically disabled people who cannot control their movements, and describes the basic components of the speech recognition and wheelchair control system.
In more detail, a speech recognizer system comprises two distinct blocks, a Feature Extractor and a Recognizer. The present work is targeted at the realization of an adequate Feature Extractor block. It uses a standard LPC Cepstrum coder, which translates the incoming speech into a trajectory in the LPC Cepstrum feature space, followed by a Self Organizing Map, which classifies the outcome of the coder in order to produce optimal trajectory representations of words in reduced dimension feature spaces. Experimental results indicate that trajectories on such reduced dimension spaces can provide reliable representations of spoken words. The Recognizer block is left for future researchers.
The main contributions of this work are the study and adoption of a new technology for development, and the realization of applications such as a voice recorder and player and a complete Feature Extractor system.
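The LPC Cepstrum coding stage mentioned in this abstract can be sketched as follows, assuming the common autocorrelation method: autocorrelate a speech frame, solve for the predictor coefficients with the Levinson-Durbin recursion, then convert them to cepstral coefficients with the standard recursion for an all-pole model. The function names, and the choice of model order and number of cepstral coefficients, are illustrative assumptions; the abstract does not state the project's actual settings, and the Self Organizing Map stage is not shown.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) speech frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for s[n] ~ sum_k a[k] * s[n - k].

    Returns (coefficients, residual prediction error).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

Applying `autocorrelation`, `levinson_durbin`, and `lpc_to_cepstrum` to each frame in turn yields one cepstral vector per frame, which is the kind of trajectory through feature space that the abstract describes passing on to the Self Organizing Map.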
Continuous speech recognition with modified learning vector quantization algorithm and two-level DP-matching
PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSIN