thesis

Investigation into zero-crossing techniques as a viable means of speech recognition

Abstract

This research demonstrates how a fundamental characteristic of speech, its zero-crossing information, may be exploited to develop a low-cost, highly effective speech recognition system. The system is intended to recognise a small vocabulary of isolated speech. Although designed to be speaker dependent, it is also tested for speaker independence. A brief description of how speech is produced and recognised by a human subject is presented first. Some features of voiced and unvoiced speech signals and their associated spectra are then discussed in relation to zero-crossing information. Phonemes and their segmentation (using zero-crossing data or otherwise) are also examined, followed by a brief discussion of stationarity and its effects on zero-crossings and of the choice of pre-processing filters. Two methods of speech recognition based on zero-crossing information are then discussed. The first analyses the spacing between zero-crossings, producing a signal whose amplitude is proportional to the distance between successive crossings. The ability of this system, termed the Sinusoidal Instantaneous Frequency Extractor (SIFE) [14], to produce effective recognition parameters is examined. A second analysis technique, Higher Order Crossing (HOC) analysis [25], is then introduced. This method extracts higher-order zero-crossing information from the signal using various filtering techniques and uses this data to recognise the speech signal. Modified versions of both methods were developed, tested and found to be more effective and adaptable than their predecessors. A new parameter, Columnised Higher Order Crossing (CHOC), was developed and found to be more effective than HOC. Dynamic Time Warping was then implemented to pattern-match CHOC templates against CHOC test signals, allowing a percentage success rate to be obtained for the CHOC system (approximately 90%). Finally, the two systems are compared and their effectiveness is discussed.
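As an illustration of the two kinds of zero-crossing data mentioned above, the following minimal Python sketch computes the spacing between successive zero-crossings and a sequence of higher-order crossing counts. It is not the implementation used in the thesis. The function names are hypothetical, and repeated differencing is assumed here as the filter for the higher-order counts, since the abstract states only that "various filtering techniques" are used.

```python
def zero_crossing_indices(x):
    """Return every index i at which x[i-1] and x[i] differ in sign.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    return [i for i in range(1, len(x)) if (x[i - 1] >= 0) != (x[i] >= 0)]


def crossing_intervals(x):
    """Distance, in samples, between successive zero-crossings."""
    idx = zero_crossing_indices(x)
    return [b - a for a, b in zip(idx, idx[1:])]


def difference(x):
    """First-order difference, used here as a simple high-pass filter."""
    return [b - a for a, b in zip(x, x[1:])]


def hoc_counts(x, order):
    """Zero-crossing counts of x, its difference, its second difference, ...

    Returns `order` counts. Differencing is an assumed choice of filter.
    """
    counts = []
    y = list(x)
    for _ in range(order):
        counts.append(len(zero_crossing_indices(y)))
        y = difference(y)
    return counts


# Hypothetical test signal: a coarse, slowly oscillating waveform.
signal = [1, 2, 1, -1, -2, -1, 1, 2, 1, -1]
intervals = crossing_intervals(signal)
counts = hoc_counts(signal, 3)
```

Here `intervals` is the kind of crossing-spacing sequence a SIFE-style analysis works from, and `counts` is a short HOC-style feature vector. A real system would compute both per analysis frame, after pre-processing filters.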
