This article discusses the use of phonetic features in automatic speech recognition. The phonetic features are derived from acoustic parameters by means of Kohonen networks. The use of phonetic features instead of standard acoustic parameters rests on the assumption that it helps the system to focus on linguistically relevant signal properties. Previous experiments using very simple hidden Markov models to represent the phones (with only one mixture per state and without a lexicon or language model) have indeed shown that phoneme identification rates based on phonetic features were considerably higher than those based on acoustic parameters. When eight mixtures per state are used in the hidden Markov modelling, however, the phoneme identification rates for three different sets of phonetic features were found to be lower than those obtained from a system in which the acoustic parameters are modelled directly. It is suggested that the results are nevertheless good enough to warrant further exploration of phonetic features in a complete automatic speech recognition system: if each phone sequence representing a word in the lexicon is replaced by a sequence of underspecified phonetic feature vectors, the use of phonetic features in the acoustic decoding may have certain advantages.
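
To make the idea of underspecified phonetic feature vectors in the lexicon concrete, the following is a minimal sketch and not the method used in the article. The three-feature inventory (voiced, nasal, labial), the use of None to mark an unspecified feature, and the scoring function match_score are all hypothetical choices made for illustration. The observed vector stands in for the feature estimates that a front end such as a Kohonen network might supply.

```python
from typing import Optional, Sequence

# Hypothetical feature order: (voiced, nasal, labial).
# In a lexical template, None marks a feature left unspecified.
Template = Sequence[Optional[float]]


def match_score(observed: Sequence[float], template: Template) -> float:
    """Mean agreement over the specified features only.

    Unspecified features (None) are skipped, so they can neither
    reward nor penalise the observed value.
    """
    pairs = [(o, t) for o, t in zip(observed, template) if t is not None]
    if not pairs:
        return 1.0  # a fully unspecified template is compatible with anything
    return sum(1.0 - abs(o - t) for o, t in pairs) / len(pairs)


# Illustrative assumption: an alveolar nasal stored without a place feature.
n_underspecified = (1.0, 1.0, None)
# The same segment with place fully specified as non-labial.
n_fully_specified = (1.0, 1.0, 0.0)

# A labial-sounding nasal, e.g. /n/ assimilated before a labial consonant.
observed = (0.9, 0.8, 1.0)

score_under = match_score(observed, n_underspecified)
score_full = match_score(observed, n_fully_specified)
```

Under these assumptions the underspecified entry still matches the assimilated pronunciation well, whereas the fully specified entry is penalised for the place mismatch. This is one kind of advantage that underspecification in the lexicon could offer during acoustic decoding.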