A comparison of model and transform-based visual features for audio-visual LVCSR

By Iain Matthews, Gerasimos Potamianos, Chalapathy Neti and Juergen Luettin

Abstract

Four different visual speech parameterisation methods are compared on a large vocabulary, continuous, audio-visual speech recognition task using the IBM ViaVoice™ audio-visual speech database. Three are direct mouth image region based transforms: discrete cosine and wavelet transforms, and principal component analysis. The fourth uses a statistical model of shape and appearance, called an active appearance model, to track and obtain model parameters describing the entire face. All parameterisations are compared experimentally using hidden Markov models (HMMs) in a speaker-independent test. Visual-only HMMs are used to rescore lattices obtained from audio models trained in noisy conditions.

Year: 2001
OAI identifier: oai:CiteSeerX.psu:10.1.1.161.4156
Provided by: CiteSeerX