Overcoming Asynchrony in Audio-Visual Speech Recognition

By Virginia Estellers

Abstract

In this paper we propose two alternatives for overcoming the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should consider asynchrony within word boundaries and not at the phoneme level. The second approach to the problem introduces additional processing of the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments on the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature-processing technique compared to traditional systems.
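
The abstract describes aligning the temporal evolution of the audio and video feature streams before classification; the paper defines this alignment in terms of a speech-recognition system, which is not reproduced here. The sketch below is only an illustration of the general idea using plain dynamic time warping (an assumption made for this example, not the authors' exact method): it warps one feature stream onto the other so a simpler, synchronous model can consume frame-aligned features.

    # Illustrative sketch only: DTW-based alignment of audio and video feature
    # streams prior to recognition. This is an assumed, simplified stand-in for
    # the alignment technique described in the paper.
    import numpy as np

    def dtw_align(audio_feats: np.ndarray, video_feats: np.ndarray):
        """Return index pairs (i, j) aligning audio frame i with video frame j.

        audio_feats: (T_a, D) array of audio features (e.g. MFCCs).
        video_feats: (T_v, D) array of visual features projected to the same dim.
        """
        T_a, T_v = len(audio_feats), len(video_feats)
        # Local distance between every audio/video frame pair.
        dist = np.linalg.norm(audio_feats[:, None, :] - video_feats[None, :, :], axis=-1)

        # Accumulated cost with the standard DTW recursion.
        acc = np.full((T_a, T_v), np.inf)
        acc[0, 0] = dist[0, 0]
        for i in range(T_a):
            for j in range(T_v):
                if i == 0 and j == 0:
                    continue
                best_prev = min(
                    acc[i - 1, j] if i > 0 else np.inf,                 # audio advances
                    acc[i, j - 1] if j > 0 else np.inf,                 # video advances
                    acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # both advance
                )
                acc[i, j] = dist[i, j] + best_prev

        # Backtrack the optimal warping path from the last frame pair.
        i, j = T_a - 1, T_v - 1
        path = [(i, j)]
        while (i, j) != (0, 0):
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((acc[i - 1, j], (i - 1, j)))
            if j > 0:
                candidates.append((acc[i, j - 1], (i, j - 1)))
            _, (i, j) = min(candidates, key=lambda c: c[0])
            path.append((i, j))
        return list(reversed(path))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        audio = rng.normal(size=(100, 13))   # e.g. 100 audio frames of MFCC-like features
        video = rng.normal(size=(25, 13))    # e.g. 25 video frames in the same feature space
        warp = dtw_align(audio, video)
        print(f"path length: {len(warp)}, first pairs: {warp[:3]}")

Once a warping path is available, the video features can be resampled along it so both streams share a common time axis, which is what makes a single synchronous classifier applicable.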

Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.352.2823
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v...

