Can audio-visual speech recognition outperform acoustically enhanced speech recognition in automotive environment?

By Rajitha Navarathna, Tristan Kleinschmidt, David B. Dean, Sridha Sridharan and Patrick J. Lucey

Abstract

The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques such as spectral subtraction or multi-channel beamforming. This is an important question to answer, particularly in the automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that a further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
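
The record itself contains no code; purely as an illustrative sketch of the single-channel enhancement baseline named in the abstract (spectral subtraction), the Python snippet below implements a minimal magnitude-domain version. The function name, the parameter values, and the assumption that the leading frames of the signal are speech-free are all hypothetical and are not taken from the paper.

```python
import numpy as np

def spectral_subtraction(noisy, fs, frame_len=0.025, frame_shift=0.010,
                         noise_frames=10, over_subtraction=2.0, floor=0.01):
    """Minimal magnitude spectral subtraction (illustrative only).

    `noisy` is a 1-D NumPy array of samples at rate `fs`. The noise
    magnitude spectrum is estimated from the first `noise_frames` frames,
    which are assumed to contain no speech.
    """
    n_fft = int(frame_len * fs)
    hop = int(frame_shift * fs)
    window = np.hanning(n_fft)

    # Frame the signal, window each frame, and take its spectrum.
    frames = [noisy[i:i + n_fft] * window
              for i in range(0, len(noisy) - n_fft, hop)]
    spectra = np.array([np.fft.rfft(f) for f in frames])
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Estimate the noise magnitude from the leading (assumed speech-free) frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the scaled noise estimate and apply a spectral floor.
    clean_mag = np.maximum(mag - over_subtraction * noise_mag, floor * mag)

    # Resynthesise with the noisy phase via naive overlap-add (no window normalisation).
    enhanced = np.zeros(len(noisy))
    for idx, frame_spec in enumerate(clean_mag * np.exp(1j * phase)):
        start = idx * hop
        enhanced[start:start + n_fft] += np.fft.irfft(frame_spec, n_fft)
    return enhanced
```

In the paper's setting, enhanced audio such as this would feed the acoustic front end, while the abstract's main point is that fusing audio with the visual (lip) modality can outperform, and further complement, such enhancement.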

Topics: 090609 Signal Processing, Speech enhancement, robust speech recognition, audio-visual automatic speech recognition, synchronous HMM
Year: 2011
OAI identifier: oai:eprints.qut.edu.au:45770
