
A Unified Approach to Multi-Pose Audio-Visual ASR

Abstract

The vast majority of studies in the field of audio-visual automatic speech recognition (AVASR) assume frontal images of a speaker's face, but this cannot always be guaranteed in practice. Hence our recent research efforts have concentrated on extracting visual speech information from non-frontal faces, in particular the profile view. The introduction of additional views to an AVASR system increases the complexity of the system, as it has to deal with the different visual features associated with the various views. In this paper, we propose the use of linear regression to find a transformation matrix based on synchronous frontal and profile visual speech data, which is used to normalize the visual speech in each viewpoint into a single uniform view. In our experiments on the task of multi-speaker lipreading, we show that this "pose-invariant" technique reduces train/test mismatch between visual speech features of different views, and is of particular benefit when there is more training data for one viewpoint over another (e.g. frontal over profile).
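The abstract's core idea, learning a linear transformation from synchronous frontal/profile feature pairs and using it to map one view into the other, can be illustrated with a minimal least-squares sketch. The array names, feature dimensions, and the use of an affine (matrix plus bias) map below are illustrative assumptions; the paper's actual visual feature extraction and regression setup may differ.

```python
import numpy as np

# Hypothetical dimensions for illustration only: N synchronous frames,
# profile- and frontal-view feature vectors of the stated sizes.
N, D_PROFILE, D_FRONTAL = 5000, 40, 40

rng = np.random.default_rng(0)
X_profile = rng.normal(size=(N, D_PROFILE))   # stand-in for profile-view visual speech features
X_frontal = rng.normal(size=(N, D_FRONTAL))   # stand-in for synchronous frontal-view features

def fit_pose_normalizer(src, tgt):
    """Least-squares fit of an affine map from source-view to target-view features."""
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)       # shape: (D_src + 1, D_tgt)
    return W

def apply_pose_normalizer(src, W):
    """Map source-view features into the single uniform (here, frontal) view."""
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])
    return src_aug @ W

# Fit on paired training data, then normalize profile features before recognition.
W = fit_pose_normalizer(X_profile, X_frontal)
X_profile_normalized = apply_pose_normalizer(X_profile, W)
```

In this sketch the transformation is estimated once from paired data and then applied to all profile-view features, so a recognizer trained largely on frontal data can be reused; this mirrors the stated benefit when one viewpoint has more training data than another.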
