
Visual speech encoding based on facial landmark registration

Abstract

Studies related to Visual Speech Recognition (VSR) largely ignore state-of-the-art approaches to facial landmark localization, and are also deficient in robust visual features and their temporal encoding. In this work, we propose a temporal encoding of visual speech that integrates fast and accurate state-of-the-art facial landmark detection based on an ensemble of regression trees learned using gradient boosting. The main contribution of this work is a fast and simple encoding of visual speech features derived from vertically symmetric point pairs (VeSPP) of facial landmarks in the lip region, and a demonstration of their usefulness in temporal sequence comparison using Dynamic Time Warping. VSR can be either speaker dependent (SD) or speaker independent (SI), and each poses different kinds of challenges. In this work, we consider the SD scenario and obtain 82.65% recognition accuracy on the OuluVS database. Unlike recent research in VSR that makes use of auxiliary information such as audio, depth, and color channels, our approach does not impose such constraints.
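The sketch below is a minimal illustration (not the authors' code) of the two steps named in the abstract: deriving per-frame features from vertically symmetric pairs of lip landmarks, and comparing two utterances with Dynamic Time Warping. It assumes landmarks follow the 68-point iBUG convention used by dlib's ensemble-of-regression-trees predictor; the specific VeSPP pairing and distance measure are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

# Hypothetical pairing of upper-lip points with their vertically mirrored
# lower-lip counterparts (iBUG 68-point mouth indices 48-67). This is an
# assumed reading of "vertically symmetric point pairs", not the paper's
# exact specification.
VESPP_PAIRS = [(49, 59), (50, 58), (51, 57), (52, 56), (53, 55),
               (61, 67), (62, 66), (63, 65)]

def vespp_features(landmarks_per_frame):
    """landmarks_per_frame: array of shape (T, 68, 2) with (x, y) positions.
    Returns a (T, len(VESPP_PAIRS)) sequence of pairwise lip distances."""
    feats = []
    for frame in landmarks_per_frame:
        feats.append([np.linalg.norm(frame[i] - frame[j])
                      for i, j in VESPP_PAIRS])
    return np.asarray(feats)

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two feature sequences."""
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]
```

In a speaker-dependent setting such as the one reported on OuluVS, a test utterance could then be classified by nearest-neighbor search, assigning it the label of the reference sequence with the smallest DTW distance.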
