
Visual speech encoding based on facial landmark registration

Abstract

Studies related to Visual Speech Recognition (VSR) largely ignore state-of-the-art approaches to facial landmark localization, and are also deficient in robust visual features and their temporal encoding. In this work, we propose a temporal encoding of visual speech that integrates fast and accurate state-of-the-art facial landmark detection based on an ensemble of regression trees learned using gradient boosting. The main contribution of this work is a fast and simple encoding of visual speech features derived from vertically symmetric point pairs (VeSPP) of facial landmarks in the lip region, and a demonstration of their usefulness in temporal sequence comparison using Dynamic Time Warping. VSR can be either speaker dependent (SD) or speaker independent (SI), and each poses different kinds of challenges. In this work, we consider the SD scenario and obtain 82.65% recognition accuracy on the OuluVS database. Unlike recent research in VSR that makes use of auxiliary information such as audio, depth, and color channels, our approach does not impose such constraints.
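The sketch below is a minimal illustration (not the authors' code) of the two steps named in the abstract: deriving per-frame features from vertically symmetric pairs of lip landmarks, and comparing two utterances with Dynamic Time Warping. It assumes landmarks follow the 68-point iBUG convention used by dlib's ensemble-of-regression-trees predictor; the specific VeSPP pairing and distance measure are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

# Hypothetical pairing of upper-lip points with their vertically mirrored
# lower-lip counterparts (iBUG 68-point mouth indices 48-67). This is an
# assumed reading of "vertically symmetric point pairs", not the paper's
# exact specification.
VESPP_PAIRS = [(49, 59), (50, 58), (51, 57), (52, 56), (53, 55),
               (61, 67), (62, 66), (63, 65)]

def vespp_features(landmarks_per_frame):
    """landmarks_per_frame: array of shape (T, 68, 2) with (x, y) positions.
    Returns a (T, len(VESPP_PAIRS)) sequence of pairwise lip distances."""
    feats = []
    for frame in landmarks_per_frame:
        feats.append([np.linalg.norm(frame[i] - frame[j])
                      for i, j in VESPP_PAIRS])
    return np.asarray(feats)

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two feature sequences."""
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Ta, Tb]
```

In a speaker-dependent setting such as the one reported on OuluVS, a test utterance could then be classified by nearest-neighbor search, assigning it the label of the reference sequence with the smallest DTW distance.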
