Combining residual networks with LSTMs for lipreading

Stafylakis, Themos; Tzimiropoulos, Georgios

Abstract

We propose an end-to-end deep learning architecture for word-level visual speech recognition. The system combines spatiotemporal convolutional, residual and bidirectional Long Short-Term Memory networks. We trained and evaluated it on the Lipreading In-The-Wild benchmark, a challenging database with a 500-word vocabulary consisting of video excerpts from BBC TV broadcasts. The proposed network attains a word accuracy of 83.0%, a 6.8% absolute improvement over the current state of the art.
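To make the data flow of such a spatiotemporal-conv, ResNet and BiLSTM pipeline concrete, the sketch below traces tensor shapes through each stage using the standard convolution output-size formula. Every layer size here (29 input frames of 112x112 pixels, the 3D kernel and pooling settings, 64 front-end channels, three stride-2 ResNet stages with 512 channels, 256 LSTM units per direction) is a hypothetical assumption for illustration; only the 500-word output vocabulary comes from the abstract. It is not the authors' implementation.

```python
# Hypothetical shape trace for a 3D-conv -> ResNet -> BiLSTM lipreading pipeline.
# All layer sizes are illustrative assumptions; only vocab=500 comes from the abstract.


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(frames=29, height=112, width=112,
                 resnet_channels=512, lstm_hidden=256, vocab=500):
    """Return (stage name, shape) pairs for one grayscale mouth-region clip."""
    shapes = []
    t, h, w = frames, height, width
    shapes.append(("input (T, H, W)", (t, h, w)))

    # Assumed spatiotemporal front-end: one 3D convolution with kernel (5, 7, 7),
    # stride (1, 2, 2), padding (2, 3, 3). Time is preserved, H and W are halved.
    t = conv_out(t, 5, 1, 2)
    h = conv_out(h, 7, 2, 3)
    w = conv_out(w, 7, 2, 3)
    shapes.append(("3D conv (64 ch)", (64, t, h, w)))

    # Assumed 3D max pooling with kernel (1, 3, 3), stride (1, 2, 2), padding (0, 1, 1).
    h = conv_out(h, 3, 2, 1)
    w = conv_out(w, 3, 2, 1)
    shapes.append(("3D max pool", (64, t, h, w)))

    # Assumed per-frame residual network: three stride-2 stages (3x3, padding 1),
    # applied to every time step independently.
    for _ in range(3):
        h = conv_out(h, 3, 2, 1)
        w = conv_out(w, 3, 2, 1)
    shapes.append(("ResNet feature map", (t, resnet_channels, h, w)))

    # Global average pooling gives one feature vector per frame.
    shapes.append(("avg pool -> per-frame vector", (t, resnet_channels)))

    # Bidirectional LSTM: forward and backward hidden states are concatenated.
    shapes.append(("BiLSTM outputs", (t, 2 * lstm_hidden)))

    # Word scores over the vocabulary, after aggregating over time
    # (the aggregation rule is an assumption and is not modelled here).
    shapes.append(("softmax over words", (vocab,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:32s}{shape}")
```

The point of the trace is that the 3D front-end and the ResNet shrink only the spatial axes, so the recurrent back-end still sees one feature vector per video frame and can model the temporal dynamics of the lip movements.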

Available Versions

Nottingham ePrints

oai:eprints.nottingham.ac.uk:4...

Last updated on 12/08/2017