Fully Convolutional Networks for Continuous Sign Language Recognition
Continuous sign language recognition (SLR) is a challenging task that
requires learning on both spatial and temporal dimensions of signing frame
sequences. Most recent work accomplishes this by using CNN and RNN hybrid
networks. However, training these networks is generally non-trivial, and most
of them fail to learn unseen sequence patterns, causing unsatisfactory
performance in online recognition. In this paper, we propose a fully
convolutional network (FCN) for online SLR to concurrently learn spatial and
temporal features from weakly annotated video sequences with only
sentence-level annotations given. A gloss feature enhancement (GFE) module is
introduced in the proposed network to enforce better sequence alignment
learning. The proposed network is end-to-end trainable without any
pre-training. We conduct experiments on two large-scale SLR datasets.
Experiments show that our method for continuous SLR is effective and performs
well in online recognition.
Comment: Accepted to ECCV202