Fully Convolutional Networks for Continuous Sign Language Recognition
Continuous sign language recognition (SLR) is a challenging task that
requires learning on both spatial and temporal dimensions of signing frame
sequences. Most recent work accomplishes this by using CNN and RNN hybrid
networks. However, training these networks is generally non-trivial, and most
of them fail in learning unseen sequence patterns, causing an unsatisfactory
performance for online recognition. In this paper, we propose a fully
convolutional network (FCN) for online SLR to concurrently learn spatial and
temporal features from weakly annotated video sequences with only
sentence-level annotations given. A gloss feature enhancement (GFE) module is
introduced in the proposed network to enforce better sequence alignment
learning. The proposed network is end-to-end trainable without any
pre-training. We conduct experiments on two large scale SLR datasets.
Experiments show that our method for continuous SLR is effective and performs
well in online recognition. Comment: Accepted to ECCV 2020
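The abstract gives no implementation details, but its core idea (learning temporal features with convolutions instead of an RNN) can be illustrated with a minimal, hypothetical sketch of a "same"-padded 1D temporal convolution over per-frame features. All names, shapes, and the ReLU choice here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def temporal_conv1d(F, W, b):
    """One fully convolutional temporal layer (illustrative sketch).
    F: (T, C_in) per-frame features, W: (k, C_in, C_out), b: (C_out,).
    'Same' padding keeps one output vector per input frame."""
    k = W.shape[0]
    pad = k // 2
    Fp = np.pad(F, ((pad, pad), (0, 0)))          # pad the time axis only
    T = F.shape[0]
    out = np.stack([
        # slide a window of k frames and contract over (time, channel)
        np.tensordot(Fp[t:t + k], W, axes=([0, 1], [0, 1])) + b
        for t in range(T)
    ])
    return np.maximum(out, 0.0)                   # ReLU nonlinearity

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 32))                # 10 frames, 32 features each
W = rng.normal(size=(5, 32, 16)) * 0.1            # kernel size 5, 16 output channels
b = np.zeros(16)
out = temporal_conv1d(frames, W, b)
print(out.shape)                                  # (10, 16)
```

Because the layer is convolutional rather than recurrent, it can be applied to sequences of any length, which is what makes the fully convolutional approach attractive for online recognition.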
Sign language recognition with transformer networks
Sign languages are complex languages. Research into them is ongoing, supported by large video corpora of which only small parts are annotated. Sign language recognition can be used to speed up the annotation process of these corpora, in order to aid research into sign languages and sign language recognition. Previous research has approached sign language recognition in various ways, using feature extraction techniques or end-to-end deep learning. In this work, we apply a combination of feature extraction using OpenPose for human keypoint estimation and end-to-end feature learning with Convolutional Neural Networks. The proven multi-head attention mechanism used in transformers is applied to recognize isolated signs in the Flemish Sign Language corpus. Our proposed method significantly outperforms the previous state of the art of sign language recognition on the Flemish Sign Language corpus: we obtain an accuracy of 74.7% on a vocabulary of 100 classes. Our results will be implemented as a suggestion system for sign language corpus annotation.
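The multi-head attention mechanism this abstract refers to is the standard one from transformers. As a self-contained sketch (not the authors' implementation; the dimensions and weight initialization below are assumptions), scaled dot-product attention over a sequence of per-frame keypoint features looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, Wq, Wk, Wv, Wo):
    """Standard multi-head self-attention.
    X: (T, d_model) sequence of per-frame feature vectors."""
    T, d_model = X.shape
    d_head = d_model // num_heads

    def project(W):
        # Project, then split into heads: (num_heads, T, d_head)
        return (X @ W).reshape(T, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(Wq), project(Wk), project(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (H, T, T)
    out = softmax(scores) @ V                             # (H, T, d_head)
    out = out.transpose(1, 0, 2).reshape(T, d_model)      # concatenate heads
    return out @ Wo                                       # output projection

# Toy example: 16 frames, 64-dim features (e.g. flattened keypoints), 4 heads
T, d_model, H = 16, 64, 4
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
Y = multi_head_attention(X, H, Wq, Wk, Wv, Wo)
print(Y.shape)                                            # (16, 64)
```

Each head can attend to a different part of the signing sequence, which is useful when a sign's identity depends on hand shape at one moment and movement over a longer span.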
Advanced Capsule Networks via Context Awareness
Capsule Networks (CNs) offer a new architecture for the Deep Learning (DL)
community. Though their effectiveness has been demonstrated on the MNIST and
smallNORB datasets, these networks still face challenges on other datasets of
images with distinct contexts. In this research, we improve the design of the
CN (vector version): we add more pooling layers to filter image backgrounds
and more reconstruction layers for better image restoration. Additionally, we
perform experiments to compare the accuracy and speed of CNs versus DL models.
Among DL models, we utilize Inception V3 and DenseNet V201
for powerful computers, and NASNet, MobileNet V1, and MobileNet V2 for small
and embedded devices. We evaluate our models on a fingerspelling alphabet
dataset from American Sign Language (ASL). The results show that CNs perform
comparably to DL models while dramatically reducing training time. We also make
a demonstration and provide a link for illustration. Comment: 12 pages
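The defining operation of the vector capsule networks this abstract builds on is the "squash" nonlinearity from Sabour et al. (2017), which preserves a capsule vector's orientation while mapping its length into [0, 1) so that length can encode the probability that an entity (here, a fingerspelled letter) is present. A minimal sketch:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Capsule 'squash' nonlinearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    Keeps direction, compresses length into [0, 1)."""
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

# Two capsule vectors: one long (confident), one tiny (uncertain)
caps = np.array([[0.0, 3.0, 4.0],
                 [0.01, 0.0, 0.0]])
out = squash(caps)
lengths = np.linalg.norm(out, axis=-1)
print(lengths)   # long input -> length near 1, tiny input -> near 0
```

Long input vectors are squashed to lengths just below 1 and short ones to near 0, which is why capsule length works as a class-presence score in the ASL fingerspelling setting.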