Sign language recognition with transformer networks
Sign languages are complex languages. Research into them is ongoing, supported by large video corpora of which only small parts are annotated. Sign language recognition can be used to speed up the annotation process of these corpora, in order to aid research into sign languages and sign language recognition. Previous research has approached sign language recognition in various ways, using feature extraction techniques or end-to-end deep learning. In this work, we apply a combination of feature extraction using OpenPose for human keypoint estimation and end-to-end feature learning with Convolutional Neural Networks. The proven multi-head attention mechanism used in transformers is applied to recognize isolated signs in the Flemish Sign Language corpus. Our proposed method significantly outperforms the previous state of the art of sign language recognition on the Flemish Sign Language corpus: we obtain an accuracy of 74.7% on a vocabulary of 100 classes. Our results will be implemented as a suggestion system for sign language corpus annotation.
Towards automatic sign language corpus annotation using deep learning
Sign classification in sign language corpora is a challenging problem that requires large datasets. Unfortunately, only a small portion of those corpora is labeled. To expedite the annotation process, we propose a gloss suggestion system based on deep learning. We improve upon previous research in three ways. Firstly, we use a proven feature extraction method called OpenPose, rather than learning end-to-end. Secondly, we propose a more suitable and powerful network architecture, based on GRU layers. Finally, we exploit domain and task knowledge to further increase the accuracy.
We show that we greatly outperform the previous state of the art on the dataset used. Our method can be used to suggest the top 5 annotations for a video fragment selected by the corpus annotator. We expect that it will expedite the annotation process to the benefit of sign language translation research.
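The top-5 suggestion step described in this abstract can be illustrated with a minimal sketch. It assumes the classifier emits one raw score per gloss; the function name, the gloss labels, and the use of a softmax are illustrative assumptions, not details taken from the paper.

```python
import math


def top_k_suggestions(scores, labels, k=5):
    """Rank labels by softmax probability and return the k best (label, prob) pairs.

    `scores` are hypothetical raw classifier outputs, one per label.
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort label/probability pairs from most to least probable.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical gloss labels and scores, for illustration only.
suggestions = top_k_suggestions([0.1, 2.0, 1.0], ["GLOSS_A", "GLOSS_B", "GLOSS_C"], k=2)
```

An annotator-facing tool would show the returned labels in order and let the annotator pick one or reject them all.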
Scaling Recurrent Neural Network Language Models
This paper investigates the scaling properties of Recurrent Neural Network
Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and
address the questions of how RNNLMs scale with respect to model size,
training-set size, computational costs and memory. Our analysis shows that
despite being more costly to train, RNNLMs obtain much lower perplexities on
standard benchmarks than n-gram models. We train the largest known RNNs and
present relative word error rate gains of 18% on an ASR task. We also present
the new lowest perplexities on the recently released billion word language
modelling benchmark, a 1 BLEU point gain on machine translation and a 17%
relative hit rate gain in word prediction.
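The two kinds of figures quoted in this abstract, relative gains and perplexity, follow standard definitions, sketched below. The function names and all numbers are illustrative assumptions and are not results from the paper.

```python
import math


def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, e.g. word error rate."""
    return (baseline - improved) / baseline


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp of the mean negative log-likelihood."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Illustrative numbers only: a WER drop from 10.0% to 8.2% is an 18% relative gain.
wer_gain = relative_reduction(10.0, 8.2)

# A model assigning probability 0.25 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)
```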
Tagging the Teleman Corpus
Experiments were carried out comparing the Swedish Teleman and the English
Susanne corpora using an HMM-based and a novel reductionistic statistical
part-of-speech tagger. They indicate that tagging the Teleman corpus is the
more difficult task, and that the performance of the two different taggers is
comparable.
Comment: 14 pages, LaTeX, to appear in Proceedings of the 10th Nordic
Conference of Computational Linguistics, Helsinki, Finland, 199
Alignment-guided chunking
We introduce an adaptable monolingual chunking approach, Alignment-Guided Chunking (AGC), which makes use of knowledge of word alignments acquired from bilingual corpora. Our approach is motivated by the observation that a sentence should be chunked differently depending on the foreseen end-tasks. For example, given the different requirements of translation into (say) French and German, it is inappropriate to chunk up an English string in exactly the same way in preparation for translation into one or the other of these languages. We test our chunking approach on two language pairs: French–English and German–English, where these two bilingual corpora share the same English sentences. Two chunkers trained on French–English (FE-Chunker) and German–English (DE-Chunker) respectively are used to perform chunking on the same English sentences. We construct two test sets, suitable for French–English and German–English respectively. Each chunker is evaluated on the appropriate test set; with one reference translation only, we report F-scores of 32.63% for the FE-Chunker and 40.41% for the DE-Chunker.
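For readers unfamiliar with the metric, an F-score over chunks is commonly computed as the harmonic mean of precision and recall. The sketch below assumes that chunks are represented as (start, end) token spans and that only exact matches count; the abstract does not state how its F-scores were computed, so this is an illustration of the general metric, not the authors' evaluation procedure.

```python
def f_score(predicted, reference):
    """Balanced F-score (F1) over two collections of (start, end) chunk spans."""
    predicted, reference = set(predicted), set(reference)
    # A predicted chunk counts as correct only if the same span is in the reference.
    correct = len(predicted & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans over a 7-token sentence: only (0, 2) matches exactly.
score = f_score([(0, 2), (2, 5), (5, 7)], [(0, 2), (2, 4), (4, 7)])
```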
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and, with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.