10,558 research outputs found

Sign language recognition with transformer networks

Authors: Dambre Joni; De Coster Mathieu; Van Herreweghe Mieke
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020

Sign languages are complex languages. Research into them is ongoing, supported by large video corpora of which only small parts are annotated. Sign language recognition can be used to speed up the annotation process of these corpora, in order to aid research into sign languages and sign language recognition. Previous research has approached sign language recognition in various ways, using feature extraction techniques or end-to-end deep learning. In this work, we apply a combination of feature extraction using OpenPose for human keypoint estimation and end-to-end feature learning with Convolutional Neural Networks. The proven multi-head attention mechanism used in transformers is applied to recognize isolated signs in the Flemish Sign Language corpus. Our proposed method significantly outperforms the previous state of the art of sign language recognition on the Flemish Sign Language corpus: we obtain an accuracy of 74.7% on a vocabulary of 100 classes. Our results will be implemented as a suggestion system for sign language corpus annotation.

Ghent University Academic Bibliography

Towards automatic sign language corpus annotation using deep learning

Authors: Dambre Joni; De Coster Mathieu; Van Herreweghe Mieke
Publication date: 01/01/2019

Sign classification in sign language corpora is a challenging problem that requires large datasets. Unfortunately, only a small portion of those corpora is labeled. To expedite the annotation process, we propose a gloss suggestion system based on deep learning. We improve upon previous research in three ways. Firstly, we use a proven feature extraction method called OpenPose, rather than learning end-to-end. Secondly, we propose a more suitable and powerful network architecture, based on GRU layers. Finally, we exploit domain and task knowledge to further increase the accuracy. We show that we greatly outperform the previous state of the art on the used dataset. Our method can be used for suggesting a top 5 of annotations given a video fragment that is selected by the corpus annotator. We expect that it will expedite the annotation process to the benefit of sign language translation research.
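
The abstract describes suggesting a top 5 of annotations for a selected video fragment. As a minimal sketch of that ranking step only, assuming a classifier has already produced one raw score per gloss, the Python below converts the scores to softmax probabilities and keeps the five best. The function name `top_k_suggestions`, the gloss labels, and the scores are hypothetical; the paper's actual model and post-processing are not shown here.

```python
import math


def top_k_suggestions(scores, k=5):
    """Rank gloss labels by softmax probability and return the best k.

    `scores` maps a gloss label to a raw classifier score (hypothetical input).
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    highest = max(scores.values())
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    # Sort labels from most to least probable and keep the first k.
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(label, value / total) for label, value in ranked[:k]]


# Made-up scores for six made-up glosses; only five are suggested.
example_scores = {"HOUSE": 2.0, "CAR": 1.0, "TREE": 0.5,
                  "BOOK": -1.0, "DOG": 0.0, "CAT": 3.0}
for label, probability in top_k_suggestions(example_scores):
    print(f"{label}: {probability:.3f}")
```

Presenting a short ranked list rather than a single prediction leaves the final choice with the annotator, which matches the suggestion-system framing in the abstract.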

Ghent University Academic Bibliography

Scaling Recurrent Neural Network Language Models

Authors: Ash Tom; Mrva David; Prasad Niranjani; Robinson Tony; Williams Will
Publication date: 02/02/2015

This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational costs and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than n-gram models. We train the largest known RNNs and present relative word error rate gains of 18% on an ASR task. We also present the new lowest perplexities on the recently released billion word language modelling benchmark, a 1 BLEU point gain on machine translation and a 17% relative hit rate gain in word prediction.
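
The abstract reports its results as perplexities and relative gains. For readers unfamiliar with these measures, the sketch below uses the standard definitions: perplexity as the exponential of the average negative log-probability per token, and relative reduction as (baseline - new) / baseline. The function names and all numbers are illustrative assumptions, not values or code from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from the natural-log probability a model assigns to each token."""
    if not log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline, new):
    """Fractional improvement of `new` over `baseline` for a lower-is-better metric."""
    return (baseline - new) / baseline


# A model that assigns probability 0.25 to every token has perplexity 4
# (up to floating-point rounding).
print(perplexity([math.log(0.25)] * 4))

# Illustrative only: a word error rate falling from 10.0% to 8.2%
# is an 18% relative reduction.
print(relative_reduction(10.0, 8.2))
```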

arXiv.org e-Print Archive

Crossref

Tagging the Teleman Corpus

Authors: Brants Thorsten; Samuelsson Christer
Publication date: 01/01/1995

Experiments were carried out comparing the Swedish Teleman and the English Susanne corpora using an HMM-based and a novel reductionistic statistical part-of-speech tagger. They indicate that tagging the Teleman corpus is the more difficult task, and that the performance of the two different taggers is comparable. Comment: 14 pages, LaTeX, to appear in Proceedings of the 10th Nordic Conference of Computational Linguistics, Helsinki, Finland, 199

arXiv.org e-Print Archive

CiteSeerX

Alignment-guided chunking

Authors: Ma Yanjun; Stroppa Nicolas; Way Andy
Publication date: 01/01/2007

We introduce an adaptable monolingual chunking approach, Alignment-Guided Chunking (AGC), which makes use of knowledge of word alignments acquired from bilingual corpora. Our approach is motivated by the observation that a sentence should be chunked differently depending on the foreseen end tasks. For example, given the different requirements of translation into (say) French and German, it is inappropriate to chunk up an English string in exactly the same way as preparation for translation into one or other of these languages. We test our chunking approach on two language pairs: French–English and German–English, where these two bilingual corpora share the same English sentences. Two chunkers trained on French–English (FE-Chunker) and German–English (DE-Chunker) respectively are used to perform chunking on the same English sentences. We construct two test sets, each suitable for French–English and German–English respectively. The performance of the two chunkers is evaluated on the appropriate test set. With one reference translation only, we report F-scores of 32.63% for the FE-Chunker and 40.41% for the DE-Chunker.
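
The abstract reports F-scores for the two chunkers without stating the matching criterion. As a hedged illustration of how a chunking F-score is commonly computed, the sketch below assumes chunks are represented as (start, end) token spans and counts a predicted chunk as correct only when it matches a reference span exactly. The function name `chunk_f_score` and the example spans are hypothetical, and the paper's own evaluation protocol may differ.

```python
def chunk_f_score(predicted, reference):
    """F-score (harmonic mean of precision and recall) over exact-match chunk spans.

    Each chunk is a (start, end) pair of token offsets; this span
    representation is an assumption made for illustration.
    """
    predicted_set, reference_set = set(predicted), set(reference)
    correct = len(predicted_set & reference_set)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_set)
    recall = correct / len(reference_set)
    return 2 * precision * recall / (precision + recall)


# A six-token sentence chunked two ways; only the span (0, 2) agrees,
# so precision and recall are both 1/3.
predicted_chunks = [(0, 2), (2, 5), (5, 6)]
reference_chunks = [(0, 2), (2, 4), (4, 6)]
print(chunk_f_score(predicted_chunks, reference_chunks))
```

With a single reference chunking per sentence, exact-match scoring penalizes any alternative but still plausible segmentation, which may partly explain why scores of this kind can look low.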

Irish Universities

DCU Online Research Access Service

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

Authors: Geiger Jürgen; Jin Wenyu; Mousa Amr El-Desoky; Pohjalainen Jouni; Schuller Björn; Zhang Zixing
Publication date: 01/01/2018

Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.

arXiv.org e-Print Archive

OPUS Augsburg

Crossref