1,441 research outputs found
Considerations for meaningful sign language machine translation based on glosses
Automatic sign language processing is gaining popularity in Natural Language
Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in
particular, sign language translation based on glosses is a prominent approach.
In this paper, we review recent works on neural gloss translation. We find that
limitations of glosses in general and limitations of specific datasets are not
discussed in a transparent manner and that there is no common standard for
evaluation.
To address these issues, we put forward concrete recommendations for future
research on gloss translation. Our suggestions advocate awareness of the
inherent limitations of gloss-based approaches, realistic datasets, stronger
baselines and convincing evaluation
Considerations for meaningful sign language machine translation based on glosses
Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation
Gloss Attention for Gloss-free Sign Language Translation
Most sign language translation (SLT) methods to date require the use of gloss
annotations to provide additional supervision information, however, the
acquisition of gloss is not easy. To solve this problem, we first perform an
analysis of existing models to confirm how gloss annotations make SLT easier.
We find that it can provide two aspects of information for the model, 1) it can
help the model implicitly learn the location of semantic boundaries in
continuous sign language videos, 2) it can help the model understand the sign
language video globally. We then propose \emph{gloss attention}, which enables
the model to keep its attention within video segments that have the same
semantics locally, just as gloss helps existing models do. Furthermore, we
transfer the knowledge of sentence-to-sentence similarity from the natural
language model to our gloss attention SLT network (GASLT) to help it understand
sign language videos at the sentence level. Experimental results on multiple
large-scale sign language datasets show that our proposed GASLT model
significantly outperforms existing methods. Our code is provided in
\url{https://github.com/YinAoXiong/GASLT}
- …