6,861 research outputs found
Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation
In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods
An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Unsupervised sense induction methods offer a solution to the
problem of scarcity of semantic resources. These methods
automatically extract semantic information from textual data
and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates
bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the obtained resources. We then proceed to a large-scale evaluation by integrating the resources into a Machine Translation (MT) metric (METEOR). We show that the use of the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality, compared to precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used
An example-based approach to translating sign language
Users of sign languages are often forced to use a language in which they have reduced competence simply because documentation in their preferred format is not available. While some research exists on translating between natural and sign languages, we present here what we believe to be the first attempt to tackle this problem using an example-based (EBMT) approach.
Having obtained a set of English–Dutch Sign Language examples, we employ an approach to EBMT using the ‘Marker Hypothesis’ (Green, 1979), analogous to the successful system of (Way & Gough, 2003), (Gough & Way, 2004a) and (Gough & Way, 2004b). In a set of experiments, we show that
encouragingly good translation quality may be obtained using such an approach
What value do explicit high level concepts have in vision to language problems?
Much of the recent progress in Vision-to-Language (V2L) problems has been
achieved through a combination of Convolutional Neural Networks (CNNs) and
Recurrent Neural Networks (RNNs). This approach does not explicitly represent
high-level semantic concepts, but rather seeks to progress directly from image
features to text. We propose here a method of incorporating high-level concepts
into the very successful CNN-RNN approach, and show that it achieves a
significant improvement on the state-of-the-art performance in both image
captioning and visual question answering. We also show that the same mechanism
can be used to introduce external semantic information and that doing so
further improves performance. In doing so we provide an analysis of the value
of high level semantic information in V2L problems.Comment: Accepted to IEEE Conf. Computer Vision and Pattern Recognition 2016.
Fixed titl
Building a sign language corpus for use in machine translation
In recent years data-driven methods of machine translation (MT) have overtaken rule-based approaches as the predominant means of automatically translating between languages. A pre-requisite for such an approach is a parallel corpus of the source and target languages. Technological developments in sign language (SL) capturing, analysis and processing tools now mean that SL corpora are
becoming increasingly available. With transcription and language analysis tools being mainly designed and used for linguistic purposes, we describe the process of creating a multimedia parallel corpus specifically for the purposes of English to Irish Sign Language (ISL) MT. As part of our larger project on localisation, our research is focussed on developing assistive technology for patients with limited English in the domain of healthcare. Focussing on the first point of contact a patient has with a GP’s office, the
medical secretary, we sought to develop a corpus from the dialogue between the two parties when scheduling an appointment. Throughout the development process we have created one parallel corpus in six different modalities from this initial dialogue. In this paper we discuss the multi-stage process of the development of this parallel corpus as individual and interdependent entities, both for
our own MT purposes and their usefulness in the wider MT and SL research domains
Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
This paper demonstrates that word sense disambiguation (WSD) can improve
neural machine translation (NMT) by widening the source context considered when
modeling the senses of potentially ambiguous words. We first introduce three
adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant
processes, and random walks, which are then applied to large word contexts
represented in a low-rank space and evaluated on SemEval shared-task data. We
then learn word vectors jointly with sense vectors defined by our best WSD
method, within a state-of-the-art NMT system. We show that the concatenation of
these vectors, and the use of a sense selection mechanism based on the weighted
average of sense vectors, outperforms several baselines including sense-aware
ones. This is demonstrated by translation on five language pairs. The
improvements are above one BLEU point over strong NMT baselines, +4% accuracy
over all ambiguous nouns and verbs, or +20% when scored manually over several
challenging words.Comment: To appear in TAC
- …