An Unsupervised Method for Automatic Translation Memory Cleaning.
We address the problem of automatically cleaning a large-scale Translation Memory (TM) in a fully unsupervised fashion, i.e. without human-labelled data. We approach the task by: i) designing a set of features that capture the similarity between two text segments in different languages, ii) using them to induce reliable training labels for a subset of the translation units (TUs) contained in the TM, and iii) using the automatically labelled data to train an ensemble of binary classifiers. We apply our method to clean a test set composed of 1,000 TUs randomly extracted from the English-Italian version of MyMemory, the world's largest public TM. Our results show competitive performance not only against a strong baseline that exploits machine translation, but also against a state-of-the-art method that relies on human-labelled data.
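The three-step pipeline above can be sketched in code. This is a minimal illustration, not the paper's implementation: the two toy features (length ratio and token-copy overlap), the confidence thresholds, and the particular classifiers in the ensemble are all stand-ins chosen for the example; the paper's actual feature set and labelling heuristics differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# toy English-Italian translation units: (source, target)
tus = [
    ("the cat sleeps", "il gatto dorme"),
    ("good morning", "buongiorno"),
    ("open the door", "apri la porta"),
    ("the cat sleeps", "the cat sleeps"),          # untranslated copy: noisy TU
    ("hello", "il gatto dorme sul divano rosso"),  # length mismatch: noisy TU
    ("press the red button", "premi il pulsante rosso"),
]

# step i: features capturing cross-lingual similarity (toy stand-ins)
def features(src, tgt):
    s, t = src.split(), tgt.split()
    len_ratio = min(len(s), len(t)) / max(len(s), len(t))
    copy_overlap = len(set(s) & set(t)) / len(set(s) | set(t))
    return [len_ratio, copy_overlap]

X = np.array([features(s, t) for s, t in tus])

# step ii: induce labels only for confident cases; a near-verbatim copy or
# an extreme length mismatch is "bad" (0), a balanced non-copied pair is
# "good" (1); everything in between stays unlabelled
def induce(row):
    len_ratio, copy_overlap = row
    if copy_overlap > 0.8 or len_ratio < 0.3:
        return 0
    if copy_overlap < 0.2 and len_ratio > 0.6:
        return 1
    return None

labels = [induce(r) for r in X]
idx = [i for i, y in enumerate(labels) if y is not None]
y = [labels[i] for i in idx]

# step iii: train an ensemble of binary classifiers on the induced labels,
# then score every TU in the memory
ens = VotingClassifier([
    ("lr", LogisticRegression()),
    ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
], voting="hard")
ens.fit(X[idx], y)
preds = ens.predict(X)
```

The point of the confident-subset step is that the ensemble is trained only on TUs the heuristics can label reliably, but can then be applied to the TUs the heuristics left undecided.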
Multi-Task Video Captioning with Video and Entailment Generation
Video captioning, the task of describing the content of a video, has seen
some promising improvements in recent years with sequence-to-sequence models,
but accurately learning the temporal and logical dynamics involved in the task
still remains a challenge, especially given the lack of sufficient annotated
data. We improve video captioning by sharing knowledge with two related
directed-generation tasks: a temporally-directed unsupervised video prediction
task to learn richer context-aware video encoder representations, and a
logically-directed language entailment generation task to learn better
video-entailed caption decoder representations. For this, we present a
many-to-many multi-task learning model that shares parameters across the
encoders and decoders of the three tasks. We achieve significant improvements
and the new state-of-the-art on several standard video captioning datasets
using diverse automatic and human evaluations. We also show mutual multi-task
improvements on the entailment generation task.
Comment: ACL 2017 (14 pages w/ supplementary)
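The parameter-sharing scheme described above can be sketched with a few PyTorch modules. This is a schematic sketch under assumed shapes, not the paper's architecture: the feature dimensions, vocabulary size, and plain-LSTM modules are illustrative placeholders. The key idea it shows is that one video encoder serves both captioning and video prediction, and one language decoder serves both captioning and entailment generation.

```python
import torch
import torch.nn as nn

D = 32  # hidden size (hypothetical)

# shared modules across the three tasks
video_encoder   = nn.LSTM(input_size=16, hidden_size=D, batch_first=True)  # captioning + video prediction
caption_decoder = nn.LSTM(input_size=16, hidden_size=D, batch_first=True)  # captioning + entailment generation
text_encoder    = nn.LSTM(input_size=16, hidden_size=D, batch_first=True)  # premise encoder for entailment
frame_head      = nn.Linear(D, 16)   # predicts next-frame features
vocab_head      = nn.Linear(D, 100)  # word logits over a toy 100-word vocab

frames = torch.randn(2, 8, 16)  # batch of 2 videos, 8 frames, 16-d features
words  = torch.randn(2, 5, 16)  # embedded 5-token text prefix

# task 1: video captioning = shared video encoder -> shared language decoder
_, (h, c) = video_encoder(frames)
dec_out, _ = caption_decoder(words, (h, c))
caption_logits = vocab_head(dec_out)        # (2, 5, 100)

# task 2: unsupervised video prediction reuses the same video encoder
enc_out, _ = video_encoder(frames)
next_frames = frame_head(enc_out)           # (2, 8, 16)

# task 3: entailment generation reuses the same language decoder
_, (h2, c2) = text_encoder(words)
ent_out, _ = caption_decoder(words, (h2, c2))
entailment_logits = vocab_head(ent_out)     # (2, 5, 100)
```

Because the encoder gradients from video prediction and the decoder gradients from entailment generation flow into the same shared weights, training the auxiliary tasks regularizes the captioning model, which is the mechanism the abstract credits for the improvement.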
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.