7 research outputs found
A Convolutional Encoder Model for Neural Machine Translation
The prevalent approach to neural machine translation relies on bi-directional
LSTMs to encode the source sentence. In this paper we present a faster and
simpler architecture based on a succession of convolutional layers. This allows
to encode the entire source sentence simultaneously compared to recurrent
networks for which computation is constrained by temporal dependencies. On
WMT'16 English-Romanian translation we achieve competitive accuracy to the
state-of-the-art and we outperform several recently published results on the
WMT'15 English-German task. Our models obtain almost the same accuracy as a
very deep LSTM setup on WMT'14 English-French translation. Our convolutional
encoder speeds up CPU decoding by more than two times at the same or higher
accuracy as a strong bi-directional LSTM baseline.Comment: 13 page
Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English
The necessity of using a fixed-size word vocabulary in order to control the
model complexity in state-of-the-art neural machine translation (NMT) systems
is an important bottleneck on performance, especially for morphologically rich
languages. Conventional methods that aim to overcome this problem by using
sub-word or character-level representations solely rely on statistics and
disregard the linguistic properties of words, which leads to interruptions in
the word structure and causes semantic and syntactic losses. In this paper, we
propose a new vocabulary reduction method for NMT, which can reduce the
vocabulary of a given input corpus at any rate while also considering the
morphological properties of the language. Our method is based on unsupervised
morphology learning and can be, in principle, used for pre-processing any
language pair. We also present an alternative word segmentation method based on
supervised morphological analysis, which aids us in measuring the accuracy of
our model. We evaluate our method in Turkish-to-English NMT task where the
input language is morphologically rich and agglutinative. We analyze different
representation methods in terms of translation accuracy as well as the semantic
and syntactic properties of the generated output. Our method obtains a
significant improvement of 2.3 BLEU points over the conventional vocabulary
reduction technique, showing that it can provide better accuracy in open
vocabulary translation of morphologically rich languages.Comment: The 20th Annual Conference of the European Association for Machine
Translation (EAMT), Research Paper, 12 page
Findings of the 2016 Conference on Machine Translation.
This paper presents the results of the
WMT16 shared tasks, which included five
machine translation (MT) tasks (standard
news, IT-domain, biomedical, multimodal,
pronoun), three evaluation tasks (metrics,
tuning, run-time estimation of MT quality),
and an automatic post-editing task
and bilingual document alignment task.
This year, 102 MT systems from 24 institutions
(plus 36 anonymized online systems)
were submitted to the 12 translation
directions in the news translation task. The
IT-domain task received 31 submissions
from 12 institutions in 7 directions and the
Biomedical task received 15 submissions
systems from 5 institutions. Evaluation
was both automatic and manual (relative
ranking and 100-point scale assessments).
The quality estimation task had three subtasks,
with a total of 14 teams, submitting
39 entries. The automatic post-editing task
had a total of 6 teams, submitting 11 entries
Findings of the 2016 Conference on Machine Translation (WMT16)
This paper presents the results of the
WMT16 shared tasks, which included five
machine translation (MT) tasks (standard
news, IT-domain, biomedical, multimodal,
pronoun), three evaluation tasks (metrics,
tuning, run-time estimation of MT quality),
and an automatic post-editing task
and bilingual document alignment task.
This year, 102 MT systems from 24 institutions
(plus 36 anonymized online systems)
were submitted to the 12 translation
directions in the news translation task. The
IT-domain task received 31 submissions
from 12 institutions in 7 directions and the
Biomedical task received 15 submissions
systems from 5 institutions. Evaluation
was both automatic and manual (relative
ranking and 100-point scale assessments)