Modeling Target-Side Inflection in Neural Machine Translation
NMT systems have problems with large vocabulary sizes. Byte-pair encoding
(BPE) is a popular approach to solving this problem, but while BPE allows the
system to generate any target-side word, it does not enable effective
generalization over the rich vocabulary in morphologically rich languages with
strong inflectional phenomena. We introduce a simple approach to overcome this
problem by training a system to produce the lemma of a word and its
morphologically rich POS tag, which is then followed by a deterministic
generation step. We apply this strategy for English-Czech and English-German
translation scenarios, obtaining improvements in both settings. We furthermore
show that the improvement is not due to only adding explicit morphological
information.
Comment: Accepted as a research paper at WMT17. (Updated version with
corrected references.)
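The two-step idea in this abstract (predict a lemma plus a morphological tag, then generate the surface form deterministically) can be illustrated with a minimal sketch. The lookup table, the tag strings, and the function name `generate_surface` are hypothetical and chosen only for illustration; the abstract does not specify the tag set or the generation tool actually used.

```python
# Toy inflection table: (lemma, tag) -> surface form.
# Entries and tag strings are illustrative assumptions, not the paper's tag set.
INFLECTION_TABLE = {
    ("Kind", "NN.Nom.Sg"): "Kind",
    ("Kind", "NN.Dat.Pl"): "Kindern",
    ("gehen", "VV.3.Sg.Pres"): "geht",
}


def generate_surface(pairs, table=INFLECTION_TABLE):
    """Deterministically map (lemma, tag) pairs to surface forms.

    Pairs missing from the table fall back to the bare lemma, which is
    one simple (assumed) way to handle unseen combinations.
    """
    return [table.get((lemma, tag), lemma) for lemma, tag in pairs]
```

In this sketch the translation model would only need to learn lemmas and tags, so an inflected form never seen in training can still be produced as long as the table (or a morphological generator standing in for it) covers the lemma and tag combination.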
Bootstrapping word alignment via word packing
We introduce a simple method to pack words for statistical word alignment. Our goal is to simplify the task of automatic word alignment by packing several consecutive words together when we believe they correspond to a single word in the opposite language. This is done using the word aligner itself, i.e. by bootstrapping on its output. We evaluate the performance of our approach on a Chinese-to-English machine translation task, and report a 12.2% relative increase in BLEU score over a state-of-the-art phrase-based SMT system.
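One packing pass of the kind described above might be sketched as follows, assuming the aligner's output is available as sets of (source index, target index) links. The function names, the data layout, and the `_` separator are hypothetical; the abstract does not give the selection criterion the authors use, so this sketch simply counts adjacent words that share an aligned source word.

```python
from collections import Counter


def count_pack_candidates(corpus):
    """Count adjacent target-word pairs linked to the same source word.

    corpus: iterable of (tgt_tokens, links), where links is a set of
    (src_idx, tgt_idx) pairs taken from a word aligner's output.
    """
    counts = Counter()
    for tgt, links in corpus:
        src_of = {}
        for s, t in links:
            src_of.setdefault(t, set()).add(s)
        for j in range(len(tgt) - 1):
            # Both neighbours share at least one aligned source word.
            if src_of.get(j, set()) & src_of.get(j + 1, set()):
                counts[(tgt[j], tgt[j + 1])] += 1
    return counts


def pack(tokens, pairs, sep="_"):
    """Greedily merge adjacent tokens whose pair was selected for packing."""
    out, j = [], 0
    while j < len(tokens):
        if j + 1 < len(tokens) and (tokens[j], tokens[j + 1]) in pairs:
            out.append(tokens[j] + sep + tokens[j + 1])
            j += 2
        else:
            out.append(tokens[j])
            j += 1
    return out
```

Bootstrapping would then mean re-running the aligner on the packed corpus and repeating the pass; how candidates are filtered and how many rounds are run are left open here.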