Addressing the Rare Word Problem in Neural Machine Translation
Neural Machine Translation (NMT) is a new approach to machine translation
that has shown promising results that are comparable to traditional approaches.
A significant weakness in conventional NMT systems is their inability to
correctly translate very rare words: end-to-end NMTs tend to have relatively
small vocabularies with a single unk symbol that represents every possible
out-of-vocabulary (OOV) word. In this paper, we propose and implement an
effective technique to address this problem. We train an NMT system on data
that is augmented by the output of a word alignment algorithm, allowing the NMT
system to emit, for each OOV word in the target sentence, the position of its
corresponding word in the source sentence. This information is later utilized
in a post-processing step that translates every OOV word using a dictionary.
Our experiments on the WMT14 English to French translation task show that this
method provides a substantial improvement of up to 2.8 BLEU points over an
equivalent NMT system that does not use this technique. With 37.5 BLEU points,
our NMT system is the first to surpass the best result achieved on a WMT14
contest task.
Comment: ACL 2015 camera-ready version
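The post-processing step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the token format, so the `<unk:i>` notation (an OOV placeholder carrying the index of its aligned source word), the function name `replace_unks`, and the fallback of copying the source word when the dictionary has no entry are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical token format (an assumption of this sketch):
# "<unk:i>" marks an OOV target word aligned to source position i.
UNK_PATTERN = re.compile(r"^<unk:(\d+)>$")


def replace_unks(source_tokens, output_tokens, dictionary):
    """Replace positional unk tokens using a source-to-target dictionary."""
    result = []
    for token in output_tokens:
        match = UNK_PATTERN.match(token)
        if match is None:
            # Ordinary in-vocabulary word: keep it unchanged.
            result.append(token)
            continue
        index = int(match.group(1))
        if 0 <= index < len(source_tokens):
            source_word = source_tokens[index]
            # Assumed fallback: copy the source word if it has no entry.
            result.append(dictionary.get(source_word, source_word))
        else:
            # The pointer falls outside the source sentence: leave the token.
            result.append(token)
    return result
```

For example, with the source tokens `["the", "ecotax", "in", "Pontivy"]` and the dictionary `{"ecotax": "écotaxe"}`, the token `<unk:1>` is translated through the dictionary, while `<unk:3>` falls back to copying `Pontivy`.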
Stronger Baselines for Trustable Results in Neural Machine Translation
Interest in neural machine translation has grown rapidly as its effectiveness
has been demonstrated across language and data scenarios. New research
regularly introduces architectural and algorithmic improvements that lead to
significant gains over "vanilla" NMT implementations. However, these new
techniques are rarely evaluated in the context of previously published
techniques, specifically those that are widely used in state-of-the-art
production and shared-task systems. As a result, it is often difficult to
determine whether improvements from research will carry over to systems
deployed for real-world use. In this work, we recommend three specific methods
that are relatively easy to implement and result in much stronger experimental
systems. Beyond reporting significantly higher BLEU scores, we conduct an
in-depth analysis of where improvements originate and what inherent weaknesses
of basic NMT models are being addressed. We then compare the relative gains
afforded by several other techniques proposed in the literature when starting
with vanilla systems versus our stronger baselines, showing that experimental
conclusions may change depending on the baseline chosen. This indicates that
choosing a strong baseline is crucial for reporting reliable experimental
results.
Comment: To appear at the Workshop on Neural Machine Translation (WNMT)
Neural System Combination for Machine Translation
Neural machine translation (NMT) has become a new approach to machine
translation and generates much more fluent results than statistical
machine translation (SMT).
However, SMT is usually better than NMT in translation adequacy. It is
therefore a promising direction to combine the advantages of both NMT and SMT.
In this paper, we propose a neural system combination framework leveraging
multi-source NMT, which takes as input the outputs of NMT and SMT systems and
produces the final translation.
Extensive experiments on the Chinese-to-English translation task show that
our model achieves a significant improvement of 5.3 BLEU points over the best
single system output and 3.4 BLEU points over the state-of-the-art traditional
system combination methods.
Comment: Accepted as a short paper by ACL-201
Improving Lexical Choice in Neural Machine Translation
We explore two solutions to the problem of mistranslating rare words in
neural machine translation. First, we argue that the standard output layer,
which computes the inner product of a vector representing the context with all
possible output word embeddings, rewards frequent words disproportionately, and
we propose to fix the norms of both vectors to a constant value. Second, we
integrate a simple lexical module which is jointly trained with the rest of the
model. We evaluate our approaches on eight language pairs with data sizes
ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU,
surpassing phrase-based translation in nearly all settings.
Comment: Accepted at NAACL HLT 201
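The first proposal in this abstract, fixing the norms of the context vector and of each output word embedding to a constant, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' model: the function names, the plain-list vector representation, and the default value of `norm` are hypothetical, and the lexical module is not shown.

```python
import math


def rescale(vector, norm):
    """Rescale a vector so that its Euclidean length equals `norm`."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0.0:
        return list(vector)  # A zero vector has no direction to preserve.
    return [norm * x / length for x in vector]


def fixed_norm_logits(context, embeddings, norm=1.0):
    """Inner products after both sides are rescaled to the same fixed norm."""
    fixed_context = rescale(context, norm)
    logits = []
    for embedding in embeddings:
        fixed_embedding = rescale(embedding, norm)
        logits.append(sum(a * b for a, b in zip(fixed_context, fixed_embedding)))
    return logits


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]
```

With both norms fixed, each score equals `norm ** 2` times the cosine of the angle between the context and the embedding, so two embeddings that point in the same direction receive the same score regardless of their original lengths.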