Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings
In neural machine translation, a source sequence of words is encoded into a
vector from which a target sequence is generated in the decoding phase.
Unlike in statistical machine translation, the associations between
source words and their possible target counterparts are not stored explicitly.
Source and target words are at the two ends of a long information processing
procedure, mediated by hidden states at both the source encoding and the target
decoding phases. As a result, a source word may be incorrectly translated
into a target word that is not one of its admissible equivalents in the
target language.
In this paper, we seek to somewhat shorten the distance between source and
target words in that procedure, and thus strengthen their association, by means
of a method we term bridging source and target word embeddings. We experiment
with three strategies: (1) a source-side bridging model, where source word
embeddings are moved one step closer to the output target sequence; (2) a
target-side bridging model, which exploits the most relevant source word
embeddings when predicting the target sequence; and (3) a direct bridging
model, which directly connects source and target word embeddings, seeking to
minimize the error of translating one into the other.
Experiments and analysis presented in this paper demonstrate that the
proposed bridging models significantly improve the quality of sentence
translation in general, and the alignment and translation of individual
source words into target words in particular.
Comment: 9 pages, 6 figures. Accepted at ACL 2018.
Improving Lexical Choice in Neural Machine Translation
We explore two solutions to the problem of mistranslating rare words in
neural machine translation. First, we argue that the standard output layer,
which computes the inner product of a vector representing the context with all
possible output word embeddings, rewards frequent words disproportionately, and
we propose to fix the norms of both vectors to a constant value. Second, we
integrate a simple lexical module which is jointly trained with the rest of the
model. We evaluate our approaches on eight language pairs with data sizes
ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU,
surpassing phrase-based translation in nearly all settings.
Comment: Accepted at NAACL HLT 2018.
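The fixed-norm output layer described above can be pictured as replacing the raw inner product with a scaled cosine similarity: rescaling both the context vector and every output word embedding to a shared norm r makes each logit equal r squared times a cosine, so no word can score highly merely by having a long embedding. Below is a minimal PyTorch sketch under that reading; the radius value is illustrative and would be tuned in practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixNormOutput(nn.Module):
    """Output layer with the context vector and all output word embeddings
    rescaled to a fixed norm r, so each logit is r^2 times a cosine."""
    def __init__(self, hidden_dim, vocab_size, r=5.0):
        super().__init__()
        self.out_emb = nn.Parameter(torch.randn(vocab_size, hidden_dim))
        self.r = r

    def forward(self, h):
        # h: (batch, hidden_dim) context vector from the decoder
        h = self.r * F.normalize(h, dim=-1)             # ||h|| = r
        w = self.r * F.normalize(self.out_emb, dim=-1)  # ||w_v|| = r for all v
        return h @ w.t()                                # (batch, vocab_size) logits
```

Because every output embedding now competes on the same scale, frequent words lose the disproportionate reward the abstract attributes to the standard inner-product layer.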
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
Non-autoregressive translation (NAT) models, which remove the dependence on
previous target tokens from the inputs of the decoder, achieve a significant
inference speedup, but at the cost of inferior accuracy compared with
autoregressive translation (AT) models. Previous work shows that the quality of
the inputs of the decoder is important and largely determines model accuracy.
In this paper, we propose two methods to enhance the decoder inputs so as to
improve NAT models. The first one directly leverages a phrase table generated
by conventional SMT approaches to translate source tokens to target tokens,
which are then fed into the decoder as inputs. The second one transforms
source-side word embeddings into target-side word embeddings through
sentence-level alignment and word-level adversarial learning, and then feeds the
transformed embeddings into the decoder as inputs. Experimental results
show that our method substantially outperforms the NAT
baseline~\citep{gu2017non} in BLEU score on the WMT14 English-German and
WMT16 English-Romanian tasks.
Comment: AAAI 2019.
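To make the second decoder-input strategy concrete, the sketch below maps source word embeddings into the target embedding space and uniformly stretches or compresses the resulting sequence to the predicted target length, a common way to build NAT decoder inputs. It is a simplified PyTorch sketch: the single linear layer stands in for the paper's learned transformation, and the sentence-level alignment and word-level adversarial objectives used to train it are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingMapper(nn.Module):
    """Map source-side embeddings toward the target embedding space and
    resize them to the target length as NAT decoder inputs (sketch)."""
    def __init__(self, emb_dim):
        super().__init__()
        self.map = nn.Linear(emb_dim, emb_dim)  # stands in for the learned transform

    def forward(self, src_embs, tgt_len):
        # src_embs: (batch, src_len, emb_dim)
        mapped = self.map(src_embs)
        # Uniformly copy/interpolate along the length axis so the decoder
        # receives exactly tgt_len input vectors (assumed length handling).
        return F.interpolate(mapped.transpose(1, 2), size=tgt_len,
                             mode="linear", align_corners=False).transpose(1, 2)
```

The mapped embeddings would then replace the copied source embeddings that a vanilla NAT decoder consumes, giving the decoder inputs that already live close to the target-side embedding space.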