Multi-Source Syntactic Neural Machine Translation
We introduce a novel multi-source technique for incorporating source syntax
into neural machine translation using linearized parses. This is achieved by
employing separate encoders for the sequential and parsed versions of the same
source sentence; the resulting representations are then combined using a
hierarchical attention mechanism. The proposed model improves over both seq2seq
and parsed baselines by over 1 BLEU on the WMT17 English-German task. Further
analysis shows that our multi-source syntactic model is able to translate
successfully without any parsed input, unlike standard parsed methods. In
addition, performance does not deteriorate as much on long sentences as for the
baselines. Comment: EMNLP 201
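
As a rough illustration of what a linearized parse might look like, the sketch below flattens a constituency tree into a token sequence that a second encoder could read alongside the plain sentence. The bracket token scheme and the nested-tuple tree representation are assumptions made for this example; the abstract does not specify the exact format.

```python
def linearize(tree):
    """Flatten a nested tree into a list of bracketed tokens.

    A tree is either a word (str) or a tuple (label, child, child, ...).
    This representation and the "(LABEL ... )" token scheme are
    illustrative assumptions, not the paper's confirmed format.
    """
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    tokens = ["(" + label]          # opening token carries the label
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")              # closing token is unlabeled
    return tokens


# Hypothetical example sentence and parse.
parse = ("S", ("NP", "she"), ("VP", "runs"))
sequential_input = ["she", "runs"]      # input for the sequential encoder
parsed_input = linearize(parse)         # input for the parsed encoder
print(" ".join(parsed_input))
```

In a multi-source setup of this kind, both sequences describe the same sentence, so each encoder sees a different view of identical content.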
Moving beyond parallel data for neural machine translation
The goal of neural machine translation (NMT) is to build an end-to-end system that
automatically translates sentences from the source language to the target language.
Neural machine translation has become the dominant paradigm in machine translation
in recent years, showing strong improvements over prior statistical methods in many
scenarios. However, neural machine translation relies heavily on parallel corpora for
training; even for two languages with abundant monolingual resources (or with a large
number of speakers), such parallel corpora may be scarce. Thus, it is important to
develop methods for leveraging additional types of data in NMT training. This thesis
explores ways of augmenting the parallel training data of neural machine translation
with non-parallel sources of data. We concentrate on two main types of additional
data: monolingual corpora and structural annotations. First, we propose a method for
adding target-language monolingual data into neural machine translation in which the
monolingual data is converted to parallel data through copying. Thus, the NMT system
is trained on two tasks: translation from source language to target language, and
autoencoding the target language. We show that this model achieves improvements in
BLEU score for low- and medium-resource setups. Second, we consider the task of
zero-resource NMT, where no source ↔ target parallel training data is available, but
parallel data with a pivot language is abundant. We improve these models by adding a
monolingual corpus in the pivot language, translating this corpus into both the source
and the target language to create a pseudo-parallel source-target corpus. In the second
half of this thesis, we turn our attention to syntax, introducing methods for adding
syntactic annotation of the source language into neural machine translation. In particular,
our multi-source model, which leverages an additional encoder to inject syntax
into the NMT model, results in strong improvements over non-syntactic NMT for a
high-resource translation case, while remaining robust to unparsed inputs. We also
introduce a multi-task model that augments the transformer architecture with syntax;
this model improves translation across several language pairs. Finally, we consider
the case where no syntactic annotations are available (such as when translating from
very low-resource languages). We introduce an unsupervised hierarchical encoder that
induces a tree structure over the source sentences based solely on the downstream task
of translation. Although the resulting hierarchies do not resemble traditional syntax,
the model shows large improvements in BLEU for low-resource NMT.
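
The copying idea from the first contribution can be sketched in a few lines: each target-language monolingual sentence is paired with itself, so the combined corpus mixes translation examples with autoencoding examples. The function name and the toy data are hypothetical, and the sketch ignores details such as mixing ratios, which the abstract does not describe.

```python
def add_copied_monolingual(parallel_pairs, target_monolingual):
    """Return parallel data augmented with (target, target) copy pairs.

    parallel_pairs: list of (source_sentence, target_sentence) tuples.
    target_monolingual: list of target-language sentences.
    Hypothetical helper; any balancing of the two data types is omitted.
    """
    copied_pairs = [(sentence, sentence) for sentence in target_monolingual]
    return list(parallel_pairs) + copied_pairs


# Toy data, invented for illustration only.
parallel = [("das Haus", "the house")]
monolingual = ["the garden is green"]
training_data = add_copied_monolingual(parallel, monolingual)
print(training_data)
```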
Translating Phrases in Neural Machine Translation
Phrases play an important role in natural language understanding and machine
translation (Sag et al., 2002; Villavicencio et al., 2005). However, it is
difficult to integrate them into current neural machine translation (NMT) which
reads and generates sentences word by word. In this work, we propose a method
to translate phrases in NMT by integrating a phrase memory storing target
phrases from a phrase-based statistical machine translation (SMT) system into
the encoder-decoder architecture of NMT. At each decoding step, the phrase
memory is first re-written by the SMT model, which dynamically generates
relevant target phrases with contextual information provided by the NMT model.
Then the proposed model reads the phrase memory to make probability estimations
for all phrases in the phrase memory. If phrase generation is selected, the
NMT decoder selects an appropriate phrase from the memory to perform phrase
translation and updates its decoding state by consuming the words in the
selected phrase. Otherwise, the NMT decoder generates a word from the
vocabulary as the general NMT decoder does. Experimental results on
Chinese-to-English translation show that the proposed model achieves significant
improvements over the baseline on various test sets. Comment: Accepted by EMNLP 201
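
The choice between emitting a phrase and emitting a single word at each decoding step can be sketched as follows. This is a simplified greedy illustration, not the paper's model: the 0.5 threshold, the function name, and the dictionary-based probability tables are all assumptions, and in the described system these scores would come from the NMT decoder and the SMT-written phrase memory.

```python
def decode_step(phrase_mode_prob, phrase_probs, word_probs, threshold=0.5):
    """Return the list of target words emitted at one decoding step.

    phrase_mode_prob: assumed probability of generating a phrase.
    phrase_probs: dict mapping candidate phrases (from memory) to scores.
    word_probs: dict mapping vocabulary words to scores.
    threshold: hypothetical cut-off for choosing phrase generation.
    """
    if phrase_probs and phrase_mode_prob > threshold:
        # Phrase mode: pick the best phrase and consume all its words.
        best_phrase = max(phrase_probs, key=phrase_probs.get)
        return best_phrase.split()
    # Word mode: fall back to ordinary word-by-word generation.
    return [max(word_probs, key=word_probs.get)]


# Toy scores, invented for illustration only.
memory = {"in the park": 0.6, "at home": 0.4}
vocabulary = {"park": 0.9, "home": 0.1}
print(decode_step(0.8, memory, vocabulary))
print(decode_step(0.2, memory, vocabulary))
```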
Towards String-to-Tree Neural Machine Translation
We present a simple method to incorporate syntactic information about the
target language in a neural machine translation system by translating into
linearized, lexicalized constituency trees. An experiment on the WMT16
German-English news translation task resulted in an improved BLEU score when
compared to a syntax-agnostic NMT baseline trained on the same dataset. An
analysis of the translations from the syntax-aware system shows that it
performs more reordering during translation in comparison to the baseline. A
small-scale human evaluation also showed an advantage to the syntax-aware
system. Comment: Accepted as a short paper in ACL 201
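
When a system outputs a linearized tree rather than a plain sentence, the tree tokens have to be removed before the translation can be read or scored. The sketch below shows one way to do that, assuming a bracket format in which opening tokens start with "(" and the closing token is ")"; that format is an assumption for this example, not taken from the paper.

```python
def strip_tree_tokens(tokens):
    """Drop bracket tokens from a linearized tree, keeping only the words.

    Assumes opening tokens look like "(NP" and the closing token is ")".
    Caveat: a literal parenthesis in the translation would also be dropped
    under this simplified scheme.
    """
    return [t for t in tokens if not t.startswith("(") and t != ")"]


# Hypothetical decoder output for illustration.
tree_output = "(S (NP she ) (VP runs ) )".split()
print(" ".join(strip_tree_tokens(tree_output)))
```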
Chunk-Based Bi-Scale Decoder for Neural Machine Translation
In typical neural machine translation (NMT), the decoder generates a sentence
word by word, packing all linguistic granularities in the same time-scale of
RNN. In this paper, we propose a new type of decoder for NMT, which splits the
decode state into two parts and updates them in two different time-scales.
Specifically, we first predict a chunk time-scale state for phrasal modeling,
on top of which multiple word time-scale states are generated. In this way, the
target sentence is translated hierarchically from chunks to words, with
information in different granularities being leveraged. Experiments show that
our proposed model significantly improves the translation performance over the
state-of-the-art NMT model. Comment: Accepted as a short paper by ACL 201