Modeling Past and Future for Neural Machine Translation
Existing neural machine translation systems do not explicitly model what has
been translated and what has not during the decoding phase. To address this
problem, we propose a novel mechanism that separates the source information
into two parts: translated Past contents and untranslated Future contents,
which are modeled by two additional recurrent layers. The Past and Future
contents are fed to both the attention model and the decoder states, which
offers NMT systems the knowledge of translated and untranslated contents.
Experimental results show that the proposed approach significantly improves
translation performance in Chinese-English, German-English and English-German
translation tasks. Specifically, the proposed model outperforms the
conventional coverage model in both translation quality and alignment error
rate.
Comment: Accepted by Transactions of the ACL
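As a rough illustration of the mechanism described in this abstract, the sketch below (PyTorch; class and parameter names are hypothetical, not the authors' implementation) shows a decoder step in which two extra recurrent layers track translated (Past) and untranslated (Future) content, and both summaries feed the attention scorer and the decoder state.

```python
# A minimal sketch of Past/Future layers, assuming GRU cells and embeddings
# that share the hidden size; not the authors' released code.
import torch
import torch.nn as nn

class PastFutureDecoderStep(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.past_rnn = nn.GRUCell(hidden_size, hidden_size)    # accumulates translated content
        self.future_rnn = nn.GRUCell(hidden_size, hidden_size)  # tracks untranslated content
        self.decoder_rnn = nn.GRUCell(hidden_size * 3, hidden_size)
        self.attn_score = nn.Linear(hidden_size * 4, 1)

    def forward(self, prev_state, past, future, enc_states, y_embed):
        # enc_states: (batch, src_len, hidden); score each source position
        # using the decoder state plus the Past/Future summaries.
        batch, src_len, _ = enc_states.size()
        query = torch.cat([prev_state, past, future], dim=-1)           # (batch, 3h)
        query = query.unsqueeze(1).expand(batch, src_len, -1)
        scores = self.attn_score(torch.cat([query, enc_states], dim=-1)).squeeze(-1)
        attn = torch.softmax(scores, dim=-1)
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)   # (batch, h)

        past = self.past_rnn(context, past)        # fold newly translated content into Past
        future = self.future_rnn(context, future)  # "subtract" it from Future
        state = self.decoder_rnn(torch.cat([y_embed, context, past], dim=-1), prev_state)
        return state, past, future, attn
```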
Confidence through Attention
Attention distributions of the generated translations are a useful by-product
of attention-based recurrent neural network translation models and can be
treated as soft alignments between the input and output tokens. In this work,
we use attention distributions as a confidence metric for output translations.
We present two strategies of using the attention distributions: filtering out
bad translations from a large back-translated corpus, and selecting the best
translation in a hybrid setup of two different translation systems. While
manual evaluation indicated only a weak correlation between our confidence
score and human judgments, the use-cases showed improvements of up to 2.22 BLEU
points for filtering and 0.99 points for hybrid translation, tested on
English-German and English-Latvian translation.
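A minimal sketch of the filtering use case, assuming one attention matrix per back-translated sentence pair; the scoring below (attention entropy plus a coverage-deviation penalty) is only an illustrative confidence heuristic, not the paper's exact metrics.

```python
# Hedged sketch: attention-based confidence for filtering back-translations.
import numpy as np

def attention_confidence(attn, eps=1e-9):
    """attn: (tgt_len, src_len) attention matrix for one translation."""
    entropy = -(attn * np.log(attn + eps)).sum(axis=1).mean()  # dispersion per target token
    coverage_dev = np.abs(attn.sum(axis=0) - 1.0).mean()       # over/under-attended source words
    return -(entropy + coverage_dev)  # higher means more confident

def filter_backtranslations(pairs, attn_matrices, threshold=-2.0):
    """Keep (source, hypothesis) pairs whose confidence clears the threshold."""
    return [pair for pair, attn in zip(pairs, attn_matrices)
            if attention_confidence(attn) > threshold]
```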
Towards one-shot learning for rare-word translation with external experts
Neural machine translation (NMT) has significantly improved the quality of
automatic translation models. One of the main challenges in current systems is
the translation of rare words. We present a generic approach to address this
weakness by having external models annotate the training data as Experts, and
control the model-expert interaction with a pointer network and reinforcement
learning. Our experiments using phrase-based models to simulate Experts to
complement neural machine translation models show that the model can be trained
to copy the annotations into the output consistently. We demonstrate the
benefit of our proposed framework in out-of-domain translation scenarios with
only lexical resources, improving by more than 1.0 BLEU point in both
translation directions, English to Spanish and German to English.
Comment: 2nd Workshop on Neural Machine Translation and Generation, ACL 2018
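The copy/generate decision behind such a pointer network can be sketched as follows (PyTorch; module and argument names are hypothetical): a learned gate mixes the decoder's vocabulary distribution with a copy distribution over the expert-annotated source tokens.

```python
# Hedged sketch of a pointer-style copy mechanism over expert annotations.
import torch
import torch.nn as nn

class CopyOrGenerate(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.generate = nn.Linear(hidden_size, vocab_size)
        self.gate = nn.Linear(hidden_size, 1)

    def forward(self, dec_state, copy_attn, src_token_ids):
        # dec_state: (batch, hidden); copy_attn: (batch, src_len) over annotated tokens
        # src_token_ids: (batch, src_len) vocabulary ids of those tokens
        p_gen = torch.softmax(self.generate(dec_state), dim=-1)            # (batch, vocab)
        p_copy = torch.zeros_like(p_gen).scatter_add(1, src_token_ids, copy_attn)
        g = torch.sigmoid(self.gate(dec_state))                            # copy-vs-generate gate
        return g * p_gen + (1 - g) * p_copy
```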
Neural Text Generation: A Practical Guide
Deep learning methods have recently achieved great empirical success on
machine translation, dialogue response generation, summarization, and other
text generation tasks. At a high level, the technique has been to train
end-to-end neural network models consisting of an encoder model to produce a
hidden representation of the source text, followed by a decoder model to
generate the target. While such models have significantly fewer pieces than
earlier systems, significant tuning is still required to achieve good
performance. For text generation models in particular, the decoder can behave
in undesired ways, such as by generating truncated or repetitive outputs,
outputting bland and generic responses, or in some cases producing
ungrammatical gibberish. This paper is intended as a practical guide for
resolving such undesired behavior in text generation models, with the aim of
helping enable real-world applications.
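One common mitigation for the repetitive-output failure mode, sketched below in plain Python/NumPy (not necessarily this guide's specific recipe), is to block any token that would repeat an already-generated trigram during greedy decoding; `step_logits_fn` is a hypothetical hook into the model.

```python
# Hedged sketch: greedy decoding with no-repeat-trigram blocking.
import numpy as np

def greedy_decode_no_repeat_trigram(step_logits_fn, bos_id, eos_id, max_len=50):
    """step_logits_fn(prefix) -> np.ndarray of next-token logits (hypothetical model hook)."""
    prefix, seen_trigrams = [bos_id], set()
    for _ in range(max_len):
        logits = step_logits_fn(prefix).copy()
        if len(prefix) >= 2:
            last_two = tuple(prefix[-2:])
            for tok in range(len(logits)):
                if last_two + (tok,) in seen_trigrams:
                    logits[tok] = -np.inf          # forbid repeating a seen trigram
        next_tok = int(np.argmax(logits))
        prefix.append(next_tok)
        if len(prefix) >= 3:
            seen_trigrams.add(tuple(prefix[-3:]))
        if next_tok == eos_id:
            break
    return prefix
```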
Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation
Attention-based sequence-to-sequence models have proved successful in Neural
Machine Translation (NMT). However, attention computed without considering the
decoding history, which includes past information from both the decoder and the
attention mechanism, often causes repetition. To address this problem, we
propose the decoding-history-based Adaptive Control of Attention (ACA) for the
NMT model. ACA learns to control the attention by keeping track of the decoding
history and the current information with a memory vector, so that the model can
take the translated contents and the current information into consideration.
Experiments on Chinese-English and English-Vietnamese translation demonstrate
that our model significantly outperforms the
strong baselines. The analysis shows that our model is capable of generating
translation with less repetition and higher accuracy. The code will be
available at https://github.com/lancopk
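A rough sketch of the idea (PyTorch; names and the exact gating are hypothetical, not the released code): a memory vector summarizing the decoding history is folded into the attention query, so that already-covered source content receives less attention, and is updated from each step's context.

```python
# Hedged sketch of history-aware attention with a memory vector.
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.memory_rnn = nn.GRUCell(hidden_size, hidden_size)  # tracks decoding history
        self.score = nn.Linear(hidden_size * 3, 1)

    def forward(self, dec_state, memory, enc_states):
        # enc_states: (batch, src_len, hidden)
        batch, src_len, _ = enc_states.size()
        query = torch.cat([dec_state, memory], dim=-1).unsqueeze(1).expand(batch, src_len, -1)
        energies = self.score(torch.cat([query, enc_states], dim=-1)).squeeze(-1)
        attn = torch.softmax(energies, dim=-1)                    # (batch, src_len)
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)
        memory = self.memory_rnn(context, memory)                 # fold this step into the history
        return context, attn, memory
```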
Asynchronous Bidirectional Decoding for Neural Machine Translation
The dominant neural machine translation (NMT) models apply unified
attentional encoder-decoder neural networks for translation. Traditionally, the
NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a
left-to-right manner, leaving the target-side contexts generated from right to
left unexploited during translation. In this paper, we equip the conventional
attentional encoder-decoder NMT framework with a backward decoder, in order to
explore bidirectional decoding for NMT. Attending to the hidden state sequence
produced by the encoder, our backward decoder first learns to generate the
target-side hidden state sequence from right to left. Then, the forward decoder
performs translation in the forward direction, while at each translation
prediction timestep, it simultaneously applies two attention models to consider
the source-side and reverse target-side hidden states, respectively. With this
new architecture, our model is able to fully exploit source- and target-side
contexts to improve translation quality altogether. Experimental results on
NIST Chinese-English and WMT English-German translation tasks demonstrate that
our model achieves substantial improvements over the conventional NMT by 3.14
and 1.38 BLEU points, respectively. The source code of this work can be
obtained from https://github.com/DeepLearnXMU/ABDNMT.
Comment: Accepted by AAAI 2018
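The forward decoder step can be sketched as follows (PyTorch; hypothetical module names, simple dot-product attention): one attention over the encoder states and a second attention over the hidden states already produced by the right-to-left backward decoder.

```python
# Hedged sketch of a forward decoder step with two attention models.
import torch
import torch.nn as nn

def dot_attention(query, keys):
    # query: (batch, hidden); keys: (batch, length, hidden)
    weights = torch.softmax(torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1), dim=-1)
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)        # attended context

class ForwardDecoderStep(nn.Module):
    def __init__(self, embed_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(embed_size + 2 * hidden_size, hidden_size)

    def forward(self, y_embed, prev_state, enc_states, backward_states):
        src_ctx = dot_attention(prev_state, enc_states)            # source-side context
        rev_ctx = dot_attention(prev_state, backward_states)       # reverse target-side context
        return self.cell(torch.cat([y_embed, src_ctx, rev_ctx], dim=-1), prev_state)
```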
English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor
Neural machine translation (NMT) has recently become popular in the field of
machine translation. However, NMT suffers from the problem of repeating or
missing words in the translation. To address this problem, Tu et al. (2017)
proposed an encoder-decoder-reconstructor framework for NMT using
back-translation. In this method, they selected the best forward translation
model in the same manner as Bahdanau et al. (2015), and then trained a
bi-directional translation model by fine-tuning. Their experiments show that it
offers a significant improvement in BLEU scores on a Chinese-English
translation task. We confirm that our re-implementation shows the same tendency
and alleviates the problem of repeating and missing words in the translation on
an English-Japanese task as well. In addition, we evaluate the effectiveness of
pre-training by comparing it with a jointly-trained model of forward
translation and back-translation.
Comment: 8 pages
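A minimal sketch of the joint objective behind an encoder-decoder-reconstructor (PyTorch; `translator` and `reconstructor` are hypothetical callables, not this paper's code): the usual translation loss plus a loss for reconstructing the source sentence from the decoder's hidden states, weighted by a hyperparameter.

```python
# Hedged sketch of a translation + reconstruction objective.
import torch.nn.functional as F

def encoder_decoder_reconstructor_loss(translator, reconstructor, src, tgt, lam=1.0):
    # translator(src, tgt) -> (target-vocab logits, decoder hidden states)
    # reconstructor(dec_states, src) -> source-vocab logits
    tgt_logits, dec_states = translator(src, tgt)
    src_logits = reconstructor(dec_states, src)
    translation_nll = F.cross_entropy(tgt_logits.transpose(1, 2), tgt)
    reconstruction_nll = F.cross_entropy(src_logits.transpose(1, 2), src)
    return translation_nll + lam * reconstruction_nll
```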
Bootstrapping Techniques for Polysynthetic Morphological Analysis
Polysynthetic languages have exceptionally large and sparse vocabularies,
thanks to the number of morpheme slots and combinations in a word. This
complexity, together with a general scarcity of written data, poses a challenge
to the development of natural language technologies. To address this challenge,
we offer linguistically-informed approaches for bootstrapping a neural
morphological analyzer, and demonstrate its application to Kunwinjku, a
polysynthetic Australian language. We generate data from a finite state
transducer to train an encoder-decoder model. We improve the model by
"hallucinating" missing linguistic structure into the training data, and by
resampling from a Zipf distribution to simulate a more natural distribution of
morphemes. The best model accounts for all instances of reduplication in the
test set and achieves an accuracy of 94.7% overall, a 10 percentage point
improvement over the FST baseline. This process demonstrates the feasibility of
bootstrapping a neural morphological analyzer from minimal resources.
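The Zipf resampling step can be sketched as follows (plain Python; the data layout and the rank assignment are hypothetical, since in practice ranks would come from observed morpheme frequencies): FST-generated analyses are re-weighted so that morphemes are drawn with Zipf-distributed rather than uniform frequencies.

```python
# Hedged sketch of Zipf resampling over FST-generated training examples.
import random

def zipf_resample(examples, key_morpheme, size, s=1.0):
    """examples: list of (surface_form, analysis); key_morpheme(ex) -> a morpheme id."""
    morphemes = sorted({key_morpheme(ex) for ex in examples})
    # Rank-based Zipf weight: the r-th morpheme gets weight 1 / r**s.
    # (Here ranks are arbitrary; a real pipeline would rank by corpus frequency.)
    weight = {m: 1.0 / (rank + 1) ** s for rank, m in enumerate(morphemes)}
    probs = [weight[key_morpheme(ex)] for ex in examples]
    return random.choices(examples, weights=probs, k=size)
```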
Bag-of-Words as Target for Neural Machine Translation
A sentence can be translated into more than one correct sentence. However,
most existing neural machine translation models use only one of the correct
translations as the target, and the other correct sentences are penalized as
incorrect during training. Since most of the correct translations for one
sentence share a similar bag-of-words, it is possible to distinguish the
correct translations from the incorrect ones by their bag-of-words. In this
paper, we propose an approach that uses both the sentences and the
bag-of-words as targets during training, in order to encourage the model to
generate potentially correct sentences that do not appear in the training set.
We evaluate our model on a Chinese-English translation dataset, and experiments
show that our model outperforms the strong baselines by 4.55 BLEU points.
Comment: Accepted by ACL 2018
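A minimal sketch of combining the two targets (PyTorch; shapes and the exact bag-of-words term are assumptions, not the paper's formulation): the usual per-token cross-entropy plus a term pushing the summed output distributions toward the reference word counts.

```python
# Hedged sketch of a sentence-level loss plus a bag-of-words loss.
import torch
import torch.nn.functional as F

def sentence_and_bow_loss(logits, target, bow_weight=1.0):
    # logits: (batch, tgt_len, vocab); target: (batch, tgt_len) reference token ids
    token_nll = F.cross_entropy(logits.transpose(1, 2), target)
    probs = torch.softmax(logits, dim=-1).sum(dim=1)               # expected word counts (batch, vocab)
    bow = torch.zeros_like(probs).scatter_add(
        1, target, torch.ones_like(target, dtype=probs.dtype))     # reference word counts
    bow_loss = F.mse_loss(probs, bow)                              # match the bag-of-words
    return token_nll + bow_weight * bow_loss
```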
Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic
We present the second ever evaluated Arabic dialect-to-dialect machine
translation effort, and the first to leverage external resources beyond a small
parallel corpus. The subject has not previously received serious attention due
to lack of naturally occurring parallel data; yet its importance is evidenced
by dialectal Arabic's wide usage and breadth of inter-dialect variation,
comparable to that of Romance languages. Our results suggest that modeling
morphology and syntax significantly improves dialect-to-dialect translation,
though optimizing such data-sparse models requires consideration of the
linguistic differences between dialects and the nature of available data and
resources. On a single-reference blind test set where untranslated input scores
6.5 BLEU and a model trained only on parallel data reaches 14.6, pivot
techniques and morphosyntactic modeling significantly improve performance to
17.5 BLEU.