Neural Machine Translation by Jointly Learning to Align and Translate
Neural machine translation is a recently proposed approach to machine
translation. Unlike traditional statistical machine translation, neural
machine translation aims at building a single neural network that can be
jointly tuned to maximize the translation performance. The models proposed
recently for neural machine translation often belong to a family of
encoder-decoders and consist of an encoder that encodes a source sentence into
a fixed-length vector from which a decoder generates a translation. In this
paper, we conjecture that the use of a fixed-length vector is a bottleneck in
improving the performance of this basic encoder-decoder architecture, and
propose to extend this by allowing a model to automatically (soft-)search for
parts of a source sentence that are relevant to predicting a target word,
without having to form these parts as a hard segment explicitly. With this new
approach, we achieve a translation performance comparable to the existing
state-of-the-art phrase-based system on the task of English-to-French
translation. Furthermore, qualitative analysis reveals that the
(soft-)alignments found by the model agree well with our intuition.

Comment: Accepted at ICLR 2015 as oral presentation
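To make the (soft-)search concrete: additive attention scores each encoder
state against the previous decoder state and takes a weighted sum as the
context. A minimal NumPy sketch, with illustrative weight names (W_a, U_a,
v_a) and toy dimensions:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """Score each encoder state h_j against the previous decoder state
    s_prev; return the soft alignment and the context vector."""
    # e_j = v_a^T tanh(W_a s_prev + U_a h_j), computed for all j at once
    scores = np.tanh(s_prev @ W_a + H @ U_a) @ v_a
    alpha = softmax(scores)          # soft alignment over source positions
    context = alpha @ H              # weighted sum of encoder states
    return alpha, context

rng = np.random.default_rng(0)
T, d = 6, 8                          # source length, hidden size
H = rng.standard_normal((T, d))      # encoder hidden states h_1..h_T
s_prev = rng.standard_normal(d)      # previous decoder state
W_a, U_a = rng.standard_normal((d, d)), rng.standard_normal((d, d))
v_a = rng.standard_normal(d)
alpha, context = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha.round(3), context.shape)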
Enhanced Neural Machine Translation by Learning from Draft
Neural machine translation (NMT) has recently achieved impressive results. A
potential problem of the existing NMT algorithm, however, is that the decoding
is conducted from left to right, without considering the right context. This
paper proposes a two-stage approach to solve the problem. In the first stage,
a conventional attention-based NMT system is used to produce a draft
translation, and in the second stage, a novel double-attention NMT system is
used to refine the translation, by looking at the original input as well as the
draft translation. This drafting-and-refinement process can obtain right-context
information from the draft, hence producing more consistent translations. We
evaluated this approach on two Chinese-English translation tasks, with 44k and
1M sentence pairs respectively. The experiments showed that our approach
achieved improvements over the conventional NMT system of 2.4 and 0.9 BLEU
points on the small-scale and large-scale tasks, respectively.
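The refinement stage's double attention can be pictured as two context
vectors, one over the source and one over the encoded draft, combined at each
decoder step. A toy NumPy sketch, assuming simple dot-product scoring (the
paper's exact scoring function may differ) and an illustrative output
projection W_out:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    # simplified dot-product attention, standing in for the real scorer
    alpha = softmax(keys @ query)
    return alpha @ keys

def double_attention_step(s, H_src, H_draft, W_out):
    """One refinement-decoder step: attend to the source encoder states
    and to the encoded draft translation, then combine both contexts."""
    c_src = attend(s, H_src)         # context from the original input
    c_draft = attend(s, H_draft)     # right-context from the draft
    return np.concatenate([s, c_src, c_draft]) @ W_out

rng = np.random.default_rng(1)
d = 8
H_src = rng.standard_normal((5, d))    # encoded source sentence
H_draft = rng.standard_normal((7, d))  # encoded first-pass draft
s = rng.standard_normal(d)             # refinement decoder state
W_out = rng.standard_normal((3 * d, d))
print(double_attention_step(s, H_src, H_draft, W_out).shape)  # (8,)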
English-Japanese Neural Machine Translation with Encoder-Decoder-Reconstructor
Neural machine translation (NMT) has recently become popular in the field of
machine translation. However, NMT suffers from the problem of repeating or
missing words in the translation. To address this problem, Tu et al. (2017)
proposed an encoder-decoder-reconstructor framework for NMT using
back-translation. In this method, they selected the best forward translation
model in the same manner as Bahdanau et al. (2015), and then trained a
bi-directional translation model by fine-tuning. Their experiments show that it
offers a significant improvement in BLEU scores on a Chinese-English translation
task. We confirm that our re-implementation shows the same tendency and
alleviates the problem of repeating and missing words in the translation on an
English-Japanese task as well. In addition, we evaluate the effectiveness of
pre-training by comparing it with a jointly-trained model of forward
translation and back-translation.

Comment: 8 pages
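The encoder-decoder-reconstructor objective amounts to the usual forward
translation loss plus a reconstruction term that tries to recover the source
from the decoder's states. A minimal sketch of the combined objective only,
with a hypothetical balancing weight lambda_rec:

def reconstructor_loss(forward_nll, reconstruction_nll, lambda_rec=1.0):
    """Joint objective: translation loss plus a weighted reconstruction
    loss; lambda_rec is a hypothetical balancing weight."""
    return forward_nll + lambda_rec * reconstruction_nll

print(reconstructor_loss(2.31, 3.05))   # combined scalar loss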
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
We propose multi-way, multilingual neural machine translation. The proposed
approach enables a single neural translation model to translate between
multiple languages, with a number of parameters that grows only linearly with
the number of languages. This is made possible by having a single attention
mechanism that is shared across all language pairs. We train the proposed
multi-way, multilingual model on ten language pairs from WMT'15 simultaneously
and observe clear performance improvements over models trained on only one
language pair. In particular, we observe that the proposed model significantly
improves the translation quality of low-resource language pairs.
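The linear parameter growth falls out of the construction: one encoder and one
decoder per language, but a single attention function reused by every pair. A
toy Python sketch under those assumptions (all names and dimensions are
illustrative, not the paper's):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d, langs = 8, ["en", "fr", "de"]

# One encoder and one decoder parameter set per language (linear growth),
# but a single attention matrix shared across all language pairs.
encoders = {l: rng.standard_normal((d, d)) for l in langs}
decoders = {l: rng.standard_normal((d, d)) for l in langs}
W_shared_attn = rng.standard_normal((d, d))

def translate_step(src_lang, tgt_lang, x, s):
    """Toy step for any language pair: language-specific encode/decode,
    shared attention in between."""
    H = np.tanh(x @ encoders[src_lang])           # encode source tokens
    alpha = softmax(H @ (W_shared_attn @ s))      # shared attention scoring
    context = alpha @ H
    return np.tanh(context @ decoders[tgt_lang])  # language-specific decode

x = rng.standard_normal((5, d))   # toy source embeddings
s = rng.standard_normal(d)        # decoder state
print(translate_step("en", "fr", x, s).shape)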
Asynchronous Bidirectional Decoding for Neural Machine Translation
The dominant neural machine translation (NMT) models apply unified
attentional encoder-decoder neural networks for translation. Traditionally, the
NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a
left-to-right manner, leaving the target-side contexts generated from right to
left unexploited during translation. In this paper, we equip the conventional
attentional encoder-decoder NMT framework with a backward decoder, in order to
explore bidirectional decoding for NMT. Attending to the hidden state sequence
produced by the encoder, our backward decoder first learns to generate the
target-side hidden state sequence from right to left. Then, the forward decoder
performs translation in the forward direction, while in each translation
prediction timestep, it simultaneously applies two attention models to consider
the source-side and reverse target-side hidden states, respectively. With this
new architecture, our model is able to fully exploit source- and target-side
contexts to improve translation quality altogether. Experimental results on
NIST Chinese-English and WMT English-German translation tasks demonstrate that
our model achieves substantial improvements over the conventional NMT model, by
3.14 and 1.38 BLEU points respectively. The source code of this work can be
obtained from https://github.com/DeepLearnXMU/ABDNMT.

Comment: Accepted by AAAI 2018
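The asynchronous schedule can be sketched as two phases: run the backward
decoder to completion, then let the forward decoder attend to both the encoder
states and the backward states at every step. A toy NumPy sketch, with
simplified tanh updates standing in for the actual RNN cells:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(q, K):
    a = softmax(K @ q)
    return a @ K

rng = np.random.default_rng(3)
d, T_tgt = 8, 5
H_enc = rng.standard_normal((6, d))      # encoder hidden states
W_bwd = rng.standard_normal((2 * d, d))
W_fwd = rng.standard_normal((3 * d, d))

# Phase 1: the backward decoder runs to completion, right to left.
s = np.zeros(d)
H_bwd = []
for _ in range(T_tgt):
    s = np.tanh(np.concatenate([s, attend(s, H_enc)]) @ W_bwd)
    H_bwd.append(s)
H_bwd = np.stack(H_bwd[::-1])   # reversed so index 0 is the sentence start

# Phase 2: the forward decoder applies two attention models per step,
# one over the source and one over the backward decoder's states.
s = np.zeros(d)
for _ in range(T_tgt):
    s = np.tanh(np.concatenate(
        [s, attend(s, H_enc), attend(s, H_bwd)]) @ W_fwd)
print(s.shape)   # (8,)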
Local Monotonic Attention Mechanism for End-to-End Speech and Language Processing
Recently, encoder-decoder neural networks have shown impressive performance
on many sequence-related tasks. The architecture commonly uses an attentional
mechanism which allows the model to learn alignments between the source and the
target sequence. Most attention mechanisms used today are based on a global
attention property, which requires computing a weighted summary of the whole
input sequence from the encoder states. However, this is computationally
expensive and often produces misalignments on longer input sequences.
Furthermore, it does not fit the monotonic, left-to-right nature of several
tasks, such as automatic speech recognition (ASR),
grapheme-to-phoneme (G2P), etc. In this paper, we propose a novel attention
mechanism that has local and monotonic properties. Various ways to control
those properties are also explored. Experimental results on ASR, G2P and
machine translation between two languages with similar sentence structures,
demonstrate that the proposed encoder-decoder model with local monotonic
attention could achieve significant performance improvements and reduce the
computational complexity in comparison with the one that used the standard
global attention architecture.

Comment: Accepted at IJCNLP 2017 (V2: added more experiments on G2P & MT)
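One simple way to realize local and monotonic properties, sketched below, is a
Gaussian window whose centre can only move forward; the paper explores several
ways to control these properties, and the delta and width parameters here are
purely illustrative:

import numpy as np

def local_monotonic_attention(H, p_prev, delta, width=2.0):
    """Toy local monotonic attention: the centre can only move forward
    (monotonic), and weights come from a Gaussian window around it, so
    only nearby encoder states matter."""
    p = p_prev + max(delta, 0.0)              # enforce left-to-right movement
    j = np.arange(len(H))
    w = np.exp(-((j - p) ** 2) / (2 * width ** 2))
    alpha = w / w.sum()                       # normalized local weights
    return p, alpha, alpha @ H

rng = np.random.default_rng(4)
H = rng.standard_normal((10, 8))              # encoder states
p = 0.0
for step in range(3):
    p, alpha, ctx = local_monotonic_attention(H, p, delta=1.5)
    print(f"step {step}: centre={p:.1f}, top weight at j={alpha.argmax()}")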
Neural Contextual Conversation Learning with Labeled Question-Answering Pairs
Neural conversational models tend to produce generic or safe responses in
different contexts, e.g., replying "Of course" to narrative statements or "I
don't know" to questions. In this paper, we propose an end-to-end
approach to avoid this problem in neural generative models. Additional memory
mechanisms have been introduced to standard sequence-to-sequence (seq2seq)
models, so that context can be considered while generating sentences. Three
seq2seq models, which memorize a fixed-size contextual vector from hidden input,
hidden input/output and a gated contextual attention structure respectively,
have been trained and tested on a dataset of labeled question-answering pairs
in Chinese. The model with contextual attention outperforms others including
the state-of-the-art seq2seq models in a perplexity test. The novel contextual
model generates diverse and robust responses, and is able to carry on
conversations on a wide range of topics appropriately.
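The gated contextual variant can be read as a per-dimension interpolation
between the decoder state and the memorized context vector. A toy NumPy
sketch, with an illustrative gate matrix W_g:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_context(s, c, W_g):
    """Toy gated contextual mechanism: a learned gate decides, per
    dimension, how much stored conversational context to mix into the
    decoder state before generating the next word."""
    g = sigmoid(np.concatenate([s, c]) @ W_g)   # gate from state + context
    return g * s + (1.0 - g) * c                # convex mix of the two

rng = np.random.default_rng(5)
d = 8
s = rng.standard_normal(d)        # current decoder hidden state
c = rng.standard_normal(d)        # memorized fixed-size context vector
W_g = rng.standard_normal((2 * d, d))
print(gated_context(s, c, W_g).round(2))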
Video Description using Bidirectional Recurrent Neural Networks
Although traditionally used in the machine translation field, the
encoder-decoder framework has recently been applied to the generation of video
and image descriptions. The combination of Convolutional and Recurrent Neural
Networks in these models has proven to outperform the previous state of the
art, obtaining more accurate video descriptions. In this work we push this
model further by introducing two contributions into the encoding stage.
First, we produce richer image representations by combining object and
location information from Convolutional Neural Networks; second, we introduce
Bidirectional Recurrent Neural Networks for capturing both forward and backward
temporal relationships in the input frames.

Comment: 8 pages, 3 figures, 1 table. Submitted to the International Conference
on Artificial Neural Networks (ICANN)
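The bidirectional encoding gives each frame a representation that sees both
earlier and later frames. A toy NumPy sketch, with simplified tanh updates in
place of the actual recurrent cells:

import numpy as np

def birnn_encode(frames, W, V):
    """Toy bidirectional encoding of per-frame CNN features: one pass runs
    left to right, one right to left, and each frame keeps both states, so
    its representation covers past and future frames alike."""
    d = W.shape[1]
    fwd, bwd = [], []
    h = np.zeros(d)
    for x in frames:                       # forward temporal pass
        h = np.tanh(np.concatenate([x, h]) @ W)
        fwd.append(h)
    h = np.zeros(d)
    for x in frames[::-1]:                 # backward temporal pass
        h = np.tanh(np.concatenate([x, h]) @ V)
        bwd.append(h)
    return np.stack([np.concatenate([f, b])
                     for f, b in zip(fwd, bwd[::-1])])

rng = np.random.default_rng(6)
frames = rng.standard_normal((4, 16))      # e.g. pooled CNN features per frame
W = rng.standard_normal((16 + 8, 8))
V = rng.standard_normal((16 + 8, 8))
print(birnn_encode(frames, W, V).shape)    # (4, 16): fwd+bwd per frame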
Combining Advanced Methods in Japanese-Vietnamese Neural Machine Translation
Neural machine translation (NMT) systems have recently achieved
state-of-the-art results for many popular language pairs, thanks to the
availability of data. For low-resource language pairs, there has been little
research in this field due to the lack of bilingual data. In this paper, we
attempt to build the first NMT systems for a low-resource language pair:
Japanese-Vietnamese. We also show significant improvements when combining
advanced methods to reduce the adverse impacts of data sparsity and improve
the quality of the NMT systems. In addition, we propose a variant of the
Byte-Pair Encoding algorithm to perform effective word segmentation for
Vietnamese texts and alleviate the rare-word problem that persists in NMT
systems.
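For reference, standard Byte-Pair Encoding greedily merges the most frequent
adjacent symbol pair; the paper's contribution is a variant of this tuned to
Vietnamese word segmentation, which the plain sketch below does not reproduce:

from collections import Counter

def bpe_merges(words, num_merges):
    """Plain BPE on a toy corpus: repeatedly merge the most frequent
    adjacent symbol pair, tracking word frequencies."""
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(bpe_merges(["lower", "lowest", "low", "low"], 3))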
Unsupervised Neural Machine Translation
In spite of the recent success of neural machine translation (NMT) in
standard benchmarks, the lack of large parallel corpora poses a major practical
problem for many language pairs. There have been several proposals to alleviate
this issue with, for instance, triangulation and semi-supervised learning
techniques, but they still require a strong cross-lingual signal. In this work,
we remove the need for parallel data entirely and propose a novel method to
train an NMT system in a completely unsupervised manner, relying on nothing but
monolingual corpora. Our model builds upon the recent work on unsupervised
embedding mappings, and consists of a slightly modified attentional
encoder-decoder model that can be trained on monolingual corpora alone using a
combination of denoising and backtranslation. Despite the simplicity of the
approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014
French-to-English and German-to-English translation. The model can also profit
from small parallel corpora, and attains 21.81 and 15.24 points when combined
with 100,000 parallel sentences, respectively. Our implementation is released
as an open source project.

Comment: Published as a conference paper at ICLR 2018
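The training recipe alternates two objectives on monolingual data: denoising
(reconstruct a corrupted sentence in its own language) and on-the-fly
back-translation (translate with the current model, then learn to translate
back). A schematic Python loop with placeholder model updates, purely to show
the alternation; none of this is the paper's actual implementation:

import random

def noise(sent):
    s = sent[:]
    random.shuffle(s)              # crude noise: corrupt the word order
    return s

def denoising_step(model, sent):
    # placeholder for: train to reconstruct `sent` from noise(sent)
    _ = noise(sent)
    model["updates"] += 1

def backtranslation_step(model, sent):
    # placeholder for: translate `sent` with the current model, then
    # train to translate that pseudo-source back into `sent`
    _ = sent[::-1]                 # stands in for the model's translation
    model["updates"] += 1

model = {"updates": 0}
mono = {"fr": [["le", "chat", "dort"]], "en": [["the", "cat", "sleeps"]]}
for epoch in range(2):
    for lang, corpus in mono.items():
        for sent in corpus:
            denoising_step(model, sent)
            backtranslation_step(model, sent)
print(model["updates"])            # 8 updates over 2 epochs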