11 research outputs found
Asynchronous Bidirectional Decoding for Neural Machine Translation
The dominant neural machine translation (NMT) models apply unified
attentional encoder-decoder neural networks for translation. Traditionally, the
NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a
left-to-right manner, leaving the target-side contexts generated from right to
left unexploited during translation. In this paper, we equip the conventional
attentional encoder-decoder NMT framework with a backward decoder, in order to
explore bidirectional decoding for NMT. Attending to the hidden state sequence
produced by the encoder, our backward decoder first learns to generate the
target-side hidden state sequence from right to left. Then, the forward decoder
performs translation in the forward direction, while in each translation
prediction timestep, it simultaneously applies two attention models to consider
the source-side and reverse target-side hidden states, respectively. With this
new architecture, our model is able to fully exploit source- and target-side
contexts to improve translation quality altogether. Experimental results on
NIST Chinese-English and WMT English-German translation tasks demonstrate that
our model achieves substantial improvements over the conventional NMT by 3.14
and 1.38 BLEU points, respectively. The source code of this work can be
obtained from https://github.com/DeepLearnXMU/ABDNMT.
Comment: accepted by AAAI 1
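The forward decoder step described in this abstract, which attends to both the source-side encoder states and the reverse target-side states, can be sketched as follows. This is only an illustrative sketch: the dot-product scoring, the concatenation of the two contexts, and all function names (`attend`, `forward_step_contexts`) are assumptions made for the example and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, states):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of `states` and the attention weights.
    """
    weights = softmax([dot(query, h) for h in states])
    dim = len(states[0])
    context = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return context, weights


def forward_step_contexts(query, encoder_states, backward_states):
    """One forward-decoder timestep with two separate attention models.

    `encoder_states` are the source-side hidden states; `backward_states`
    are the target-side hidden states produced right to left by the
    backward decoder. Concatenating the two contexts is an assumption.
    """
    src_context, src_weights = attend(query, encoder_states)
    tgt_context, tgt_weights = attend(query, backward_states)
    return src_context + tgt_context, src_weights, tgt_weights


if __name__ == "__main__":
    q = [1.0, 0.0]
    enc = [[1.0, 0.0], [0.0, 1.0]]
    bwd = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]
    ctx, sw, tw = forward_step_contexts(q, enc, bwd)
    print(len(ctx), sw, tw)
```

In a full model the combined context would feed the decoder's recurrent update and output layer; those parts are omitted here.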
Twin Networks: Matching the Future for Sequence Generation
We propose a simple technique for encouraging generative RNNs to plan ahead.
We train a "backward" recurrent network to generate a given sequence in reverse
order, and we encourage states of the forward model to predict cotemporal
states of the backward model. The backward network is used only during
training, and plays no role during sampling or inference. We hypothesize that
our approach eases modeling of long-term dependencies by implicitly forcing the
forward states to hold information about the longer-term future (as contained
in the backward states). We show empirically that our approach achieves 9%
relative improvement for a speech recognition task, and achieves significant
improvement on a COCO caption generation task.
Comment: 12 pages, 3 figures, published at ICLR 201
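The training signal described in this abstract, in which forward states are encouraged to predict the cotemporal backward states, might look roughly like the sketch below. It assumes an affine map from forward to backward states and a squared Euclidean penalty averaged over timesteps; the function names and the exact form of the penalty are illustrative assumptions, not the authors' implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def affine(weight, bias, vec):
    """Apply an affine map: weight is a list of rows, bias a list of floats."""
    return [dot(row, vec) + b for row, b in zip(weight, bias)]


def twin_penalty(forward_states, backward_states_as_emitted, weight, bias):
    """Mean squared distance between mapped forward states and backward states.

    `forward_states[t]` is the forward state at position t.
    `backward_states_as_emitted` is in the order the backward network
    produced it (last position first), so it is reversed here to align
    cotemporal states. The squared-L2 form is an assumption.
    """
    aligned = list(reversed(backward_states_as_emitted))
    total = 0.0
    for h_f, h_b in zip(forward_states, aligned):
        predicted = affine(weight, bias, h_f)
        total += sum((p - t) ** 2 for p, t in zip(predicted, h_b))
    return total / len(forward_states)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero_bias = [0.0, 0.0]
    fwd = [[1.0, 0.0], [0.0, 1.0]]
    bwd = [[0.0, 1.0], [1.0, 0.0]]  # emitted right to left
    print(twin_penalty(fwd, bwd, identity, zero_bias))
```

During training this penalty would be added to the usual sequence likelihood losses; at sampling or inference time only the forward network is used, as the abstract states.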
DTMT: A Novel Deep Transition Architecture for Neural Machine Translation
Past years have witnessed rapid developments in Neural Machine Translation
(NMT). Most recently, with advanced modeling and training techniques, the
RNN-based NMT (RNMT) has shown its potential strength, even compared with the
well-known Transformer (self-attentional) model. Although the RNMT model can
possess very deep architectures through stacking layers, the transition depth
between consecutive hidden states along the sequential axis is still shallow.
In this paper, we further enhance the RNN-based NMT through increasing the
transition depth between consecutive hidden states and build a novel Deep
Transition RNN-based Architecture for Neural Machine Translation, named DTMT.
This model enhances the hidden-to-hidden transition with multiple non-linear
transformations, as well as maintains a linear transformation path throughout
this deep transition by the well-designed linear transformation mechanism to
alleviate the gradient vanishing problem. Experiments show that with the
specially designed deep transition modules, our DTMT can achieve remarkable
improvements in translation quality. Experimental results on the Chinese->English
translation task show that DTMT outperforms the Transformer model by +2.09
BLEU points and achieves the best results ever reported on the same dataset. On
WMT14 English->German and English->French translation tasks, DTMT shows
superior quality to the state-of-the-art NMT systems, including the Transformer
and the RNMT+.
Comment: Accepted at AAAI 2019. Code is available at:
https://github.com/fandongmeng/DTMT_InDe
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Multimodal machine translation (MMT), which mainly focuses on enhancing
text-only translation with visual features, has attracted considerable
attention from both computer vision and natural language processing
communities. Most current MMT models resort to attention mechanisms, global
context modeling or multimodal joint representation learning to utilize visual
features. However, the attention mechanism lacks sufficient semantic
interactions between modalities while the other two provide fixed visual
context, which is unsuitable for modeling the observed variability when
generating translation. To address the above issues, in this paper, we propose
a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at
each timestep of decoding, we first employ the conventional source-target
attention to produce a timestep-specific source-side context vector. Next, DCCN
takes this vector as input and uses it to guide the iterative extraction of
related visual features via a context-guided dynamic routing mechanism.
In particular, because we represent the input image with global and regional visual
features, we introduce two parallel DCCNs to model multimodal context vectors
with visual features at different granularities. Finally, we obtain two
multimodal context vectors, which are fused and incorporated into the decoder
for the prediction of the target word. Experimental results on the Multi30K
dataset of English-to-German and English-to-French translation demonstrate the
superiority of DCCN. Our code is available at
https://github.com/DeepLearnXMU/MM-DCCN
Grapheme-to-Phoneme Conversion with Convolutional Neural Networks
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciations for words based on their written form. It plays an essential role in natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNNs) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P model with residual connections, as well as a model that uses a convolutional neural network (with and without residual connections) as the encoder and a Bi-LSTM as the decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times and phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best-performing convolutional neural network based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate