Bi-Decoder Augmented Network for Neural Machine Translation
Neural Machine Translation (NMT) has become a popular technology in recent
years, and the encoder-decoder framework is the mainstream among all the
methods. The quality of the semantic representations produced by the encoder is
crucial and significantly affects the performance of the model. However,
existing unidirectional source-to-target architectures can hardly produce a
language-independent representation of the text because they rely heavily on
the specific relations of the given language pair. To alleviate this problem,
in this paper we propose a novel Bi-Decoder Augmented Network (BiDAN) for the
neural machine translation task. Besides the original decoder, which generates
the target-language sequence, we add an auxiliary decoder that generates the
source-language sequence back at training time. Since each decoder transforms
the representation of the input text into its corresponding language, jointly
training with two target ends gives the shared encoder the potential to produce
a language-independent semantic space. We conduct extensive experiments on
several NMT benchmark datasets, and the results demonstrate the effectiveness
of our proposed approach.
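As a concrete illustration of the joint objective described above, here is a minimal sketch, not the paper's implementation: a shared encoder feeds the target-side decoder and an auxiliary decoder that reconstructs the source, and the two cross-entropy losses are summed. Module names, sizes, and the auxiliary weight are illustrative assumptions.

```python
# Hedged sketch of a bi-decoder training objective: one shared encoder, two
# teacher-forced decoders, joint loss. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class BiDecoderNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.dec_tgt = nn.GRU(dim, dim, batch_first=True)  # source -> target
        self.dec_src = nn.GRU(dim, dim, batch_first=True)  # auxiliary: back to source
        self.out_tgt = nn.Linear(dim, tgt_vocab)
        self.out_src = nn.Linear(dim, src_vocab)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, src, tgt, aux_weight=0.5):
        # Encode the source once; both decoders share this representation.
        _, h = self.encoder(self.src_emb(src))
        # Teacher-forced decoding toward the target language.
        y_tgt, _ = self.dec_tgt(self.tgt_emb(tgt[:, :-1]), h)
        loss_tgt = self.ce(self.out_tgt(y_tgt).flatten(0, 1), tgt[:, 1:].flatten())
        # Auxiliary decoder reconstructs the source (used at training time only).
        y_src, _ = self.dec_src(self.src_emb(src[:, :-1]), h)
        loss_src = self.ce(self.out_src(y_src).flatten(0, 1), src[:, 1:].flatten())
        return loss_tgt + aux_weight * loss_src
```

At test time only the target-side decoder would be used; the auxiliary loss simply regularizes the shared encoder toward a representation that serves both languages.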
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Beam search is a widely used approximate search strategy for neural network
decoders, and it generally outperforms simple greedy decoding on tasks like
machine translation. However, this improvement comes at substantial
computational cost. In this paper, we propose a flexible new method that allows
us to reap nearly the full benefits of beam search with nearly no additional
computational cost. The method revolves around a small neural network actor
that is trained to observe and manipulate the hidden state of a
previously-trained decoder. To train this actor network, we introduce the use
of a pseudo-parallel corpus built using the output of beam search on a base
model, ranked by a target quality metric like BLEU. Our method is inspired by
earlier work on this problem, but requires no reinforcement learning, and can
be trained reliably on a range of models. Experiments on three parallel corpora
and three architectures show that the method yields substantial improvements in
translation quality and speed over each base system.
Comment: Accepted by EMNLP 201
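The actor idea can be sketched as follows, under the assumption of an additive correction to a frozen decoder's hidden state; the actor would then be trained to make greedy decoding imitate the best beam-search output selected offline by BLEU. Names and sizes are illustrative.

```python
# Hedged sketch of a small actor network that observes and nudges the hidden
# state of a previously trained (frozen) decoder. Illustrative only.
import torch
import torch.nn as nn

class HiddenStateActor(nn.Module):
    def __init__(self, hidden_dim=512, actor_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, actor_dim), nn.Tanh(),
            nn.Linear(actor_dim, hidden_dim),
        )

    def forward(self, decoder_state):
        # Additive correction keeps the base decoder's own weights untouched;
        # only the actor is trained, e.g. on pairs (source, best beam output).
        return decoder_state + self.net(decoder_state)
```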
Bi-Directional Neural Machine Translation with Synthetic Parallel Data
Despite impressive progress in high-resource settings, Neural Machine
Translation (NMT) still struggles in low-resource and out-of-domain scenarios,
often failing to match the quality of phrase-based translation. We propose a
novel technique that combines back-translation and multilingual NMT to improve
performance in these difficult cases. Our technique trains a single model for
both directions of a language pair, allowing us to back-translate source or
target monolingual data without requiring an auxiliary model. We then continue
training on the augmented parallel data, enabling a cycle of improvement for a
single model that can incorporate any source, target, or parallel data to
improve both translation directions. As a byproduct, these models can reduce
training and deployment costs significantly compared to uni-directional models.
Extensive experiments show that our technique outperforms standard
back-translation in low-resource scenarios, improves quality on cross-domain
tasks, and effectively reduces costs across the board.
Comment: Accepted at the 2nd Workshop on Neural Machine Translation and Generation (WNMT 2018)
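On the data side, the single bi-directional model can be sketched as below: each training pair is used in both directions with a token naming the desired output language, and the same model back-translates monolingual text for itself. The tag strings and the `back_translate` callable are assumptions for illustration, not the paper's interface.

```python
# Hedged sketch of data preparation for one model that serves both directions
# and back-translates its own monolingual data. Tags are illustrative.
def make_bidirectional_pairs(parallel, mono_src=(), mono_tgt=(), back_translate=None):
    """parallel: list of (src, tgt) sentence pairs; back_translate: the model's
    own translate function (assumed here), used to synthesize the missing side."""
    pairs = []
    for src, tgt in parallel:
        pairs.append(("<2tgt> " + src, tgt))  # forward direction
        pairs.append(("<2src> " + tgt, src))  # reverse direction, same model
    if back_translate is not None:
        # Monolingual target text: synthesize a source side, train (synthetic -> real).
        for tgt in mono_tgt:
            pairs.append(("<2tgt> " + back_translate("<2src> " + tgt), tgt))
        # Monolingual source text: synthesize a target side, train (synthetic -> real).
        for src in mono_src:
            pairs.append(("<2src> " + back_translate("<2tgt> " + src), src))
    return pairs
```

Continuing training on such augmented pairs is what enables the cycle of improvement the abstract describes: each round of back-translation feeds better synthetic data to the next.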
Exploring the Use of Attention within Neural Machine Translation Decoder States to Translate Idioms
Idioms pose problems to almost all Machine Translation systems. This type of
language is very frequent in day-to-day use and cannot simply be ignored. The
recent interest in memory-augmented models in the field of Language Modelling
has helped such systems achieve good results by bridging long-distance
dependencies. In this paper we explore incorporating such techniques into a
Neural Machine Translation system to help with the translation of idiomatic
language.
Learning to Remember Translation History with a Continuous Cache
Existing neural machine translation (NMT) models generally translate
sentences in isolation, missing the opportunity to take advantage of
document-level information. In this work, we propose to augment NMT models with
a very light-weight cache-like memory network, which stores recent hidden
representations as translation history. The probability distribution over
generated words is updated online depending on the translation history
retrieved from the memory, endowing NMT models with the capability to
dynamically adapt over time. Experiments on multiple domains with different
topics and styles show the effectiveness of the proposed approach with
negligible impact on the computational cost.
Comment: Accepted by TACL 201
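A minimal sketch of such a cache, assuming recent decoder states are stored as keys with the words they produced as values, and a distribution read from the cache is interpolated with the NMT distribution. Capacity, the interpolation weight, and names are illustrative assumptions.

```python
# Hedged sketch of a lightweight cache over translation history. Illustrative only.
import torch
import torch.nn.functional as F

class ContinuousCache:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.keys, self.values = [], []  # recent hidden states / emitted word ids

    def update(self, hidden, word_id):
        self.keys.append(hidden.detach())
        self.values.append(word_id)
        if len(self.keys) > self.capacity:  # drop the oldest entry
            self.keys.pop(0); self.values.pop(0)

    def combine(self, hidden, model_probs, lam=0.2):
        if not self.keys:
            return model_probs
        keys = torch.stack(self.keys)                 # (N, dim)
        attn = F.softmax(keys @ hidden, dim=0)        # match query against history
        cache_probs = torch.zeros_like(model_probs)
        for weight, word in zip(attn, self.values):
            cache_probs[word] += weight               # scatter attention onto the vocab
        return (1 - lam) * model_probs + lam * cache_probs
```

Because the cache only stores and retrieves existing hidden states, it adds very little computation on top of the base NMT model, which matches the negligible overhead reported above.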
Open Vocabulary Learning for Neural Chinese Pinyin IME
Pinyin-to-character (P2C) conversion is the core component of pinyin-based
Chinese input method engine (IME). However, the conversion is seriously
compromised by the ambiguity of Chinese characters corresponding to the same
pinyin, as well as by predefined fixed vocabularies. To alleviate these
problems, we propose a neural P2C conversion model augmented by an
online-updated vocabulary with a sampling mechanism, supporting open vocabulary
learning while the IME is in use. Our experiments show that the proposed method
outperforms commercial IMEs and state-of-the-art traditional models on a
standard corpus and a real inputting-history dataset in terms of multiple
metrics, indicating that the online-updated vocabulary indeed helps our IME
follow user inputting behavior effectively.
Comment: Accepted by ACL 201
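The online-updated vocabulary can be pictured with a small sketch, under the assumption that unseen user words receive ids on the fly and that a sampling step keeps the candidate set at a workable size during training. The class and its details are illustrative, not the paper's exact mechanism.

```python
# Hedged sketch of an online-updated vocabulary with candidate sampling.
import random

class OnlineVocabulary:
    def __init__(self, initial_words):
        self.word2id = {w: i for i, w in enumerate(initial_words)}

    def add(self, word):
        # Open-vocabulary behaviour: unseen user words get ids immediately.
        if word not in self.word2id:
            self.word2id[word] = len(self.word2id)
        return self.word2id[word]

    def sample_candidates(self, target_word, k=50):
        # The target word plus k sampled negatives approximates a full softmax.
        negatives = random.sample(list(self.word2id), min(k, len(self.word2id)))
        return [target_word] + [w for w in negatives if w != target_word]
```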
Sequence to Logic with Copy and Cache
Generating logical-form equivalents of natural language is a fresh way to
employ neural architectures, in which long short-term memory units effectively
capture dependencies in both the encoder and the decoder.
The logical form of a sequence usually preserves information from the
natural-language side in the form of similar tokens, and recently a copying
mechanism has been proposed that increases the probability of outputting
tokens from the source input during decoding.
In this paper we propose a caching mechanism as a more general form of the
copying mechanism, which also weighs all the words of the source vocabulary
according to their relation to the current decoding context.
Our results confirm that the proposed method achieves improvements in
sequence- and token-level accuracy on sequence-to-logical-form tasks. Further
experiments on cross-domain adversarial attacks show substantial improvements
when using the most influential examples from other domains for training.
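The contrast between copying and caching can be sketched as below: copying boosts only tokens that literally appear in the input, whereas a cache scores every source-vocabulary word against the current decoder context and mixes that score into the output distribution. Shapes, the gate value, and the function name are illustrative assumptions.

```python
# Hedged sketch of a cache-style distribution over the whole source vocabulary.
import torch
import torch.nn.functional as F

def cache_augmented_distribution(gen_probs, decoder_context, src_vocab_embeds, gate=0.3):
    """gen_probs: (V,) decoder distribution; decoder_context: (d,);
    src_vocab_embeds: (V, d) embeddings of the entire source vocabulary."""
    # Every source-vocabulary word is weighed by its relation to the context,
    # not just the words present in the current input (the copy special case).
    cache_scores = src_vocab_embeds @ decoder_context   # (V,)
    cache_probs = F.softmax(cache_scores, dim=0)
    return (1 - gate) * gen_probs + gate * cache_probs
```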
Learning to Remember Rare Events
Despite recent advances, memory-augmented deep neural networks are still
limited when it comes to life-long and one-shot learning, especially in
remembering rare events. We present a large-scale life-long memory module for
use in deep learning. The module exploits fast nearest-neighbor algorithms for
efficiency and thus scales to large memory sizes. Except for the
nearest-neighbor query, the module is fully differentiable and trained
end-to-end with no extra supervision. It operates in a life-long manner, i.e.,
without the need to reset it during training.
Our memory module can be easily added to any part of a supervised neural
network. To show its versatility we add it to a number of networks, from simple
convolutional ones tested on image classification to deep sequence-to-sequence
and recurrent-convolutional models. In all cases, the enhanced network gains
the ability to remember and do life-long one-shot learning. Our module
remembers training examples shown many thousands of steps in the past and it
can successfully generalize from them. We set a new state of the art for
one-shot learning on the Omniglot dataset and demonstrate, for the first time,
life-long one-shot learning in recurrent neural networks on a large-scale
machine translation task.
Comment: Conference paper accepted for ICLR'1
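A minimal sketch of such a key-value memory, assuming cosine-similarity nearest-neighbour lookup and an age-based slot-replacement rule; sizes, names, and the write policy are illustrative assumptions rather than the paper's exact module.

```python
# Hedged sketch of a life-long key-value memory queried by nearest neighbour.
# Only the neighbour lookup is non-differentiable; the memory is never reset.
import torch
import torch.nn.functional as F

class KeyValueMemory:
    def __init__(self, size=4096, dim=128):
        self.keys = F.normalize(torch.randn(size, dim), dim=1)
        self.values = torch.zeros(size, dtype=torch.long)  # stored labels
        self.age = torch.zeros(size)                       # picks slots to overwrite

    def query(self, q, k=5):
        q = F.normalize(q, dim=0)
        sims = self.keys @ q          # cosine similarity against all keys
        top = sims.topk(k)            # fast nearest-neighbour lookup in practice
        return self.values[top.indices], top.values

    def write(self, q, label):
        # Overwrite the oldest slot with the new (key, value) pair.
        slot = int(self.age.argmax())
        self.keys[slot] = F.normalize(q, dim=0).detach()
        self.values[slot] = label
        self.age += 1
        self.age[slot] = 0
```

Because the module only exposes a query and a write, it can be attached to the representation layer of essentially any supervised network, which is how the abstract describes adding it to convolutional and sequence-to-sequence models alike.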
Incorporating Relevant Knowledge in Context Modeling and Response Generation
To sustain engaging conversation, it is critical for chatbots to make good
use of relevant knowledge. Equipped with a knowledge base, chatbots are able to
extract conversation-related attributes and entities to facilitate context
modeling and response generation. In this work, we distinguish the uses of
attribute and entity and incorporate them into the encoder-decoder architecture
in different manners. Based on the augmented architecture, our chatbot, named
Mike, is able to generate responses by referring to proper entities from the
collected knowledge. To validate the proposed approach, we build a movie
conversation corpus on which the proposed approach significantly outperforms
four other knowledge-grounded models.
Ancient-Modern Chinese Translation with a Large Training Dataset
Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation.
Automatic translation from ancient Chinese to modern Chinese helps preserve and
pass on the essence of the ancient texts. However, the lack of a large-scale
parallel corpus limits the study of machine translation between Ancient and
Modern Chinese. In this paper, we propose an Ancient-Modern Chinese clause
alignment approach based on the characteristics of these two languages. The
method combines lexical and statistical information and achieves a 94.2
F1-score on our manually annotated test set. We use this method to create a new
large-scale Ancient-Modern Chinese parallel corpus containing 1.24M bilingual
pairs. To the best of our knowledge, this is the first large, high-quality
Ancient-Modern Chinese dataset. Furthermore, we analyze and compare the
performance of SMT and various NMT models on this dataset and provide a strong
baseline for this task.
Comment: To appear in the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
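One way to picture a clause-alignment score that mixes lexical and statistical evidence is sketched below, exploiting the fact that ancient and modern Chinese clauses share many characters and have correlated lengths. The weights, the length model, and the function name are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a clause-alignment score combining lexical overlap with a
# simple statistical length-ratio term. Illustrative only.
def alignment_score(ancient_clause, modern_clause, lex_weight=0.7, expected_ratio=1.5):
    # Lexical evidence: proportion of ancient characters reappearing in the modern clause.
    shared = sum(1 for ch in ancient_clause if ch in modern_clause)
    lexical = shared / max(len(ancient_clause), 1)
    # Statistical evidence: how close the length ratio is to a typical value.
    ratio = len(modern_clause) / max(len(ancient_clause), 1)
    statistical = 1.0 / (1.0 + abs(ratio - expected_ratio))
    return lex_weight * lexical + (1 - lex_weight) * statistical
```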