Non-Parametric Adaptation for Neural Machine Translation
Neural Networks trained with gradient descent are known to be susceptible to
catastrophic forgetting caused by parameter shift during the training process.
In the context of Neural Machine Translation (NMT), this results in poor
performance on heterogeneous datasets and on sub-tasks like rare phrase
translation. On the other hand, non-parametric approaches are immune to
forgetting, perfectly complementing the generalization ability of NMT. However,
attempts to combine non-parametric or retrieval based approaches with NMT have
only been successful on narrow domains, possibly due to over-reliance on
sentence level retrieval. We propose a novel n-gram level retrieval approach
that relies on local phrase level similarities, allowing us to retrieve
neighbors that are useful for translation even when overall sentence similarity
is low. We complement this with an expressive neural network, allowing our
model to extract information from the noisy retrieved context. We evaluate our
semi-parametric NMT approach on a heterogeneous dataset composed of WMT, IWSLT,
JRC-Acquis and OpenSubtitles, and demonstrate gains on all 4 evaluation sets.
The semi-parametric nature of our approach opens the door for non-parametric
domain adaptation, demonstrating strong inference-time adaptation performance
on new domains without the need for any parameter updates.
Comment: Accepted at NAACL 2019
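
A minimal sketch of how the n-gram level retrieval could be organized (the
index structure, names, and overlap scoring below are illustrative
assumptions, not the authors' implementation): training pairs are indexed by
their source n-grams, and neighbors are ranked by the number of shared local
n-grams, so a pair can be retrieved even when whole-sentence similarity is
low.

    from collections import defaultdict

    def ngrams(tokens, n):
        """All contiguous n-grams of a token list."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    class NGramRetriever:
        """Inverted index from source n-grams to training pairs."""

        def __init__(self, n=3):
            self.n = n
            self.index = defaultdict(set)  # n-gram -> ids of training pairs
            self.pairs = []                # (source_tokens, target_tokens)

        def add(self, source_tokens, target_tokens):
            pair_id = len(self.pairs)
            self.pairs.append((source_tokens, target_tokens))
            for gram in ngrams(source_tokens, self.n):
                self.index[gram].add(pair_id)

        def retrieve(self, source_tokens, k=5):
            """Rank training pairs by the number of shared source n-grams."""
            scores = defaultdict(int)
            for gram in ngrams(source_tokens, self.n):
                for pair_id in self.index[gram]:
                    scores[pair_id] += 1
            ranked = sorted(scores, key=scores.get, reverse=True)[:k]
            return [self.pairs[i] for i in ranked]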
Text Generation with Exemplar-based Adaptive Decoding
We propose a novel conditioned text generation model. It draws inspiration
from traditional template-based text generation techniques, where the source
provides the content (i.e., what to say), and the template influences how to
say it. Building on the successful encoder-decoder paradigm, it first encodes
the content representation from the given input text; to produce the output, it
retrieves exemplar text from the training data as "soft templates," which are
then used to construct an exemplar-specific decoder. We evaluate the proposed
model on abstractive text summarization and data-to-text generation. Empirical
results show that this model achieves strong performance and outperforms
comparable baselines.
Comment: NAACL 2019
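
As a rough illustration of the retrieval step only (TF-IDF cosine similarity
is an assumed stand-in for whatever retriever the model actually uses), the
exemplar that serves as a "soft template" can be fetched as the training
target whose source side is most similar to the input:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve_exemplar(source, train_sources, train_targets):
        """Return the training target whose source side is most similar
        to the input text; the exemplar-specific decoder would then
        condition on both the input encoding and this exemplar."""
        vectorizer = TfidfVectorizer()
        train_matrix = vectorizer.fit_transform(train_sources)
        query_vector = vectorizer.transform([source])
        similarities = cosine_similarity(query_vector, train_matrix)[0]
        return train_targets[similarities.argmax()]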
Connecting the Dots Between MLE and RL for Sequence Prediction
Sequence prediction models can be learned from example sequences with a
variety of training algorithms. Maximum likelihood learning is simple and
efficient, yet can suffer from compounding error at test time. Reinforcement
learning approaches such as policy gradient address the issue but can have
prohibitively poor exploration efficiency. A rich set of other algorithms,
such as RAML, SPG, and data noising, has also been developed from different
perspectives. This
paper establishes a formal connection between these algorithms. We present a
generalized entropy regularized policy optimization formulation, and show that
the apparently distinct algorithms can all be reformulated as special instances
of the framework, with the only difference being the configurations of a reward
function and a couple of hyperparameters. The unified interpretation offers a
systematic view of the varying properties of exploration and learning
efficiency. Moreover, inspired by the framework, we present a new algorithm
that dynamically interpolates among the family of algorithms for scheduled
sequence model learning. Experiments on machine translation, text
summarization, and game imitation learning demonstrate the superiority of the
proposed algorithm.
Comment: Major revision. The first two authors contributed equally.
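
In hedged form, an entropy regularized policy optimization objective of the
kind described above can be sketched as follows (the notation and exact
weighting are assumptions about the paper's formulation, not a quotation of
it):

    \max_{q,\theta} \;
      \mathbb{E}_{y \sim q(y \mid x)}\big[ R(x, y) \big]
      \;-\; \alpha \, \mathrm{KL}\big( q(y \mid x) \,\|\, p_\theta(y \mid x) \big)
      \;+\; \beta \, \mathbb{H}(q)

Under this reading, particular configurations of the reward R and the
hyperparameters (\alpha, \beta) would recover maximum likelihood (a delta
reward concentrated on the reference sequence), RAML, data noising, and
policy gradient as special cases.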
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation
The overreliance on large parallel corpora significantly limits the
applicability of machine translation systems to the majority of language pairs.
Back-translation has been the dominant technique in previous approaches to
unsupervised neural machine translation, where pseudo sentence pairs are
generated to train the models with a reconstruction loss. However, the pseudo
sentences are usually of low quality as translation errors accumulate during
training. To avoid this fundamental issue, we propose an alternative but more
effective approach, extract-edit, to extract and then edit real sentences from
the target monolingual corpora. Furthermore, we introduce a comparative
translation loss to evaluate the translated target sentences and thus train the
unsupervised translation systems. Experiments show that the proposed approach
consistently outperforms the previous state-of-the-art unsupervised machine
translation systems across two benchmarks (English-French and English-German)
and two low-resource language pairs (English-Romanian and English-Russian) by
more than 2 (up to 3.63) BLEU points.
Comment: 11 pages, 3 figures. Accepted to NAACL 2019
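
One hedged way to picture the comparative translation loss (the margin form
and the shared scorer below are assumptions; the paper defines its own
comparative objective): the system's translation is scored against a real
sentence extracted and edited from the target monolingual corpus, and a
ranking loss pushes the translation to score at least as well.

    import torch.nn.functional as F

    def comparative_loss(translation_score, extract_edit_score, margin=1.0):
        """Margin ranking loss over batched scalar scores: penalize the
        model when its translation scores worse than the extracted and
        edited real target sentence."""
        return F.relu(margin - translation_score + extract_edit_score).mean()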
Learning to Remember Translation History with a Continuous Cache
Existing neural machine translation (NMT) models generally translate
sentences in isolation, missing the opportunity to take advantage of
document-level information. In this work, we propose to augment NMT models with
a very light-weight cache-like memory network, which stores recent hidden
representations as translation history. The probability distribution over
generated words is updated online depending on the translation history
retrieved from the memory, endowing NMT models with the capability to
dynamically adapt over time. Experiments on multiple domains with different
topics and styles show the effectiveness of the proposed approach with
negligible impact on the computational cost.
Comment: Accepted by TACL 2018
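
A minimal sketch of such a cache-like memory (the similarity function,
capacity, and interpolation below are illustrative assumptions): recent
decoder hidden states are stored as keys alongside the words they produced,
and at each step a cache distribution is read out by similarity to the
current state and interpolated with the NMT model's word distribution.

    import torch
    import torch.nn.functional as F

    class ContinuousCache:
        """First-in-first-out key-value memory over recent decoding steps."""

        def __init__(self, capacity=100):
            self.capacity = capacity
            self.keys = []    # recent hidden states
            self.words = []   # word ids emitted at those steps

        def write(self, hidden_state, word_id):
            self.keys.append(hidden_state)
            self.words.append(word_id)
            if len(self.keys) > self.capacity:
                self.keys.pop(0)
                self.words.pop(0)

        def query(self, query_state, vocab_size, temperature=1.0):
            """Distribution over words stored in the cache, weighted by
            dot-product similarity between the query and cached keys."""
            probs = torch.zeros(vocab_size)
            if not self.keys:
                return probs
            keys = torch.stack(self.keys)                 # (cache, hidden)
            weights = F.softmax(keys @ query_state / temperature, dim=0)
            for weight, word_id in zip(weights, self.words):
                probs[word_id] += weight
            return probs

    # At each step the final distribution could interpolate the two, e.g.
    # p = (1 - lam) * p_model + lam * cache.query(h_t, vocab_size)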
Retrieve and Refine: Improved Sequence Generation Models For Dialogue
Sequence generation models for dialogue are known to have several problems:
they tend to produce short, generic sentences that are uninformative and
unengaging. Retrieval models, on the other hand, can surface interesting
responses, but are restricted to the given retrieval set, leading to erroneous
replies that cannot be tuned to the specific context. In this work we develop a
model that combines the two approaches to avoid both their deficiencies: first
retrieve a response and then refine it -- the final sequence generator treating
the retrieval as additional context. On the recent CONVAI2 challenge task, we
show that our approach produces responses superior to both standard retrieval
and generation models in human evaluations.
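
The retrieve-and-refine pipeline can be pictured with a short sketch (the
separator string and the retriever/generator callables are hypothetical
placeholders, not the paper's interface):

    def retrieve_and_refine(context, retriever, generator,
                            sep=" [RETRIEVED] "):
        """First surface a candidate response from the retrieval set,
        then let the sequence generator refine it by treating the
        retrieval as additional input context."""
        candidate = retriever(context)
        return generator(context + sep + candidate)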
Learning to Discriminate Noises for Incorporating External Information in Neural Machine Translation
Previous studies show that incorporating external information could improve
the translation quality of Neural Machine Translation (NMT) systems. However,
there is inevitably noise in the external information, which severely reduces
the benefit that existing methods can derive from the incorporation. To
tackle this problem, this study pays special attention to discriminating the
noise during incorporation. We argue that there exist two kinds of noise in
the external information, i.e., global noise and local noise, which affect
the translation of the whole sentence and of some specific words,
respectively. Accordingly, we propose a general framework that learns to
jointly discriminate both global and local noise, so that the external
information could be better leveraged. Our model is trained on the dataset
derived from the original parallel corpus without any external labeled data or
annotation. Experimental results in various real-world scenarios, language
pairs, and neural architectures indicate that discriminating noise yields
significant improvements in translation quality by better incorporating the
external information, even in very noisy conditions.
Comment: 8 pages
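
A hedged sketch of joint global and local gating (the layer shapes and
gating form are assumptions, not the paper's exact architecture): a
sentence-level gate scales the external context as a whole, while per-word
gates scale individual positions.

    import torch
    import torch.nn as nn

    class NoiseGates(nn.Module):
        """Jointly learned global (sentence-level) and local (per-word)
        gates over an external context sequence."""

        def __init__(self, hidden_size):
            super().__init__()
            self.global_gate = nn.Linear(2 * hidden_size, 1)
            self.local_gate = nn.Linear(2 * hidden_size, 1)

        def forward(self, decoder_state, context):
            # decoder_state: (batch, hidden); context: (batch, length, hidden)
            summary = context.mean(dim=1)  # crude sentence-level summary
            g = torch.sigmoid(self.global_gate(
                torch.cat([decoder_state, summary], dim=-1)))     # (batch, 1)
            expanded = decoder_state.unsqueeze(1).expand_as(context)
            l = torch.sigmoid(self.local_gate(
                torch.cat([expanded, context], dim=-1)))  # (batch, length, 1)
            return g.unsqueeze(1) * l * context           # gated context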
Enabling Open-World Specification Mining via Unsupervised Learning
Many programming tasks require using both domain-specific code and
well-established patterns (such as routines concerned with file IO). Together,
several small patterns combine to create complex interactions. This compounding
effect, mixed with domain-specific idiosyncrasies, creates a challenging
environment for fully automatic specification inference. Mining specifications
in this environment, without the aid of rule templates, user-directed feedback,
or predefined API surfaces, is a major challenge. We call this challenge
Open-World Specification Mining.
In this paper, we present a framework for mining specifications and usage
patterns in an Open-World setting. We design this framework to be
miner-agnostic and instead focus on disentangling complex and noisy API
interactions. To evaluate our framework, we introduce a benchmark of 71
clusters extracted from five open-source projects. Using this dataset, we show
that interesting clusters can be recovered, in a fully automatic way, by
leveraging unsupervised learning in the form of word embeddings. Once clusters
have been recovered, the challenge of Open-World Specification Mining is
simplified and any trace-based mining technique can be applied. In addition, we
provide a comprehensive evaluation of three word-vector learners to showcase
the value of sub-word information for embeddings learned in the
software-engineering domain.
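
As a small illustration of learning embeddings with sub-word information
over call traces (the traces below are hypothetical, and FastText is just
one of several possible word-vector learners):

    from gensim.models import FastText

    # Each "sentence" is one observed trace of API calls; sub-word
    # information helps because related identifiers often share
    # character n-grams (e.g. fopen / fread / fwrite).
    traces = [
        ["fopen", "fread", "fclose"],
        ["fopen", "fwrite", "fclose"],
        ["socket", "connect", "send", "close"],
    ]

    model = FastText(sentences=traces, vector_size=32, window=3,
                     min_count=1, epochs=50)
    print(model.wv.most_similar("fopen", topn=3))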
Automatic Video Captioning using Deep Neural Network
Video understanding has become increasingly important as surveillance,
social, and informational videos weave themselves into our everyday lives.
Video captioning offers a simple way to summarize, index, and search the
data. Most video captioning models use a video encoder and captioning
decoder framework. Hierarchical encoders can abstractly capture clip-level
temporal features to represent a video, but the clips are at fixed time
steps. This thesis research introduces two models: a hierarchical model
with steered captioning, and a Multi-stream Hierarchical Boundary model.
The steered captioning model is the first to use visual attributes to guide
an attention model to appropriate locations in a video. The Multi-stream
Hierarchical Boundary model combines a fixed-hierarchy recurrent
architecture with a soft hierarchy layer, using intrinsic feature boundary
cuts within a video to define clips. This thesis also introduces a novel
parametric Gaussian attention, which removes the restriction of soft
attention techniques that require fixed-length video streams. By carefully
incorporating Gaussian attention in designated layers, the proposed models
demonstrate state-of-the-art video captioning results on recent datasets.
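
A minimal sketch of a parametric Gaussian attention of the kind described
(normalizing frame positions to [0, 1] and taking a predicted mean and
standard deviation as inputs are assumptions): attention weights follow a
Gaussian over normalized frame positions, so the same parameters apply to
videos of any length.

    import torch

    def gaussian_attention(frame_features, mean, std):
        """Attend over video frames with a Gaussian window.
        frame_features: (T, D); mean, std: predicted scalars in (0, 1)."""
        num_frames = frame_features.size(0)
        positions = torch.linspace(0.0, 1.0, num_frames)
        weights = torch.exp(-0.5 * ((positions - mean) / std) ** 2)
        weights = weights / weights.sum()
        return weights @ frame_features  # (D,) attended summary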
Learning to Reuse Translations: Guiding Neural Machine Translation with Examples
In this paper, we study the problem of enabling neural machine translation
(NMT) to reuse previous translations from similar examples in target
prediction. Distinguishing reusable translations from noisy segments and
learning to reuse them in NMT are non-trivial. To solve these challenges, we
propose an Example-Guided NMT (EGNMT) framework with two models: (1) a
noise-masked encoder model that masks out noisy words according to word
alignments and encodes the noise-masked sentences with an additional example
encoder and (2) an auxiliary decoder model that predicts reusable words via an
auxiliary decoder sharing parameters with the primary decoder. We define and
implement the two models with the state-of-the-art Transformer. Experiments
show that the noise-masked encoder model allows NMT to learn useful information
from examples with low fuzzy match scores (FMS) while the auxiliary decoder
model is good for high-FMS examples. More experiments on Chinese-English,
English-German and English-Spanish translation demonstrate that the combination
of the two EGNMT models can achieve improvements of up to +9 BLEU points over
the baseline system and +7 BLEU points over a two-encoder Transformer.
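
A hedged sketch of the noise-masking step (the mask token and the alignment
representation are assumptions): words in a retrieved example translation
that are not supported by word alignments with the source are replaced
before the example encoder sees them.

    def noise_mask(example_target, aligned_positions, mask_token="<mask>"):
        """Keep example words backed by source alignments; mask the rest.
        aligned_positions: set of target indices covered by alignments."""
        return [token if i in aligned_positions else mask_token
                for i, token in enumerate(example_target)]

    # e.g. noise_mask(["the", "cat", "sat"], {0, 2})
    #      -> ["the", "<mask>", "sat"]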