Effective Strategies in Zero-Shot Neural Machine Translation
In this paper, we propose two strategies that can be applied to a
multilingual neural machine translation system to better tackle
zero-shot scenarios in which no parallel corpus is available. The experiments
show that they are effective in terms of both performance and computing
resources, especially for multilingual translation of unbalanced data under
realistic zero-resource conditions, where they alleviate the language bias problem.
Comment: submitted to IWSLT17
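A minimal sketch of the mechanism such zero-shot multilingual systems build on may help: a single shared model is told the desired output language via a prepended tag, so directions never seen in training can still be requested. The `<2xx>` token convention and the helper below are illustrative assumptions, not the paper's two specific strategies.

```python
def tag_for_target(source_tokens, target_lang):
    """Prepend a target-language token so one shared multilingual model
    knows which language to translate into (illustrative convention)."""
    return [f"<2{target_lang}>"] + source_tokens

# Trained only on en<->de and en<->fr, the model can still be asked for the
# unseen de->fr direction at test time, i.e. zero-shot translation:
print(tag_for_target("Guten Morgen".split(), "fr"))
# ['<2fr>', 'Guten', 'Morgen']
```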
Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages
We present effective pre-training strategies for neural machine translation
(NMT) using parallel corpora involving a pivot language, i.e., source-pivot and
pivot-target, leading to a significant improvement in source-target
translation. We propose three methods to increase the relation among source,
pivot, and target languages in the pre-training: 1) step-wise training of a
single model for different language pairs, 2) additional adapter component to
smoothly connect pre-trained encoder and decoder, and 3) cross-lingual encoder
training via autoencoding of the pivot language. Our methods greatly outperform
multilingual models by up to +2.6% BLEU on the WMT 2019 French-German and
German-Czech tasks. We show that our improvements also hold in
zero-shot/zero-resource scenarios.
Comment: EMNLP 2019 camera-ready
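For orientation, here is a schematic of the plain pivot-based transfer recipe the abstract refines: pre-train source-pivot and pivot-target models, then assemble the source-target system from the source-side encoder and the target-side decoder before fine-tuning on the small direct data. The toy `Seq2Seq` below is an assumption standing in for a real Transformer NMT model.

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder standing in for a full Transformer NMT model."""
    def __init__(self, dim=64, encoder=None, decoder=None):
        super().__init__()
        self.encoder = encoder or nn.GRU(dim, dim, batch_first=True)
        self.decoder = decoder or nn.GRU(dim, dim, batch_first=True)

# 1) pre-train source->pivot and pivot->target models (training loops omitted)
m_src_pvt, m_pvt_tgt = Seq2Seq(), Seq2Seq()

# 2) assemble source->target from the pre-trained parts, then fine-tune it on
#    the small direct parallel data; the paper's step-wise training, adapters,
#    and pivot autoencoding all tighten this encoder-decoder connection
m_src_tgt = Seq2Seq(encoder=m_src_pvt.encoder, decoder=m_pvt_tgt.decoder)
```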
A Brief Survey of Multilingual Neural Machine Translation
We present a survey on multilingual neural machine translation (MNMT), which
has gained a lot of traction in recent years. MNMT has been useful in
improving translation quality as a result of knowledge transfer. MNMT is more
promising and interesting than its statistical machine translation counterpart
because end-to-end modeling and distributed representations open new avenues.
Many approaches have been proposed in order to exploit multilingual parallel
corpora for improving translation quality. However, the lack of a comprehensive
survey makes it difficult to determine which approaches are promising and hence
deserve further exploration. In this paper, we present an in-depth survey of
existing literature on MNMT. We categorize various approaches based on the
resource scenarios as well as underlying modeling principles. We hope this
paper will serve as a starting point for researchers and engineers interested
in MNMT.
Comment: We have substantially expanded this paper for a journal submission to Computing Surveys [arXiv:2001.01115]
Consistency by Agreement in Zero-shot Neural Machine Translation
Generalization and reliability of multilingual translation often highly
depend on the amount of available parallel data for each language pair of
interest. In this paper, we focus on zero-shot generalization, a challenging
setup that tests models on translation directions they have not been optimized
for at training time. To solve the problem, we (i) reformulate multilingual
translation as probabilistic inference, (ii) define the notion of zero-shot
consistency and show why standard training often results in models unsuitable
for zero-shot tasks, and (iii) introduce a consistent agreement-based training
method that encourages the model to produce equivalent translations of parallel
sentences in auxiliary languages. We test our multilingual NMT models on
multiple public zero-shot translation benchmarks (IWSLT17, UN corpus, Europarl)
and show that agreement-based learning often results in 2-3 BLEU zero-shot
improvement over strong baselines without any loss in performance on supervised
translation directions.
Comment: NAACL 2019 (14 pages, 5 figures)
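The agreement term lends itself to a compact sketch: decode both sides of a parallel pair into the same auxiliary language and penalize disagreement between the two predictive distributions. The symmetrized-KL form below is a simplified stand-in for the paper's objective, assuming for brevity that both decodings are scored over the same target positions.

```python
import torch.nn.functional as F

def agreement_loss(logits_a, logits_b):
    """Symmetrized KL between two predicted distributions over the same
    auxiliary-language tokens, one decoded from each side of a parallel
    pair (simplified stand-in for the paper's agreement term)."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p||q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q||p)
    return 0.5 * (kl_pq + kl_qp)
```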
Cross-Lingual Transfer Learning for Multilingual Task Oriented Dialog
One of the first steps in the utterance interpretation pipeline of many
task-oriented conversational AI systems is to identify user intents and the
corresponding slots. Since data collection for machine learning models for this
task is time-consuming, it is desirable to make use of existing data in a
high-resource language to train models in low-resource languages. However,
development of such models has largely been hindered by the lack of
multilingual training data. In this paper, we present a new data set of 57k
annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the
domains weather, alarm, and reminder. We use this data set to evaluate three
different cross-lingual transfer methods: (1) translating the training data,
(2) using cross-lingual pre-trained embeddings, and (3) a novel method of using
a multilingual machine translation encoder as contextual word representations.
We find that given several hundred training examples in the target
language, the latter two methods outperform translating the training data.
Further, in very low-resource settings, multilingual contextual word
representations give better results than using cross-lingual static embeddings.
We also compare the cross-lingual methods to using monolingual resources in the
form of contextual ELMo representations and find that given just small amounts
of target language data, this method outperforms all cross-lingual methods,
which highlights the need for more sophisticated cross-lingual methods.
Comment: 11 pages, to be presented at NAACL 2019
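Method (3), using a multilingual machine translation encoder as contextual word representations, can be sketched as a frozen encoder feeding a trainable tagging head; the names and the bare linear head below are simplifying assumptions rather than the paper's full model.

```python
import torch.nn as nn

class SlotTagger(nn.Module):
    """Frozen multilingual MT encoder as contextual word representations;
    only the slot-tagging head is trained. `mt_encoder` is a placeholder
    for any pretrained encoder mapping token ids to (batch, seq, hidden)."""
    def __init__(self, mt_encoder, hidden_dim, num_slots):
        super().__init__()
        self.encoder = mt_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # reuse the encoder, don't tune it
        self.head = nn.Linear(hidden_dim, num_slots)

    def forward(self, token_ids):
        reps = self.encoder(token_ids)       # contextual word vectors
        return self.head(reps)               # per-token slot logits
```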
The Natural Language Decathlon: Multitask Learning as Question Answering
Deep learning has improved performance on many natural language processing
(NLP) tasks individually. However, general NLP models cannot emerge within a
paradigm that focuses on the particularities of a single metric, dataset, and
task. We introduce the Natural Language Decathlon (decaNLP), a challenge that
spans ten tasks: question answering, machine translation, summarization,
natural language inference, sentiment analysis, semantic role labeling,
zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
commonsense pronoun resolution. We cast all tasks as question answering over a
context. Furthermore, we present a new Multitask Question Answering Network
(MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or
parameters in the multitask setting. MQAN shows improvements in transfer
learning for machine translation and named entity recognition, domain
adaptation for sentiment analysis and natural language inference, and zero-shot
capabilities for text classification. We demonstrate that the MQAN's
multi-pointer-generator decoder is key to this success and performance further
improves with an anti-curriculum training strategy. Though designed for
decaNLP, MQAN also achieves state-of-the-art results on the WikiSQL semantic
parsing task in the single-task setting. We also release code for procuring and
processing data, training and evaluating models, and reproducing all
experiments for decaNLP.
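The single-format framing is easiest to see in data form. The invented micro-sample below shows how three of the ten tasks collapse into (question, context, answer) triples; the exact question phrasings used by decaNLP may differ.

```python
# Every decaNLP task is reduced to question answering over a context.
examples = [
    {"question": "What is the translation from English to German?",
     "context":  "Good morning.",
     "answer":   "Guten Morgen."},
    {"question": "What is the summary?",
     "context":  "The committee met for three hours and agreed on a budget.",
     "answer":   "The committee agreed on a budget."},
    {"question": "Is this review positive or negative?",
     "context":  "A delightful, funny film.",
     "answer":   "positive"},
]
```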
Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 2019
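A hedged sketch of the core idea: pass shared label embeddings through residual blocks with dropout, and score hidden states against the transformed label representations instead of a plain softmax weight matrix. Depth, dropout rate, and block structure here are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualOutputLayer(nn.Module):
    """Deep residual output mapping over label embeddings (illustrative)."""
    def __init__(self, label_emb, depth=2, p=0.5):
        super().__init__()
        dim = label_emb.size(1)
        self.E = nn.Parameter(label_emb)           # (num_labels, dim)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(p))
            for _ in range(depth))

    def forward(self, hidden):                     # hidden: (batch, dim)
        e = self.E
        for block in self.blocks:
            e = e + block(e)                       # residual connection
        return hidden @ e.t()                      # (batch, num_labels)

# e.g. layer = ResidualOutputLayer(torch.randn(10000, 512))
#      logits = layer(hidden_states)   # hidden_states: (batch, 512)
```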
Towards Neural Machine Translation with Partially Aligned Corpora
While neural machine translation (NMT) has become the new paradigm, the
parameter optimization requires large-scale parallel data which is scarce in
many domains and language pairs. In this paper, we address a new translation
scenario in which only monolingual corpora and phrase pairs exist. We
propose a new method towards translation with partially aligned sentence pairs
which are derived from the phrase pairs and monolingual corpora. To make full
use of the partially aligned corpora, we adapt the conventional NMT training
method in two aspects. On the one hand, different generation strategies are
designed for aligned and unaligned target words. On the other hand, a different
objective function is designed to model the partially aligned parts. The
experiments demonstrate that our method achieves relatively good results in
such a translation scenario, and that even tiny bitexts can boost translation
quality considerably.
Comment: 10 pages, 4 figures, accepted as a long paper by IJCNLP-2017
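The adapted objective can be pictured as a position-weighted cross-entropy in which aligned target words carry full weight and unaligned ones a reduced weight. The weighting scheme and the `w_unaligned` value below are assumptions for illustration, not the paper's exact formulation.

```python
import torch.nn.functional as F

def partially_aligned_loss(logits, targets, aligned_mask, w_unaligned=0.3):
    """logits: (batch, seq, vocab); targets: (batch, seq) token ids;
    aligned_mask: (batch, seq) bool, True where the target word is covered
    by an aligned phrase. Illustrative weighting, not the paper's design."""
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    weights = aligned_mask.float() * (1.0 - w_unaligned) + w_unaligned
    return (ce * weights).mean()
```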
End-to-End Slot Alignment and Recognition for Cross-Lingual NLU
Natural language understanding (NLU) in the context of goal-oriented dialog
systems typically includes intent classification and slot labeling tasks.
Existing methods to expand an NLU system to new languages use machine
translation with slot label projection from source to the translated
utterances, and thus are sensitive to projection errors. In this work, we
propose a novel end-to-end model that learns to align and predict target slot
labels jointly for cross-lingual transfer. We introduce MultiATIS++, a new
multilingual NLU corpus that extends the Multilingual ATIS corpus to nine
languages across four language families, and evaluate our method using the
corpus. Results show that our method outperforms a simple label projection
method using fast-align on most languages, and achieves competitive performance
to the more complex, state-of-the-art projection method with only half of the
training time. We release our MultiATIS++ corpus to the community to continue
future research on cross-lingual NLU.
Comment: Accepted at EMNLP 2020
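One way to picture "align and predict jointly" is attention-based soft alignment: each token of the translated utterance attends over the source tokens, and slot labels are predicted from the attention-mixed representations, with no separate hard aligner such as fast-align in the loop. The module below is a hypothetical simplification, not the paper's architecture.

```python
import torch.nn as nn

class SoftAlignSlotTagger(nn.Module):
    """Soft attention from translated tokens to source tokens, followed by
    per-token slot classification (hypothetical simplification)."""
    def __init__(self, dim=128, num_slots=10):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, num_slots)

    def forward(self, tgt_reps, src_reps):
        mixed, _ = self.attn(tgt_reps, src_reps, src_reps)  # soft alignment
        return self.head(mixed)                             # slot logits
```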
Data Augmentation Generative Adversarial Networks
Effective training of neural networks requires much data. In the low-data
regime, parameters are underdetermined, and learnt networks generalise poorly.
Data Augmentation alleviates this by using existing data more effectively.
However, standard data augmentation produces only limited plausible alternative
data. Given there is potential to generate a much broader set of augmentations,
we design and train a generative model to do data augmentation. The model,
based on image-conditional Generative Adversarial Networks, takes data from a
source domain and learns to generalise any given data item into other
within-class data items. As this generative process does not depend on
the classes themselves, it can be applied to novel unseen classes of data. We
show that a Data Augmentation Generative Adversarial Network (DAGAN) augments
standard vanilla classifiers well. We also show a DAGAN can enhance few-shot
learning systems such as Matching Networks. We demonstrate these approaches on
Omniglot, on EMNIST (with the DAGAN learnt on Omniglot), and on VGG-Face data.
In our experiments we see an accuracy gain of over 13% in the low-data regime
on Omniglot (from 69% to 82%), with further gains on EMNIST (73.9% to 76%) and
VGG-Face (4.5% to 12%); in Matching Networks for Omniglot we observe an
increase of 0.5% (from 96.9% to 97.4%), and of 1.8% on EMNIST (from 59.5% to 61.3%).
Comment: 10 pages
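At augmentation time a trained DAGAN is simply a conditional sampler: feed a real item plus noise, get a same-class variant. A minimal sketch, assuming a trained image-conditional generator `G(x, z)` with `z_dim`-dimensional noise:

```python
import torch

def dagan_augment(generator, x, labels, k=4, z_dim=100):
    """Expand a labelled batch with k generated same-class variants per item.
    `generator` is assumed to be a trained image-conditional G(x, z)."""
    variants = [generator(x, torch.randn(x.size(0), z_dim)) for _ in range(k)]
    aug_x = torch.cat([x] + variants, dim=0)
    aug_y = labels.repeat(k + 1)       # generated items inherit their class
    return aug_x, aug_y
```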