Bi-Directional Neural Machine Translation with Synthetic Parallel Data
Despite impressive progress in high-resource settings, Neural Machine
Translation (NMT) still struggles in low-resource and out-of-domain scenarios,
often failing to match the quality of phrase-based translation. We propose a
novel technique that combines back-translation and multilingual NMT to improve
performance in these difficult cases. Our technique trains a single model for
both directions of a language pair, allowing us to back-translate source or
target monolingual data without requiring an auxiliary model. We then continue
training on the augmented parallel data, enabling a cycle of improvement for a
single model that can incorporate any source, target, or parallel data to
improve both translation directions. As a byproduct, these models can reduce
training and deployment costs significantly compared to uni-directional models.
Extensive experiments show that our technique outperforms standard
back-translation in low-resource scenarios, improves quality on cross-domain
tasks, and effectively reduces costs across the board.
Comment: Accepted at the 2nd Workshop on Neural Machine Translation and Generation (WNMT 2018).
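The training cycle the abstract describes can be summarised in a few lines. The sketch below is a hedged illustration, not the authors' code: the `model` object, its `train`/`translate` methods, the German-English pair, and the `<2xx>` target-language tags are all assumptions standing in for whatever the actual system uses.

```python
# Hedged sketch of bi-directional training with back-translation.
# The model object and its methods are hypothetical placeholders.

def tag(sentence, target_lang):
    # One model serves both directions; a target-language tag on the source
    # side selects which direction to translate.
    return f"<2{target_lang}> {sentence}"

def back_translation_cycle(model, parallel, mono_de, mono_en, rounds=2):
    """parallel: list of (de, en) sentence pairs; mono_*: monolingual text."""
    data = [(tag(de, "en"), en) for de, en in parallel] + \
           [(tag(en, "de"), de) for de, en in parallel]
    for _ in range(rounds):
        model.train(data)
        # The same model back-translates monolingual text in both directions,
        # so no auxiliary reverse model is needed.
        synthetic_de = [model.translate(tag(en, "de")) for en in mono_en]
        synthetic_en = [model.translate(tag(de, "en")) for de in mono_de]
        data += [(tag(de, "en"), en) for de, en in zip(synthetic_de, mono_en)]
        data += [(tag(en, "de"), de) for de, en in zip(mono_de, synthetic_en)]
    return model
```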
Achieving Human Parity on Automatic Chinese to English News Translation
Machine translation has made rapid advances in recent years. Millions of
people are using it today in online translation systems and mobile applications
in order to communicate across language barriers. The question naturally arises
whether such systems can approach or achieve parity with human translations. In
this paper, we first address the problem of how to define and accurately
measure human parity in translation. We then describe Microsoft's machine
translation system and measure the quality of its translations on the widely
used WMT 2017 news translation task from Chinese to English. We find that our
latest neural machine translation system has reached a new state-of-the-art,
and that the translation quality is at human parity when compared to
professional human translations. We also find that it significantly exceeds the
quality of crowd-sourced non-professional translations.
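One way to make the parity claim concrete, sketched below purely as an illustration (this is not necessarily Microsoft's exact evaluation protocol): collect per-segment human ratings for machine output and for professional translations of the same source, and check with a paired bootstrap whether the machine's mean rating is credibly lower.

```python
# Illustrative parity check over paired human ratings; an assumption, not the
# paper's exact methodology.
import random

def bootstrap_fraction_worse(system_scores, human_scores, n_resamples=10000, seed=0):
    """system_scores, human_scores: per-segment ratings for machine output and
    professional translation, aligned by segment. Returns the observed mean
    rating difference (system - human) and the fraction of bootstrap resamples
    in which the machine's mean rating falls below the human translators'."""
    rng = random.Random(seed)
    diffs = [s - h for s, h in zip(system_scores, human_scores)]
    observed = sum(diffs) / len(diffs)
    worse = sum(
        1 for _ in range(n_resamples)
        if sum(rng.choice(diffs) for _ in diffs) / len(diffs) < 0
    )
    return observed, worse / n_resamples
```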
Not All Neural Embeddings are Born Equal
Neural language models learn word representations that capture rich
linguistic and conceptual information. Here we investigate the embeddings
learned by neural machine translation models. We show that translation-based
embeddings outperform those learned by cutting-edge monolingual models at
single-language tasks requiring knowledge of conceptual similarity and/or
syntactic role. The findings suggest that, while monolingual models learn
information about how concepts are related, neural-translation models better
capture their true ontological status.
Comment: 4 pages plus 1 page of references.
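The comparison the abstract reports is typically run as an intrinsic similarity benchmark. A hedged sketch, assuming word vectors extracted from either kind of model and a SimLex-style list of human-scored word pairs (both are placeholders here):

```python
# Score word pairs by cosine similarity under a given embedding table and
# correlate with human similarity judgements; higher rank correlation means
# the embeddings better reflect conceptual similarity.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_correlation(embeddings, scored_pairs):
    """embeddings: dict word -> np.ndarray; scored_pairs: [(w1, w2, human_score)]."""
    model_scores, human_scores = [], []
    for w1, w2, gold in scored_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(gold)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```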
On Using Monolingual Corpora in Neural Machine Translation
Recent work on end-to-end neural network-based architectures for machine
translation has shown promising results for En-Fr and En-De translation.
Arguably, one of the major factors behind this success has been the
availability of high quality parallel corpora. In this work, we investigate how
to leverage abundant monolingual corpora for neural machine translation.
Compared to phrase-based and hierarchical baselines, we obtain BLEU improvements on the low-resource language pair Turkish-English and on the focused-domain task of Chinese-English chat messages. While our method was initially targeted toward such tasks with less parallel data, we show that it also extends to high-resource languages such as Cs-En and De-En, where we obtain improvements in BLEU over the neural machine translation baselines.
Comment: 9 pages, 2 figures.
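The abstract does not spell out how the monolingual corpora are used. One common way to exploit monolingual text in NMT decoding, shown below only as a generic, hedged sketch, is to interpolate the translation model's log-probabilities with those of a language model trained on the monolingual corpus (often called shallow fusion); the scoring inputs here are hypothetical.

```python
# Generic shallow-fusion style rescoring; an assumption, not necessarily the
# method this paper uses.
def fused_score(nmt_logprob, lm_logprob, beta=0.3):
    # Combined score for one candidate: NMT score plus a weighted monolingual
    # language-model score.
    return nmt_logprob + beta * lm_logprob

def pick_best(candidates, beta=0.3):
    """candidates: list of (token, nmt_logprob, lm_logprob) at one decode step."""
    return max(candidates, key=lambda c: fused_score(c[1], c[2], beta))
```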
Unsupervised Paraphrasing without Translation
Paraphrasing exemplifies the ability to abstract semantic content from
surface forms. Recent work on automatic paraphrasing is dominated by methods
leveraging Machine Translation (MT) as an intermediate step. This contrasts
with humans, who can paraphrase without being bilingual. This work proposes to
learn paraphrasing models from an unlabeled monolingual corpus only. To that
end, we propose a residual variant of the vector-quantized variational auto-encoder.
We compare with MT-based approaches on paraphrase identification, generation,
and training augmentation. Monolingual paraphrasing outperforms unsupervised
translation in all settings. Comparisons with supervised translation are more
mixed: monolingual paraphrasing is interesting for identification and
augmentation; supervised translation is superior for generation.Comment: ACL 201
Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies
Transfer learning and multilingual modeling are essential for low-resource neural machine translation (NMT), but their applicability has been limited to cognate languages that can share a vocabulary. This paper shows effective techniques
to transfer a pre-trained NMT model to a new, unrelated language without shared
vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embeddings, train a more language-agnostic encoder by injecting artificial noise, and generate synthetic data easily from the pre-training data without
back-translation. Our methods do not require restructuring the vocabulary or
retraining the model. We improve plain NMT transfer by up to +5.1% BLEU in five
low-resource translation tasks, outperforming multilingual joint training by a
large margin. We also provide extensive ablation studies on pre-trained
embedding, synthetic data, vocabulary size, and parameter freezing for a better
understanding of NMT transfer.
Comment: ACL 2019 camera-ready.
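The vocabulary-mismatch step can be pictured as mapping the new language's monolingual word vectors into the parent model's embedding space and using them to initialise the transferred source embeddings. The sketch below uses an orthogonal (Procrustes) mapping learned from a small seed dictionary; variable names and the mapping choice are illustrative assumptions, not the paper's exact procedure.

```python
# Map child-language embeddings into the parent embedding space.
import numpy as np

def procrustes_map(child_vecs, parent_vecs):
    """child_vecs, parent_vecs: (n_pairs, dim) embeddings of seed-dictionary
    word pairs, row-aligned. Returns an orthogonal W minimising
    ||child_vecs @ W - parent_vecs||_F."""
    u, _, vt = np.linalg.svd(child_vecs.T @ parent_vecs)
    return u @ vt

def init_source_embeddings(child_vocab, child_emb, W):
    """Build the transferred model's source-embedding matrix in parent space."""
    return np.stack([child_emb[w] @ W for w in child_vocab])
```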
Learning to Represent Bilingual Dictionaries
Bilingual word embeddings have been widely used to capture the similarity of
lexical semantics in different human languages. However, many applications,
such as cross-lingual semantic search and question answering, can benefit greatly from the cross-lingual correspondence between sentences and lexicons.
To bridge this gap, we propose a neural embedding model that leverages
bilingual dictionaries. The proposed model is trained to map literal word definitions to the cross-lingual target words, for which we explore different sentence encoding techniques. To enhance the learning process on
limited resources, our model adopts several critical learning strategies,
including multi-task learning on different bridges of languages, and joint
learning of the dictionary model with a bilingual word embedding model.
Experimental evaluation focuses on two applications. The results of the cross-lingual reverse dictionary retrieval task show our model's promising ability to comprehend bilingual concepts from descriptions, and highlight the effectiveness of the proposed learning strategies in improving performance. Meanwhile, our model effectively addresses the bilingual paraphrase identification problem and significantly outperforms previous approaches.
Comment: CoNLL 2019.
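The core mapping can be sketched as a definition encoder trained so that the encoding of a word's literal definition in one language lands near the target word's embedding in the other language; reverse-dictionary retrieval is then a nearest-neighbour search. Encoder choice, dimensions, and loss below are illustrative assumptions.

```python
# Definition-to-word mapping sketch in PyTorch.
import torch
import torch.nn as nn

class DefinitionEncoder(nn.Module):
    def __init__(self, vocab_size, dim=300):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def forward(self, definition_ids):
        # definition_ids: (batch, seq_len) token ids of the literal definition.
        _, h = self.encoder(self.tok_emb(definition_ids))
        return h.squeeze(0)                      # (batch, dim) definition encoding

def retrieval_loss(definition_encoding, target_word_embedding):
    # Pull the definition encoding toward the cross-lingual target word vector.
    return 1 - nn.functional.cosine_similarity(
        definition_encoding, target_word_embedding).mean()
```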
A Call for More Rigor in Unsupervised Cross-lingual Learning
We review motivations, definition, approaches, and methodology for
unsupervised cross-lingual learning and call for a more rigorous position in
each of them. An existing rationale for such research is based on the lack of
parallel data for many of the world's languages. However, we argue that a scenario with no parallel data whatsoever yet abundant monolingual data is unrealistic in practice. We also discuss different training signals that have been used in
previous work, which depart from the pure unsupervised setting. We then
describe common methodological issues in tuning and evaluation of unsupervised
cross-lingual models and present best practices. Finally, we provide a unified
outlook for different types of research in this area (i.e., cross-lingual word
embeddings, deep multilingual pretraining, and unsupervised machine
translation) and argue for comparable evaluation of these models.
Comment: ACL 2020.
Sources of Transfer in Multilingual Named Entity Recognition
Named entities are inherently multilingual, and annotations in any given
language may be limited. This motivates us to consider polyglot named-entity
recognition (NER), where one model is trained using annotated data drawn from
more than one language. However, a straightforward implementation of this
simple idea does not always work in practice: naive training of NER models
using annotated data drawn from multiple languages consistently underperforms
models trained on monolingual data alone, despite having access to more
training data. The starting point of this paper is a simple solution to this
problem, in which polyglot models are fine-tuned on monolingual data to
consistently and significantly outperform their monolingual counterparts. To
explain this phenomenon, we explore the sources of multilingual transfer in
polyglot NER models and examine the weight structure of polyglot models
compared to their monolingual counterparts. We find that polyglot models
efficiently share many parameters across languages and that fine-tuning may
utilize a large number of those parameters.
Comment: ACL 2020.
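The recipe is straightforward to state: train one model on the concatenation of all languages' annotated data, then fine-tune a copy of it on each language's data alone. The sketch below is a hedged outline; the model object and its `fit` method are hypothetical placeholders, not the authors' code.

```python
import copy

def train_polyglot_then_finetune(model, data_by_lang, polyglot_epochs=5, ft_epochs=2):
    """data_by_lang: dict lang -> list of (tokens, tags) annotated sentences."""
    polyglot_data = [ex for examples in data_by_lang.values() for ex in examples]
    model.fit(polyglot_data, epochs=polyglot_epochs)       # shared multilingual stage
    finetuned = {}
    for lang, examples in data_by_lang.items():
        lang_model = copy.deepcopy(model)                   # start from polyglot weights
        lang_model.fit(examples, epochs=ft_epochs)          # monolingual fine-tuning
        finetuned[lang] = lang_model
    return finetuned
```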
Multilingual Image Description with Neural Sequence Models
In this paper we present an approach to multi-language image description
bringing together insights from neural machine translation and neural image
description. To create a description of an image for a given target language,
our sequence generation models condition on feature vectors from the image, the
description from the source language, and/or a multimodal vector computed over
the image and a description in the source language. In image description
experiments on the IAPR-TC12 dataset of images aligned with English and German
sentences, we find significant and substantial improvements in BLEU4 and Meteor
scores for models trained over multiple languages, compared to a monolingual
baseline.
Comment: Under review as a conference paper at ICLR 2016.
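The conditioning scheme can be sketched as initialising the target-language decoder from the image feature vector, the encoded source-language description, or their concatenation. Dimensions and module choices below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultimodalInit(nn.Module):
    def __init__(self, img_dim=4096, src_dim=256, hid_dim=256):
        super().__init__()
        self.proj = nn.Linear(img_dim + src_dim, hid_dim)

    def forward(self, img_feats, src_encoding):
        # img_feats: (batch, img_dim) CNN features of the image.
        # src_encoding: (batch, src_dim) encoding of the source-language caption.
        multimodal = torch.cat([img_feats, src_encoding], dim=-1)
        return torch.tanh(self.proj(multimodal))   # initial decoder hidden state
```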