Learning Unsupervised Word Mapping by Maximizing Mean Discrepancy
Cross-lingual word embeddings aim to capture common linguistic regularities
of different languages, which benefit various downstream tasks ranging from
machine translation to transfer learning. Recently, it has been shown that
these embeddings can be effectively learned by aligning two disjoint
monolingual vector spaces through a linear transformation (word mapping). In
this work, we focus on learning such a word mapping without any supervision
signal. Most previous work on this task adopts parametric metrics to measure
distribution differences, which typically requires a sophisticated alternating
optimization process, either in the form of a minmax game or of intermediate
density estimation. This alternating optimization is relatively difficult and
unstable. In order to avoid such sophisticated alternating optimization, we
propose to learn the unsupervised word mapping by directly maximizing the mean
discrepancy between the distributions of the transferred embeddings and the
target embeddings. Extensive experimental results show that our proposed model
outperforms competitive baselines by a large margin.
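As a rough illustration of the idea, here is a minimal sketch (not the authors' code) of aligning two embedding spaces with a single MMD-style objective. The Gaussian kernel, bandwidth, and toy data are assumptions, and the sketch trains the map by reducing the estimated discrepancy so the transferred and target distributions match; that direction of optimization is a reading of the alignment goal, not the paper's exact formulation.

```python
# A toy alignment run: learn a linear map W so that the transferred
# source embeddings X @ W match the target distribution Y under MMD.
# Kernel choice, bandwidth, and data are illustrative assumptions.
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2))
    sq_dists = torch.cdist(a, b).pow(2)
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimator of the squared mean discrepancy between samples.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

d = 300
X = torch.randn(512, d)               # source-language embeddings (toy)
Y = torch.randn(512, d)               # target-language embeddings (toy)
W = torch.eye(d, requires_grad=True)  # the word mapping to be learned
opt = torch.optim.Adam([W], lr=1e-3)

for _ in range(100):
    opt.zero_grad()
    loss = mmd2(X @ W, Y)  # one differentiable objective: no minmax
    loss.backward()        # game and no density estimation needed
    opt.step()
```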
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space
Most of the successful and predominant methods for bilingual lexicon
induction (BLI) are mapping-based, where a linear mapping function is learned
with the assumption that the word embedding spaces of different languages
exhibit similar geometric structures (i.e., approximately isomorphic). However,
several recent studies have criticized this simplified assumption showing that
it does not hold in general even for closely related languages. In this work,
we propose a novel semi-supervised method to learn cross-lingual word
embeddings for BLI. Our model is independent of the isomorphic assumption and
uses nonlinear mapping in the latent space of two independently trained
auto-encoders. Through extensive experiments on fifteen (15) different language
pairs (in both directions) comprising resource-rich and low-resource languages
from two different datasets, we demonstrate that our method outperforms
existing models by a good margin. Ablation studies show the importance of
different model components and the necessity of non-linear mapping.
Comment: 10 pages, 1 figure
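A minimal sketch of the LNMap idea under stated assumptions: two independently trained autoencoders with a small non-linear network mapping between their latent spaces, supervised on a seed dictionary. The architecture sizes and losses below are illustrative, not the paper's.

```python
# Two autoencoders trained per language, with a non-linear mapper
# between their latent codes; dimensions and losses are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(nn.Module):
    def __init__(self, dim=300, latent=200):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, latent), nn.Tanh())
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

src_ae, tgt_ae = AutoEncoder(), AutoEncoder()
# The key departure from a single linear map: a non-linear
# latent-to-latent mapping, so no isomorphism is assumed.
mapper = nn.Sequential(nn.Linear(200, 200), nn.ReLU(), nn.Linear(200, 200))

x = torch.randn(64, 300)  # source embeddings of seed dictionary pairs (toy)
y = torch.randn(64, 300)  # their target-side counterparts (toy)

z_x, x_rec = src_ae(x)
z_y, y_rec = tgt_ae(y)
loss = (F.mse_loss(x_rec, x)             # reconstruction, source side
        + F.mse_loss(y_rec, y)           # reconstruction, target side
        + F.mse_loss(mapper(z_x), z_y))  # semi-supervised latent mapping
loss.backward()
```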
Unsupervised Cross-lingual Transfer of Word Embedding Spaces
Cross-lingual transfer of word embeddings aims to establish the semantic
mappings among words in different languages by learning the transformation
functions over the corresponding word embedding spaces. Successfully solving
this problem would benefit many downstream tasks, such as translating text
classification models from resource-rich languages (e.g., English) to
low-resource languages. Supervised methods for this problem rely on the
availability of cross-lingual supervision, either using parallel corpora or
bilingual lexicons as the labeled data for training, which may not be available
for many low-resource languages. This paper proposes an unsupervised learning
approach that does not require any cross-lingual labeled data. Given two
monolingual word embedding spaces for any language pair, our algorithm
optimizes the transformation functions in both directions simultaneously based
on distributional matching as well as minimizing the back-translation losses.
We use a neural network implementation to calculate the Sinkhorn distance, a
well-defined distributional similarity measure, and optimize our objective
through back-propagation. Our evaluation on benchmark datasets for bilingual
lexicon induction and cross-lingual word similarity prediction shows stronger
or competitive performance of the proposed method compared to other
state-of-the-art supervised and unsupervised baseline methods over many
language pairs.
Comment: EMNLP 2018
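The distributional-matching part can be sketched as follows, assuming uniform marginals, a squared Euclidean cost, and an illustrative regularization strength; the back-translation losses are omitted.

```python
# Differentiable Sinkhorn distance between transferred and target
# embeddings; uniform marginals and the cost normalization are
# illustrative choices.
import torch

def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    C = torch.cdist(x, y).pow(2)  # squared Euclidean cost matrix
    C = C / C.max()               # normalize costs for numerical stability
    K = torch.exp(-C / eps)       # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    u = torch.ones_like(a)
    for _ in range(n_iters):      # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = torch.diag(u) @ K @ torch.diag(v)  # transport plan
    return (P * C).sum()

X = torch.randn(128, 300)               # source embeddings (toy)
Y = torch.randn(128, 300)               # target embeddings (toy)
W = torch.eye(300, requires_grad=True)  # forward transformation
loss = sinkhorn_distance(X @ W, Y)      # gradients flow through the
loss.backward()                         # Sinkhorn iterations
```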
Unsupervised Neural Machine Translation
In spite of the recent success of neural machine translation (NMT) in
standard benchmarks, the lack of large parallel corpora poses a major practical
problem for many language pairs. There have been several proposals to alleviate
this issue with, for instance, triangulation and semi-supervised learning
techniques, but they still require a strong cross-lingual signal. In this work,
we completely remove the need for parallel data and propose a novel method to
train an NMT system in a fully unsupervised manner, relying on nothing but
monolingual corpora. Our model builds upon the recent work on unsupervised
embedding mappings, and consists of a slightly modified attentional
encoder-decoder model that can be trained on monolingual corpora alone using a
combination of denoising and backtranslation. Despite the simplicity of the
approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014
French-to-English and German-to-English translation. The model can also profit
from small parallel corpora, and attains 21.81 and 15.24 points when combined
with 100,000 parallel sentences, respectively. Our implementation is released
as an open-source project.
Comment: Published as a conference paper at ICLR 2018
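A runnable sketch of the kind of noise model commonly used for the denoising objective in this line of work (word drops plus local shuffling); the parameters are assumed values, and the trailing comments outline how denoising interleaves with back-translation.

```python
# Word drops plus local shuffling, a common noise model for the
# denoising objective; p_drop and k_shuffle are assumed values.
import random

def add_noise(tokens, p_drop=0.1, k_shuffle=3):
    # Drop each token with probability p_drop (keep at least one),
    # then permute so no token moves more than ~k_shuffle positions.
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]
    keys = [i + random.uniform(0, k_shuffle) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

print(add_noise("the cat sat on the mat".split()))

# Each training step then interleaves two signals per language:
#   1) denoising: reconstruct a sentence from add_noise(sentence);
#   2) back-translation: translate a monolingual sentence with the
#      current model and learn to map the result back to the original.
```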
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach
We propose a novel geometric approach for learning bilingual mappings given
monolingual embeddings and a bilingual dictionary. Our approach decouples
learning the transformation from the source language to the target language
into (a) learning rotations for language-specific embeddings to align them to a
common space, and (b) learning a similarity metric in the common space to model
similarities between the embeddings. We model the bilingual mapping problem as
an optimization problem on smooth Riemannian manifolds. We show that our
approach outperforms previous approaches on the bilingual lexicon induction and
cross-lingual word similarity tasks. We also generalize our framework to
represent multiple languages in a common latent space. In particular, the
latent space representations for several languages are learned jointly, given
bilingual dictionaries for multiple language pairs. We illustrate the
effectiveness of joint learning for multiple languages in zero-shot word
translation setting. Our implementation is available at
https://github.com/anoopkunchukuttan/geomm .
Comment: Accepted in Transactions of the Association for Computational Linguistics
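The decoupling of (a) and (b) can be sketched as below, with the rotation obtained in closed form via Procrustes rather than the paper's joint Riemannian optimization; the shapes, toy dictionary, and identity initialization of the metric are assumptions.

```python
# (a) a closed-form rotation onto the target and (b) a similarity
# metric B in the common space; toy dictionary, identity-initialized B.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 300))  # source embeddings of dictionary pairs
Z = rng.standard_normal((1000, 300))  # aligned target embeddings

# (a) Orthogonal rotation minimizing ||X R - Z||_F (Procrustes).
U, _, Vt = np.linalg.svd(X.T @ Z)
R = U @ Vt  # a point on the manifold of rotations

# (b) Similarity under a learned metric: sim(x, z) = (x R) B z^T, with
# B symmetric positive semi-definite (jointly optimized in the paper;
# initialized to the identity here).
B = np.eye(300)
sim = (X @ R) @ B @ Z.T  # pairwise similarity scores
print(sim.shape)
```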
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation
The overreliance on large parallel corpora significantly limits the
applicability of machine translation systems to the majority of language pairs.
Back-translation has been the dominant technique in previous approaches to
unsupervised neural machine translation, where pseudo sentence pairs are
generated to train the models with a reconstruction loss. However, the pseudo
sentences are usually of low quality as translation errors accumulate during
training. To avoid this fundamental issue, we propose an alternative but more
effective approach, extract-edit, to extract and then edit real sentences from
the target monolingual corpora. Furthermore, we introduce a comparative
translation loss to evaluate the translated target sentences and thus train the
unsupervised translation systems. Experiments show that the proposed approach
consistently outperforms the previous state-of-the-art unsupervised machine
translation systems across two benchmarks (English-French and English-German)
and two low-resource language pairs (English-Romanian and English-Russian) by
more than 2 (up to 3.63) BLEU points.
Comment: 11 pages, 3 figures. Accepted to NAACL 2019
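A minimal sketch of the extract step and a margin-style comparative loss, under assumptions: sentences are represented as fixed vectors in a shared space, and the edit step is omitted for brevity.

```python
# Retrieve real target sentences for a source sentence, then score the
# system translation against them with a margin; all vectors are toy.
import torch
import torch.nn.functional as F

def extract(src_vec, tgt_corpus_vecs, k=5):
    # Nearest real target sentences by cosine similarity.
    sims = F.cosine_similarity(src_vec.unsqueeze(0), tgt_corpus_vecs)
    return sims.topk(k).indices

def comparative_loss(src_vec, trans_vec, candidate_vecs, margin=0.2):
    # Push the translation closer to the source than any extracted
    # real candidate, by at least `margin`.
    pos = F.cosine_similarity(src_vec, trans_vec, dim=0)
    neg = F.cosine_similarity(src_vec.unsqueeze(0), candidate_vecs).max()
    return F.relu(margin + neg - pos)

src = torch.randn(512)                        # source sentence embedding
trans = torch.randn(512, requires_grad=True)  # system translation embedding
corpus = torch.randn(10000, 512)              # real target monolingual corpus
loss = comparative_loss(src, trans, corpus[extract(src, corpus)])
loss.backward()
```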
Unsupervised Word Translation Pairing using Refinement based Point Set Registration
Cross-lingual alignment of word embeddings plays an important role in
knowledge transfer across languages, improving machine translation and
other multi-lingual applications. Current unsupervised approaches rely on
similarities in the geometric structure of word embedding spaces across languages
to learn structure-preserving linear transformations using adversarial networks
and refinement strategies. However, such techniques, in practice, tend to
suffer from instability and convergence issues, requiring tedious fine-tuning
for precise parameter setting. This paper proposes BioSpere, a novel framework
for unsupervised mapping of bilingual word embeddings onto a shared vector
space, combining adversarial initialization and a refinement procedure with a
point set registration algorithm used in image processing. We show that our
framework alleviates the shortcomings of existing methodologies, and is
relatively invariant to variable adversarial learning performance, demonstrating
robustness in terms of parameter choices and training losses. Experimental
evaluation on parallel dictionary induction task demonstrates state-of-the-art
results for our framework on diverse language pairs.
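The refinement stage can be sketched as an ICP-style loop that alternates nearest-neighbour matching with a closed-form Procrustes update; this is a stand-in under assumptions, not BioSpere's actual registration algorithm or initialization.

```python
# ICP-style refinement: alternate nearest-neighbour matching with a
# closed-form orthogonal fit; a stand-in for the registration step.
import numpy as np

def refine(X, Y, n_iters=10):
    W = np.eye(X.shape[1])
    for _ in range(n_iters):
        XW = X @ W
        # Match: nearest target point for each mapped source point.
        d = ((XW[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        nn = d.argmin(axis=1)
        # Fit: orthogonal map onto the matched points (Procrustes).
        U, _, Vt = np.linalg.svd(X.T @ Y[nn])
        W = U @ Vt
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))                      # toy source points
Y = X @ np.linalg.qr(rng.standard_normal((50, 50)))[0]  # rotated copy
W = refine(X, Y)
print(np.abs(X @ W - Y).max())  # small if the registration succeeded
```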
Robust Cross-lingual Embeddings from Parallel Sentences
Recent advances in cross-lingual word embeddings have primarily relied on
mapping-based methods, which project pretrained word embeddings from different
languages into a shared space through a linear transformation. However, these
approaches assume word embedding spaces are isomorphic between different
languages, which has been shown not to hold in practice (Søgaard et al.,
2018), and fundamentally limits their performance. This motivates investigating
joint learning methods which can overcome this impediment, by simultaneously
learning embeddings across languages via a cross-lingual term in the training
objective. We propose a bilingual extension of the CBOW method which leverages
sentence-aligned corpora to obtain robust cross-lingual word and sentence
representations. Our approach significantly improves cross-lingual sentence
retrieval performance over all other approaches while maintaining parity with
the current state-of-the-art methods on word-translation. It also achieves
parity with a deep RNN method on a zero-shot cross-lingual document
classification task, requiring far fewer computational resources for training
and inference. As an additional advantage, our bilingual method leads to a much
more pronounced improvement in the quality of monolingual word vectors
compared to other competing methods.
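A minimal sketch of adding a cross-lingual term to CBOW-style training, under assumptions: the monolingual CBOW losses are omitted, and the alignment term below simply pulls together the mean word vectors of an aligned sentence pair.

```python
# Alongside each language's usual CBOW loss (omitted here), an aligned
# sentence pair contributes a term pulling its two mean word vectors
# together; sizes and data are toy assumptions.
import torch

d, vocab = 100, 5000
E_src = torch.randn(vocab, d, requires_grad=True)  # source embedding table
E_tgt = torch.randn(vocab, d, requires_grad=True)  # target embedding table

def sentence_vec(E, token_ids):
    return E[token_ids].mean(dim=0)  # mean of the sentence's word vectors

src_ids = torch.randint(vocab, (12,))  # token ids of an aligned pair (toy)
tgt_ids = torch.randint(vocab, (9,))

xl_loss = (sentence_vec(E_src, src_ids)
           - sentence_vec(E_tgt, tgt_ids)).pow(2).sum()
xl_loss.backward()  # added to the two monolingual CBOW objectives
```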
Scalable Cross-Lingual Transfer of Neural Sentence Embeddings
We develop and investigate several cross-lingual alignment approaches for
neural sentence embedding models, such as the supervised inference classifier,
InferSent, and sequential encoder-decoder models. We evaluate three alignment
frameworks applied to these models: joint modeling, representation transfer
learning, and sentence mapping, using parallel text to guide the alignment. Our
results support representation transfer as a scalable approach for modular
cross-lingual alignment of neural sentence embeddings, where we observe better
performance compared to joint models in intrinsic and extrinsic evaluations,
particularly with smaller sets of parallel data.
Comment: accepted in *SEM 2019
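Of the three frameworks, the sentence-mapping baseline admits a particularly compact sketch, under assumptions: sentence embeddings for both sides of a parallel corpus are given, and a least-squares linear map aligns the new language's space to the pivot's.

```python
# Fit a linear map from the new language's sentence-embedding space to
# the pivot's, using embeddings of a parallel corpus; all data is toy.
import numpy as np

rng = np.random.default_rng(0)
S_pivot = rng.standard_normal((5000, 512))  # pivot-side sentence embeddings
S_new = rng.standard_normal((5000, 512))    # new-language side, same sentences

# Solve min_W ||S_new @ W - S_pivot||_F^2 in closed form.
W, *_ = np.linalg.lstsq(S_new, S_pivot, rcond=None)

test = rng.standard_normal((1, 512))  # a new-language sentence embedding
aligned = test @ W                    # reusable with pivot-space classifiers
```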