514 research outputs found
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
Recent work has managed to learn cross-lingual word embeddings without
parallel data by mapping monolingual embeddings to a shared space through
adversarial training. However, their evaluation has focused on favorable
conditions, using comparable corpora or closely-related languages, and we show
that they often fail in more realistic scenarios. This work proposes an
alternative approach based on a fully unsupervised initialization that
explicitly exploits the structural similarity of the embeddings, and a robust
self-learning algorithm that iteratively improves this solution. Our method
succeeds in all tested scenarios and obtains the best published results in
standard datasets, even surpassing previous supervised systems. Our
implementation is released as an open source project at
https://github.com/artetxem/vecmapComment: ACL 201
Density Matching for Bilingual Word Embedding
Recent approaches to cross-lingual word embedding have generally been based
on linear transformations between the sets of embedding vectors in the two
languages. In this paper, we propose an approach that instead expresses the two
monolingual embedding spaces as probability densities defined by a Gaussian
mixture model, and matches the two densities using a method called normalizing
flow. The method requires no explicit supervision, and can be learned with only
a seed dictionary of words that have identical strings. We argue that this
formulation has several intuitively attractive properties, particularly with
the respect to improving robustness and generalization to mappings between
difficult language pairs or word pairs. On a benchmark data set of bilingual
lexicon induction and cross-lingual word similarity, our approach can achieve
competitive or superior performance compared to state-of-the-art published
results, with particularly strong results being found on etymologically distant
and/or morphologically rich languages.Comment: Accepted by NAACL-HLT 201
- …