50 research outputs found
Density Matching for Bilingual Word Embedding
Recent approaches to cross-lingual word embedding have generally been based
on linear transformations between the sets of embedding vectors in the two
languages. In this paper, we propose an approach that instead expresses the two
monolingual embedding spaces as probability densities defined by a Gaussian
mixture model, and matches the two densities using a method called normalizing
flow. The method requires no explicit supervision, and can be learned with only
a seed dictionary of words that have identical strings. We argue that this
formulation has several intuitively attractive properties, particularly with
the respect to improving robustness and generalization to mappings between
difficult language pairs or word pairs. On a benchmark data set of bilingual
lexicon induction and cross-lingual word similarity, our approach can achieve
competitive or superior performance compared to state-of-the-art published
results, with particularly strong results being found on etymologically distant
and/or morphologically rich languages.Comment: Accepted by NAACL-HLT 201
Why is unsupervised alignment of English embeddings from different algorithms so hard?
This paper presents a challenge to the community: Generative adversarial
networks (GANs) can perfectly align independent English word embeddings induced
using the same algorithm, based on distributional information alone; but fails
to do so, for two different embeddings algorithms. Why is that? We believe
understanding why, is key to understand both modern word embedding algorithms
and the limitations and instability dynamics of GANs. This paper shows that (a)
in all these cases, where alignment fails, there exists a linear transform
between the two embeddings (so algorithm biases do not lead to non-linear
differences), and (b) similar effects can not easily be obtained by varying
hyper-parameters. One plausible suggestion based on our initial experiments is
that the differences in the inductive biases of the embedding algorithms lead
to an optimization landscape that is riddled with local optima, leading to a
very small basin of convergence, but we present this more as a challenge paper
than a technical contribution.Comment: Accepted at EMNLP 201