Analyzing the Limitations of Cross-lingual Word Embedding Mappings
Recent research in cross-lingual word embeddings has almost exclusively
focused on offline methods, which independently train word embeddings in
different languages and map them to a shared space through linear
transformations. While several authors have questioned the underlying
isomorphism assumption, which states that word embeddings in different
languages have approximately the same structure, it is not clear whether this
is an inherent limitation of mapping approaches or a more general issue when
learning cross-lingual embeddings. To answer this question, we experiment
with parallel corpora, which allows us to compare offline mapping to an
extension of skip-gram that jointly learns both embedding spaces. We observe
that, under these ideal conditions, joint learning yields more isomorphic
embeddings, is less sensitive to hubness, and obtains stronger results in
bilingual lexicon induction. We thus conclude that current mapping methods do
have strong limitations, calling for further research to jointly learn
cross-lingual embeddings with a weaker cross-lingual signal.
Comment: ACL 201
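For readers unfamiliar with the offline setup this abstract contrasts against, the following is a minimal sketch of how such a linear mapping is commonly learned, using the orthogonal Procrustes solution over a seed dictionary, followed by nearest-neighbour lexicon induction. The embedding matrices here are random stand-ins, and the paper's exact mapping and retrieval choices may differ.

```python
# Minimal sketch of an "offline" mapping method: embeddings trained
# independently per language are aligned with a linear transformation.
# X and Y are random stand-ins for the embeddings of seed-dictionary
# translation pairs; real methods use trained vectors, not noise.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 300                       # dictionary size, embedding dim
X = rng.standard_normal((n, d))        # source-language embeddings
Y = rng.standard_normal((n, d))        # target-language embeddings

# Orthogonal Procrustes: the orthogonal W minimizing ||XW - Y||_F,
# obtained in closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Bilingual lexicon induction: nearest neighbour by cosine similarity
# in the shared space. Hubness appears when a few target vectors are
# the nearest neighbour of disproportionately many source vectors.
Xm = X @ W
Xm /= np.linalg.norm(Xm, axis=1, keepdims=True)
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
translations = (Xm @ Yn.T).argmax(axis=1)
```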
Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary
Cross-lingual model transfer is a compelling and popular method for
predicting annotations in a low-resource language, whereby parallel corpora
provide a bridge to a high-resource language and its associated annotated
corpora. However, parallel data is not readily available for many languages,
limiting the applicability of these approaches. We address this limitation with
a framework that takes advantage of cross-lingual word embeddings trained
solely on a high-coverage bilingual dictionary. We propose a novel neural
network model for joint training from both sources of data based on
cross-lingual word embeddings, and show substantial empirical improvements over
baseline techniques. We also propose several active learning heuristics, which
result in improvements over competitive benchmark methods.
Comment: 5 pages plus 2 pages of references. Accepted to appear in ACL 201
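As a rough illustration of the transfer idea (not the paper's joint neural model), the sketch below fits a least-squares map from a hypothetical low-resource embedding space into a high-resource one using dictionary pairs, then applies a tagger trained on high-resource annotations to mapped low-resource words. All data and names here are placeholders.

```python
# Minimal sketch of dictionary-based model transfer for tagging, on toy
# random data; the paper's actual model is a jointly trained neural
# network, and this only illustrates the underlying transfer idea.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 100

# Hypothetical embeddings: high-resource (hr) and low-resource (lr) spaces.
hr_emb = rng.standard_normal((2000, d))
lr_emb = rng.standard_normal((1500, d))

# A bilingual dictionary gives index pairs (lr word, hr word); fit a
# least-squares linear map from the low- to the high-resource space.
lr_idx = rng.integers(0, 1500, size=200)
hr_idx = rng.integers(0, 2000, size=200)
W, *_ = np.linalg.lstsq(lr_emb[lr_idx], hr_emb[hr_idx], rcond=None)

# Train a simple tagger on annotated high-resource tokens...
hr_tokens = rng.integers(0, 2000, size=5000)
hr_tags = rng.integers(0, 12, size=5000)      # e.g. 12 POS tags
tagger = LogisticRegression(max_iter=1000).fit(hr_emb[hr_tokens], hr_tags)

# ...and apply it to low-resource tokens mapped into the shared space.
lr_tokens = rng.integers(0, 1500, size=10)
pred_tags = tagger.predict(lr_emb[lr_tokens] @ W)
```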