Trans-gram, Fast Cross-lingual Word-embeddings
We introduce Trans-gram, a simple and computationally efficient method to
simultaneously learn and align word embeddings for a variety of languages,
using only monolingual data and a smaller set of sentence-aligned data. We use
our new method to compute aligned word embeddings for twenty-one languages,
using English as a pivot language. We show that some linguistic features are
aligned across languages for which we have no aligned data, even when those
properties do not exist in the pivot language. We also achieve
state-of-the-art results on standard cross-lingual text classification and
word translation tasks.
Comment: EMNLP 201
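The core idea described in the abstract can be illustrated with a minimal sketch: given sentence-aligned data, each word in one language is trained to "predict" every word of the aligned sentence in the other language (a uniform alignment assumption), using logistic updates with negative sampling. The toy corpus, vector dimension, learning rate, and update rule below are illustrative assumptions, not the paper's implementation, and the large monolingual corpora the method also uses are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy sentence-aligned corpus (English as the pivot side).
parallel = [
    (["the", "cat", "sleeps"], ["le", "chat", "dort"]),
    (["the", "dog", "sleeps"], ["le", "chien", "dort"]),
    (["a", "cat", "eats"], ["un", "chat", "mange"]),
    (["a", "dog", "eats"], ["un", "chien", "mange"]),
]

vocab_fr = sorted({w for _, fr in parallel for w in fr})
dim = 16
vec = {("en", w): rng.normal(scale=0.1, size=dim)
       for en, _ in parallel for w in en}
vec.update({("fr", w): rng.normal(scale=0.1, size=dim) for w in vocab_fr})

def update(u, v, label, lr=0.1):
    # One logistic (negative-sampling-style) step on the pair score u.v:
    # pull the vectors together if label=1, push them apart if label=0.
    p = 1.0 / (1.0 + np.exp(-(u @ v)))
    g = lr * (label - p)
    du, dv = g * v.copy(), g * u.copy()
    u += du
    v += dv

for _ in range(200):
    for en, fr in parallel:
        for e in en:
            for f in fr:  # uniform alignment: e "predicts" every aligned f
                update(vec[("en", e)], vec[("fr", f)], 1.0)
                neg = vocab_fr[rng.integers(len(vocab_fr))]
                update(vec[("en", e)], vec[("fr", neg)], 0.0)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After training, translation pairs should be closer than non-pairs,
# e.g. cos("cat", "chat") > cos("cat", "chien").
print(cos(vec[("en", "cat")], vec[("fr", "chat")]))
print(cos(vec[("en", "cat")], vec[("fr", "chien")]))
```

Because the cross-lingual term never requires word-level alignments, only sentence pairs, the same loop extends to many language pairs sharing one pivot.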
Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell
This paper presents an approach for investigating the nature of the semantic
information captured by word embeddings. We propose a method that extends an
existing human-elicited semantic property dataset with gold negative examples
obtained through crowd judgments. Our experimental approach tests the ability
of supervised classifiers to identify semantic features in word embedding
vectors and compares this to a feature-identification method based on
full-vector cosine similarity. The idea behind this method is that properties
identified by classifiers, but not through full-vector comparison, are
captured by the embeddings, while properties that cannot be identified by
either method are not. Our results provide an initial indication that semantic
properties relevant to the way entities interact (e.g. dangerous) are
captured, while perceptual information (e.g. colors) is not. We conclude that,
though preliminary, these results show that our method is suitable for
identifying which properties are captured by embeddings.
Comment: Accepted to the EMNLP workshop "Analyzing and interpreting neural
networks for NLP"
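The contrast between the two probes in the abstract can be sketched on synthetic data: a supervised classifier trained on labeled positive/negative examples of a property versus a full-vector cosine comparison against the positive-class centroid. Everything below (the synthetic embeddings, the planted property direction, the logistic-regression probe, and the centroid baseline) is an illustrative assumption, not the paper's dataset or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic "embeddings": 50-dim vectors in which one latent
# direction linearly encodes a semantic property (e.g. "dangerous").
dim, n = 50, 200
prop_direction = rng.normal(size=dim)
prop_direction /= np.linalg.norm(prop_direction)

X = rng.normal(size=(n, dim))
y = (X @ prop_direction > 0).astype(float)  # gold property labels

# Probe 1: supervised classifier (logistic regression via gradient descent).
w = np.zeros(dim)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / n

clf_acc = np.mean(((X @ w) > 0) == y)

# Probe 2: full-vector cosine similarity to the positive-class centroid,
# thresholded at the median similarity.
centroid = X[y == 1].mean(axis=0)
cos = X @ centroid / (np.linalg.norm(X, axis=1) * np.linalg.norm(centroid))
cos_acc = np.mean((cos > np.median(cos)) == y)

print(f"classifier accuracy: {clf_acc:.2f}, cosine accuracy: {cos_acc:.2f}")
```

When a property occupies only a low-dimensional subspace of the embedding, the trained classifier can isolate it while whole-vector cosine similarity is diluted by the remaining dimensions, which is the gap the paper's method exploits.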