Trans-gram, Fast Cross-lingual Word-embeddings

Authors: Benhalloum Amine; Coulmance Jocelyn; Marty Jean-Marc; Wenzek Guillaume
Publication date: 01/01/2015

We introduce Trans-gram, a simple and computationally efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute aligned word embeddings for twenty-one languages using English as a pivot language. We show that some linguistic features are aligned across languages for which we do not have aligned data, even though those properties do not exist in the pivot language. We also achieve state-of-the-art results on standard cross-lingual text classification and word translation tasks.

Comment: EMNLP 201

arXiv.org e-Print Archive

Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell

Authors: Fokkens Antske; Sommerauer Pia
Publication date: 01/01/2018

This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.

Comment: Accepted to the EMNLP workshop "Analyzing and interpreting neural networks for NLP

arXiv.org e-Print Archive
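
The comparison described in the abstract above (a supervised classifier versus full-vector cosine similarity) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 3-dimensional vectors are hand-made rather than real embeddings, the classifier is a plain perceptron, and the cosine baseline simply copies the label of the most similar training word. Dimension 0 weakly encodes the hypothetical property "dangerous", while dimensions 1 and 2 are large topical components that dominate cosine similarity.

```python
import math

# Hypothetical toy "embeddings" (not real data): label 1 = dangerous, -1 = not.
TRAIN = {
    "firearm":   ([1.0, 5.0, 0.0], 1),
    "explosive": ([1.0, -5.0, 0.0], 1),
    "zebra":     ([-1.0, 0.0, 5.0], -1),
    "sofa":      ([-1.0, 0.0, -5.0], -1),
}
TEST = {
    "tiger":         ([1.0, 0.0, 5.0], 1),    # topically close to "zebra"
    "kitchen_knife": ([-1.0, 5.0, 0.0], -1),  # topically close to "firearm"
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def train_perceptron(examples, epochs=10):
    """Bias-free perceptron: update the weights on every misclassified example."""
    w = [0.0] * len(next(iter(examples.values()))[0])
    for _ in range(epochs):
        for vec, label in examples.values():
            if label * dot(w, vec) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, vec)]
    return w

def classify(w, vec):
    return 1 if dot(w, vec) > 0 else -1

def nearest_neighbour_label(vec, examples):
    """Cosine baseline (assumed form): copy the label of the most similar word."""
    best = max(examples.values(), key=lambda ex: cosine(vec, ex[0]))
    return best[1]

def accuracy(predict, examples):
    hits = sum(predict(vec) == label for vec, label in examples.values())
    return hits / len(examples)

weights = train_perceptron(TRAIN)
clf_acc = accuracy(lambda v: classify(weights, v), TEST)
cos_acc = accuracy(lambda v: nearest_neighbour_label(v, TRAIN), TEST)
print("classifier accuracy:", clf_acc)
print("cosine-baseline accuracy:", cos_acc)
```

In this constructed case the perceptron isolates dimension 0 and labels both test words correctly, whereas the cosine baseline is misled by the topical dimensions. That corresponds to the situation the abstract treats as evidence that a property is captured by the embeddings even though full-vector comparison does not reveal it.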