Probabilistic FastText for Multi-Sense Word Embeddings
We introduce Probabilistic FastText, a new model for word embeddings that can
capture multiple word senses, sub-word structure, and uncertainty information.
In particular, we represent each word with a Gaussian mixture density, where
the mean of a mixture component is given by the sum of n-grams. This
representation allows the model to share statistical strength across sub-word
structures (e.g. Latin roots), producing accurate representations of rare,
misspelt, or even unseen words. Moreover, each component of the mixture can
capture a different word sense. Probabilistic FastText outperforms both
FastText, which has no probabilistic model, and dictionary-level probabilistic
embeddings, which do not incorporate subword structures, on several
word-similarity benchmarks, including English RareWord and foreign-language
datasets. We also achieve state-of-the-art performance on benchmarks that measure
the ability to discern different meanings. Thus, the proposed model is the first to
achieve multi-sense representations while having enriched semantics on rare
words.
Comment: Published at ACL 201
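The abstract's central idea, that the mean of a mixture component is a sum of n-gram vectors, can be illustrated with a small sketch. This is not the authors' implementation: the boundary markers and the 3 to 6 character n-gram range follow the usual FastText convention, while `DIM`, `BUCKETS`, the CRC32 hashing, and the pseudo-random vectors standing in for learned parameters are all assumptions made for illustration.

```python
import random
import zlib

DIM = 8          # hypothetical embedding size, chosen only for this sketch
BUCKETS = 1000   # hypothetical number of hash buckets for n-gram vectors


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word padded with < and > markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def ngram_vector(ngram):
    """Look up a stand-in vector for an n-gram.

    A trained model would hold learned parameters here; this sketch derives
    a deterministic pseudo-random vector from the n-gram's hash bucket.
    """
    bucket = zlib.crc32(ngram.encode("utf-8")) % BUCKETS
    rng = random.Random(bucket)
    return [rng.uniform(-0.5, 0.5) for _ in range(DIM)]


def component_mean(word):
    """Sum the n-gram vectors of a word to form one component mean."""
    mean = [0.0] * DIM
    for gram in char_ngrams(word):
        for k, value in enumerate(ngram_vector(gram)):
            mean[k] += value
    return mean
```

Because the mean is built only from character n-grams, a word never seen in training still receives a vector, and words with a common root (for example "rare" and "rarer") share some of the terms in the sum.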
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Semantic specialization is the process of fine-tuning pre-trained
distributional word vectors using external lexical knowledge (e.g., WordNet) to
accentuate a particular semantic relation in the specialized vector space.
While post-processing specialization methods are applicable to arbitrary
distributional vectors, they are limited to updating only the vectors of words
occurring in external lexicons (i.e., seen words), leaving the vectors of all
other words unchanged. We propose a novel approach to specializing the full
distributional vocabulary. Our adversarial post-specialization method
propagates the external lexical knowledge to the full distributional space. We
exploit words seen in the resources as training examples for learning a global
specialization function. This function is learned by combining a standard
L2-distance loss with an adversarial loss: the adversarial component produces
more realistic output vectors. We show the effectiveness and robustness of the
proposed method across three languages and on three tasks: word similarity,
dialog state tracking, and lexical simplification. We report consistent
improvements over distributional word vectors and vectors specialized by other
state-of-the-art specialization frameworks. Finally, we also propose a
cross-lingual transfer method for zero-shot specialization which successfully
specializes a full target distributional space without any lexical knowledge in
the target language and without any bilingual data.
Comment: Accepted at EMNLP 201
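The combined objective described in the abstract, an L2-distance loss plus an adversarial loss, might be sketched as follows. The abstract does not give the exact formulation, so everything here is an assumption: the Euclidean form of the distance, the negative-log adversarial term, the `adv_weight` parameter, and all function names are hypothetical.

```python
import math


def l2_distance(predicted, target):
    """Euclidean distance between a predicted and a target specialized vector."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))


def adversarial_term(discriminator_score, eps=1e-12):
    """Penalty that shrinks as the discriminator is fooled.

    discriminator_score is assumed to be the probability (0 to 1) that a
    discriminator assigns to the predicted vector being a genuinely
    specialized one.
    """
    return -math.log(max(discriminator_score, eps))


def combined_loss(predicted, target, discriminator_score, adv_weight=1.0):
    """Distance loss plus a weighted adversarial term (weighting is assumed)."""
    return (l2_distance(predicted, target)
            + adv_weight * adversarial_term(discriminator_score))
```

Under these assumptions, a predicted vector that lies close to its target but looks unrealistic to the discriminator is still penalized, which matches the abstract's remark that the adversarial component produces more realistic output vectors.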