4 research outputs found
Probabilistic FastText for Multi-Sense Word Embeddings
We introduce Probabilistic FastText, a new model for word embeddings that can
capture multiple word senses, sub-word structure, and uncertainty information.
In particular, we represent each word with a Gaussian mixture density, where
the mean of a mixture component is given by the sum of n-grams. This
representation allows the model to share statistical strength across sub-word
structures (e.g. Latin roots), producing accurate representations of rare,
misspelt, or even unseen words. Moreover, each component of the mixture can
capture a different word sense. Probabilistic FastText outperforms both
FastText, which has no probabilistic model, and dictionary-level probabilistic
embeddings, which do not incorporate subword structures, on several
word-similarity benchmarks, including English RareWord and foreign language
datasets. We also achieve state-of-art performance on benchmarks that measure
ability to discern different meanings. Thus, the proposed model is the first to
achieve multi-sense representations while having enriched semantics on rare
words.Comment: Published at ACL 201
Probabilistic FastText for Multi-Sense Word Embeddings
We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-grams. This representation allows the model to share statistical strength across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each component of the mixture can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no probabilistic model, and dictionary-level probabilistic embeddings, which do not incorporate subword structures, on several word-similarity benchmarks, including English RareWord and foreign language datasets. We also achieve state-of-art performance on benchmarks that measure ability to discern different meanings. Thus, the proposed model is the first to achieve multi-sense representations while having enriched semantics on rare words