2,613 research outputs found
Morphological Priors for Probabilistic Neural Word Embeddings
Word embeddings allow natural language processing systems to share
statistical information across related words. These embeddings are typically
based on distributional statistics, making it difficult for them to generalize
to rare or unseen words. We propose to improve word embeddings by incorporating
morphological information, capturing shared sub-word features. Unlike previous
work that constructs word embeddings directly from morphemes, we combine
morphological and distributional information in a unified probabilistic
framework, in which the word embedding is a latent variable. The morphological
information provides a prior distribution on the latent word embeddings, which
in turn condition a likelihood function over an observed corpus. This approach
yields improvements on intrinsic word similarity evaluations, and also in the
downstream task of part-of-speech tagging.Comment: Appeared at the Conference on Empirical Methods in Natural Language
Processing (EMNLP 2016, Austin
- …