Distributed word representations are widely used for modeling words in NLP
tasks. Most of the existing models generate one representation per word and do
not consider different meanings of a word. We present two approaches to learn
multiple topic-sensitive representations per word by using Hierarchical
Dirichlet Process. We observe that by modeling topics and integrating topic
distributions for each document we obtain representations that are able to
distinguish between different meanings of a given word. Our models yield
statistically significant improvements for the lexical substitution task
indicating that commonly used single word representations, even when combined
with contextual information, are insufficient for this task.Comment: 5 pages, 1 figure, Accepted at ACL 201