Exploiting Topical Perceptions Over Multi-Lingual Text For Hashtag Suggestion On Twitter

Abstract

Microblogging websites, such as Twitter, provide a seemingly endless amount of textual information on a wide variety of topics generated by a large number of users. Microblog posts, or tweets on Twitter, are often written in an informal manner using multi-lingual styles. Ignoring informal styles or multiple languages can hamper the usefulness of microblog mining applications. In this paper, we present a statistical method for processing tweets according to users' perceptions of topics and hashtags. Based on the non-classical notion of relatedness of vocabulary terms to topics in a corpus, which is quantified by discriminative term weights, our method builds a ranked list of terms related to each hashtag. Subsequently, given a new tweet, our method can suggest a ranked list of hashtags. Our method allows enhanced understanding and normalization of users' perceptions for improved information retrieval applications. We evaluate our method on a dataset of 14 million tweets collected over a period of 52 days. Results demonstrate that the method learns useful relationships between vocabulary terms and topics, and that its performance is better than that of a Naive Bayes suggestion system. Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved.
