983 research outputs found

    Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling

    Full text link
    Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brasil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes on Twitter content across multiple epidemic outbreaks. In order to address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80\% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. While clusters can be easily re-generated following changes in epidemics, however, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.Comment: Procs. SoWeMine - co-located with ICWE 2016. 2016, Lugano, Switzerlan

    Unsupervised keyword extraction from microblog posts via hashtags

    Full text link
    © River Publishers. Nowadays, huge amounts of texts are being generated for social networking purposes on Web. Keyword extraction from such texts like microblog posts benefits many applications such as advertising, search, and content filtering. Unlike traditional web pages, a microblog post usually has some special social feature like a hashtag that is topical in nature and generated by users. Extracting keywords related to hashtags can reflect the intents of users and thus provides us better understanding on post content. In this paper, we propose a novel unsupervised keyword extraction approach for microblog posts by treating hashtags as topical indicators. Our approach consists of two hashtag enhanced algorithms. One is a topic model algorithm that infers topic distributions biased to hashtags on a collection of microblog posts. The words are ranked by their average topic probabilities. Our topic model algorithm can not only find the topics of a collection, but also extract hashtag-related keywords. The other is a random walk based algorithm. It first builds a word-post weighted graph by taking into account posts themselves. Then, a hashtag biased random walk is applied on this graph, which guides the algorithm to extract keywords according to hashtag topics. Last, the final ranking score of a word is determined by the stationary probability after a number of iterations. We evaluate our proposed approach on a collection of real Chinese microblog posts. Experiments show that our approach is more effective in terms of precision than traditional approaches considering no hashtag. The result achieved by the combination of two algorithms performs even better than each individual algorithm

    Characterizing Geo-located Tweets in Brazilian Megacities

    Full text link
    This work presents a framework for collecting, processing and mining geo-located tweets in order to extract meaningful and actionable knowledge in the context of smart cities. We collected and characterized more than 9M tweets from the two biggest cities in Brazil, Rio de Janeiro and S\~ao Paulo. We performed topic modeling using the Latent Dirichlet Allocation model to produce an unsupervised distribution of semantic topics over the stream of geo-located tweets as well as a distribution of words over those topics. We manually labeled and aggregated similar topics obtaining a total of 29 different topics across both cities. Results showed similarities in the majority of topics for both cities, reflecting similar interests and concerns among the population of Rio de Janeiro and S\~ao Paulo. Nevertheless, some specific topics are more predominant in one of the cities

    Characterizing Geo-located Tweets in Brazilian Megacities

    Full text link
    This work presents a framework for collecting, processing and mining geo-located tweets in order to extract meaningful and actionable knowledge in the context of smart cities. We collected and characterized more than 9M tweets from the two biggest cities in Brazil, Rio de Janeiro and S\~ao Paulo. We performed topic modeling using the Latent Dirichlet Allocation model to produce an unsupervised distribution of semantic topics over the stream of geo-located tweets as well as a distribution of words over those topics. We manually labeled and aggregated similar topics obtaining a total of 29 different topics across both cities. Results showed similarities in the majority of topics for both cities, reflecting similar interests and concerns among the population of Rio de Janeiro and S\~ao Paulo. Nevertheless, some specific topics are more predominant in one of the cities
    corecore