111 research outputs found

    Is That Twitter Hashtag Worth Reading

    Full text link
    Online social media such as Twitter, Facebook, Wikis and Linkedin have made a great impact on the way we consume information in our day to day life. Now it has become increasingly important that we come across appropriate content from the social media to avoid information explosion. In case of Twitter, popular information can be tracked using hashtags. Studying the characteristics of tweets containing hashtags becomes important for a number of tasks, such as breaking news detection, personalized message recommendation, friends recommendation, and sentiment analysis among others. In this paper, we have analyzed Twitter data based on trending hashtags, which is widely used nowadays. We have used event based hashtags to know users' thoughts on those events and to decide whether the rest of the users might find it interesting or not. We have used topic modeling, which reveals the hidden thematic structure of the documents (tweets in this case) in addition to sentiment analysis in exploring and summarizing the content of the documents. A technique to find the interestingness of event based twitter hashtag and the associated sentiment has been proposed. The proposed technique helps twitter follower to read, relevant and interesting hashtag.Comment: 10 pages, 6 figures, Presented at the Third International Symposium on Women in Computing and Informatics (WCI-2015

    Characterizing Geo-located Tweets in Brazilian Megacities

    Full text link
    This work presents a framework for collecting, processing and mining geo-located tweets in order to extract meaningful and actionable knowledge in the context of smart cities. We collected and characterized more than 9M tweets from the two biggest cities in Brazil, Rio de Janeiro and S\~ao Paulo. We performed topic modeling using the Latent Dirichlet Allocation model to produce an unsupervised distribution of semantic topics over the stream of geo-located tweets as well as a distribution of words over those topics. We manually labeled and aggregated similar topics obtaining a total of 29 different topics across both cities. Results showed similarities in the majority of topics for both cities, reflecting similar interests and concerns among the population of Rio de Janeiro and S\~ao Paulo. Nevertheless, some specific topics are more predominant in one of the cities

    Characterizing Geo-located Tweets in Brazilian Megacities

    Full text link
    This work presents a framework for collecting, processing and mining geo-located tweets in order to extract meaningful and actionable knowledge in the context of smart cities. We collected and characterized more than 9M tweets from the two biggest cities in Brazil, Rio de Janeiro and S\~ao Paulo. We performed topic modeling using the Latent Dirichlet Allocation model to produce an unsupervised distribution of semantic topics over the stream of geo-located tweets as well as a distribution of words over those topics. We manually labeled and aggregated similar topics obtaining a total of 29 different topics across both cities. Results showed similarities in the majority of topics for both cities, reflecting similar interests and concerns among the population of Rio de Janeiro and S\~ao Paulo. Nevertheless, some specific topics are more predominant in one of the cities

    Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling

    Full text link
    Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brasil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes on Twitter content across multiple epidemic outbreaks. In order to address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80\% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. While clusters can be easily re-generated following changes in epidemics, however, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.Comment: Procs. SoWeMine - co-located with ICWE 2016. 2016, Lugano, Switzerlan

    Timeline Generation: Tracking individuals on Twitter

    Full text link
    In this paper, we propose a unsupervised framework to reconstruct a person's life history by creating a chronological list for {\it personal important events} (PIE) of individuals based on the tweets they published. By analyzing individual tweet collections, we find that what are suitable for inclusion in the personal timeline should be tweets talking about personal (as opposed to public) and time-specific (as opposed to time-general) topics. To further extract these types of topics, we introduce a non-parametric multi-level Dirichlet Process model to recognize four types of tweets: personal time-specific (PersonTS), personal time-general (PersonTG), public time-specific (PublicTS) and public time-general (PublicTG) topics, which, in turn, are used for further personal event extraction and timeline generation. To the best of our knowledge, this is the first work focused on the generation of timeline for individuals from twitter data. For evaluation, we have built a new golden standard Timelines based on Twitter and Wikipedia that contain PIE related events from 20 {\it ordinary twitter users} and 20 {\it celebrities}. Experiments on real Twitter data quantitatively demonstrate the effectiveness of our approach
    • …
    corecore