
    Semantics-driven event clustering in Twitter feeds

    Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms have been developed for this task, each drawing on different information sources: textual, temporal, geographic, or community features. Semantic information is often added only at the end of event detection, to classify detected events into semantic topics. But semantic information can also be used to drive the event detection itself, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet only worsens performance in F1 measure compared to the baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.
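
    The abstract does not include code; the following is a minimal sketch, under our own assumptions, of how hashtag-level semantics could steer cluster assignment: a semantic topic is voted per hashtag from noisy per-tweet predictions, and a cluster's baseline textual score receives a bonus when its topic matches the tweet's hashtag topics. All names, fields, and the scoring rule are illustrative, not the authors' algorithm.

        # Illustrative sketch only; not the paper's implementation.
        from collections import Counter, defaultdict

        def hashtag_topics(tweets):
            """Majority-vote a semantic topic per hashtag from noisy per-tweet
            predictions (tweets: dicts with 'hashtags' and 'predicted_topic')."""
            votes = defaultdict(Counter)
            for t in tweets:
                for h in t["hashtags"]:
                    votes[h][t["predicted_topic"]] += 1
            return {h: c.most_common(1)[0][0] for h, c in votes.items()}

        def assign_to_cluster(tweet, clusters, topic_of_hashtag, textual_sim):
            """Pick the best cluster by combining a baseline textual score with
            a bonus when the hashtag-level topic matches the cluster's topic."""
            best, best_score = None, 0.0
            tweet_topics = {topic_of_hashtag.get(h) for h in tweet["hashtags"]}
            for c in clusters:
                score = textual_sim(tweet, c)
                if c["topic"] in tweet_topics:
                    score += 0.5  # illustrative semantic bonus, not a tuned value
                if score > best_score:
                    best, best_score = c, score
            return best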

    Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling

    Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health surveillance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes in Twitter content across multiple epidemic outbreaks. To address this gap, this paper contrasts two complementary approaches to detecting Twitter content relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. While clusters can easily be re-generated following changes in the epidemics, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.
    Comment: In Procs. of SoWeMine, co-located with ICWE 2016, Lugano, Switzerland.
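
    As a rough illustration of the unsupervised branch described above, the sketch below fits an LDA topic model over a sample of tweet texts with scikit-learn and returns the top words per topic. The parameter values (number of topics, min_df, stop-word list) are assumptions, not the paper's settings.

        # Hedged sketch of LDA topic modelling with scikit-learn.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        def lda_topics(tweet_texts, n_topics=10, n_top_words=8):
            vectorizer = CountVectorizer(min_df=2, stop_words="english")
            counts = vectorizer.fit_transform(tweet_texts)
            lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
            lda.fit(counts)
            vocab = vectorizer.get_feature_names_out()
            # Top words per topic as a readable summary of each cluster.
            return [
                [vocab[i] for i in topic.argsort()[-n_top_words:][::-1]]
                for topic in lda.components_
            ]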

    A social computing solution to disaster relief

    Disaster relief has long been a major challenge, and various solutions have been proposed to provide the best possible relief. Currently, rescue teams face a lack of reliable information at the rescue scene, resulting in less effective relief and more casualties. Data analytics on social networks has been successful in other applications such as spam filtering and trend prediction, showing its potential for disaster relief, with possible improvements such as expanding the size of the dataset and including a more detailed map. The purpose of this research is to expand on previous applications and use social media data to generate a detailed disaster situation map for first responders. With validated information about the disaster, both survivors and rescuers can pinpoint hazardous areas and avoid further harm.
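
    As a loose illustration only (the abstract gives no implementation details), one way to turn geotagged social media reports into a coarse situation map is to count hazard reports per grid cell; the grid size and record fields below are assumptions.

        # Illustrative sketch: bin geotagged hazard reports into a coarse grid.
        from collections import Counter

        def hazard_grid(reports, cell_deg=0.01):
            """reports: list of dicts with 'lat', 'lon', 'is_hazard'."""
            grid = Counter()
            for r in reports:
                if r["is_hazard"]:
                    cell = (round(r["lat"] / cell_deg), round(r["lon"] / cell_deg))
                    grid[cell] += 1
            return grid  # maps grid cell -> number of hazard reports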

    Can twitter replace newswire for breaking news?

    Twitter is often considered to be a useful source of real-time news, potentially replacing newswire for this purpose. But is this true? In this paper, we examine the extent to which news reporting in newswire and Twitter overlap and whether Twitter often reports news faster than traditional newswire providers. In particular, we analyse 77 days' worth of tweets and newswire articles with respect to both manually identified major news events and larger volumes of automatically identified news events. Our results indicate that Twitter reports the same events as newswire providers, in addition to a long tail of minor events ignored by mainstream media. However, contrary to popular belief, neither stream leads the other when dealing with major news events, indicating that the value Twitter can bring in a news setting comes predominantly from increased event coverage, not timeliness of reporting.
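
    A hedged sketch of the lead/lag comparison described above: for one event, take the earliest matching tweet timestamp and the earliest matching newswire timestamp and measure the difference. The helper and the example timestamps are hypothetical, not the paper's data.

        # Illustrative lead/lag measurement for a single event.
        from datetime import datetime

        def lead_time(tweet_times, newswire_times):
            """Positive result means Twitter reported the event first.
            Both arguments are lists of datetimes for one event."""
            return min(newswire_times) - min(tweet_times)

        # Hypothetical example: a tweet at 14:02 vs. a wire article at 14:10.
        delta = lead_time(
            [datetime(2014, 7, 1, 14, 2)],
            [datetime(2014, 7, 1, 14, 10)],
        )
        print(delta.total_seconds() / 60)  # 8.0 minutes of Twitter lead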

    A Complete Text-Processing Pipeline for Business Performance Tracking

    Natural language text processing is among the most researched domains because of its varied applications. However, most existing work focuses on improving the performance of machine learning models rather than applying those models to practical business cases. We present a text processing pipeline that enables business users to identify business performance factors through sentiment analysis and opinion summarization of customer feedback. The pipeline performs fine-grained sentiment classification of customer comments, and the results are used for sentiment trend tracking. The pipeline also performs topic modelling, in which key aspects of customer comments are clustered using their correlation scores; the results are used to produce abstractive opinion summaries. The proposed text processing pipeline is evaluated on two business cases in the food and retail domains. The performance of the sentiment analysis component is measured using the mean absolute error (MAE), the root mean squared error (RMSE), and the coefficient of determination (R²).
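
    The three evaluation metrics named above can be computed directly with scikit-learn; the sketch below uses made-up scores purely for illustration.

        # MAE, RMSE, and R^2 on hypothetical sentiment scores.
        from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

        y_true = [1, 3, 2, 5, 4]  # hypothetical gold sentiment scores
        y_pred = [1, 2, 2, 4, 5]  # hypothetical model predictions

        mae = mean_absolute_error(y_true, y_pred)
        rmse = mean_squared_error(y_true, y_pred) ** 0.5
        r2 = r2_score(y_true, y_pred)
        print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R^2={r2:.2f}")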