1,804 research outputs found

    A Hybrid Approach to Semantic Hashtag Clustering in Social Media

    The uncontrolled usage of hashtags in social media makes them vary widely in semantic quality and frequency of use. Such variations pose a challenge to current approaches, which capitalize on either the lexical semantics of a hashtag by using metadata or the contextual semantics of a hashtag by using the texts associated with it. This thesis presents a hybrid approach to clustering hashtags based on their semantics, designed in two phases. The first phase is a sense-level metadata-based semantic clustering algorithm that can differentiate among distinct senses of a hashtag, as opposed to the hashtag word itself. The gold standard test demonstrates that sense-level clusters are significantly more accurate than word-level clusters. The second phase is a hybrid semantic clustering algorithm using a consensus clustering approach, which finds the consensus between metadata-based sense-level semantic clusters and text-based semantic clusters. The gold standard test shows that the hybrid algorithm outperforms both the text-based algorithm and the metadata-based algorithm for a majority of ground truths tested and that it never underperforms both baseline algorithms. In addition, a larger-scale performance study, conducted with a focus on disagreements in cluster assignments between algorithms, shows that the hybrid algorithm makes the correct cluster assignment in a majority of disagreement cases.

    Semantics-driven event clustering in Twitter feeds

    Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms, each using different information sources (textual, temporal, geographic or community features), have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet results in a worse F1 measure than the baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.

    Extracting semantic entities and events from sports tweets

    Large volumes of user-generated content on practically every major issue and event are being created on the microblogging site Twitter. This content can be combined and processed to detect events, entities and popular moods to feed various knowledge-intensive practical applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. In this paper, we exploit various approaches to detect the named entities and significant micro-events from users’ tweets during a live sports event. Here we describe how combining linguistic features with background knowledge and the use of Twitter-specific features can achieve high, precise detection results (F-measure = 87%) in different datasets. A study was conducted on tweets from cricket matches in the ICC World Cup in order to augment the event-related non-textual media with collective intelligence.

    Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election

    Social media has become an emerging alternative to opinion polls for public opinion collection, although it still poses many challenges as a passive data source, such as structurelessness, quantifiability, and representativeness. Social media data with geotags provide new opportunities to unveil the geographic locations of users expressing their opinions. This paper aims to answer two questions: 1) whether quantifiable measurement of public opinion can be obtained from social media and 2) whether it can produce better or complementary measures compared to opinion polls. This research proposes a novel approach to measure the relative opinion of Twitter users towards public issues in order to accommodate more complex opinion structures and take advantage of the geography pertaining to the public issues. To ensure that this new measure is technically feasible, a modeling framework is developed, which includes building a training dataset by adopting a state-of-the-art approach and devising a new deep learning method called Opinion-Oriented Word Embedding. With a case study of the tweets selected for the 2016 U.S. presidential election, we demonstrate the predictive superiority of our relative opinion approach and we show how it can aid visual analytics and support opinion predictions. Although the relative opinion measure proves to be more robust than polling, our study also suggests that the former can advantageously complement the latter in opinion prediction.