1,804 research outputs found
A Hybrid Approach to Semantic Hashtag Clustering in Social Media
The uncontrolled usage of hashtags in social media makes them vary a lot in the quality of semantics and the frequency of usage. Such variations pose a challenge to the current approaches which capitalize on either the lexical semantics of a hashtag by using metadata or the contextual semantics of a hashtag by using the texts associated with a hashtag. This thesis presents a hybrid approach to clustering hashtags based on their semantics, designed in two phases. The first phase is a sense-level metadata-based semantic clustering algorithm that has the ability to differentiate among distinct senses of a hashtag as opposed to the hashtag word itself. The gold standard test demonstrates that sense-level clusters are significantly more accurate than word-level clusters. The second phase is a hybrid semantic clustering algorithm using a consensus clustering approach which finds the consensus between metadata-based sense-level semantic clusters and text-based semantic clusters. The gold standard test shows that the hybrid algorithm outperforms both the text-based algorithm and the metadata-based algorithm for a majority of ground truths tested and that it never underperforms both baseline algorithms. In addition, a larger-scale performance study, conducted with a focus on disagreements in cluster assignments between algorithms, shows that the hybrid algorithm makes the correct cluster assignment in a majority of disagreement cases
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms which all use different information sources - either textual, temporal, geographic or community features - have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over baseline. We find that assigning semantic information to every individual tweet results in just a worse performance in F1 measure compared to baseline. If however semantics are assigned on a coarser, hashtag level the improvement over baseline is substantial and significant in both precision and recall
Extracting semantic entities and events from sports tweets
Large volumes of user-generated content on practically every major issue and event are being created on the microblogging site Twitter. This content can be combined and processed to detect events, entities and popular moods to feed various knowledge-intensive practical applications. On the downside, these content items are very noisy and highly informal, making it difficult to extract sense out of the stream. In this paper, we exploit various approaches to detect the named entities and significant micro-events from users’ tweets during a live sports event. Here we describe how combining linguistic features with background knowledge and the use of Twitter-specific features can achieve high, precise detection results (f-measure = 87%) in different datasets. A study was conducted on tweets from cricket matches in the ICC World Cup in order to augment the event-related non-textual media with collective intelligence
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has become an emerging alternative to opinion polls for public
opinion collection, while it is still posing many challenges as a passive data
source, such as structurelessness, quantifiability, and representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, a modeling framework is developed
including building a training dataset by adopting a state-of-the-art approach
and devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure is proved to be more robust
compared to polling, our study also suggests that the former can advantageously
complement the later in opinion prediction
Recommended from our members
Statistical Semantic Classification of Crisis Information
The rise of social media as an information channel during crisis has become key to community response. However, existing crisis awareness applications, often struggle to identify relevant information among the high volume of data that is generated over social platforms. A wide range of statistical features and machine learning methods have been researched in recent years to automatically classify this information. In this paper we aim to complement previous studies by exploring the use of semantics as additional features to identify relevant crisis in- formation. Our assumption is that entities and concepts tend to have a more consistent correlation with relevant and irrelevant information, and therefore can enhance the discrimination power of classifiers. Our results, so far, show that some classification improvements can be obtained when using semantic features, reaching +2.51% when the classifier is applied to a new crisis event (i.e., not in training set)
- …