Is That Twitter Hashtag Worth Reading
Online social media such as Twitter, Facebook, wikis and LinkedIn have had a
great impact on the way we consume information in our day-to-day lives. It has
therefore become increasingly important to find appropriate content on social
media in order to cope with the information explosion. In the case of Twitter,
popular information can be tracked using hashtags. Studying the characteristics
of tweets containing hashtags is important for a number of tasks, such as
breaking news detection, personalized message recommendation, friend
recommendation, and sentiment analysis, among others.
In this paper, we have analyzed Twitter data based on trending hashtags, which
are widely used nowadays. We have used event-based hashtags to learn users'
opinions about those events and to decide whether other users might find them
interesting. In addition to sentiment analysis, we have used topic modeling,
which reveals the hidden thematic structure of documents (tweets in this case),
to explore and summarize the content of the documents. We propose a technique
for measuring the interestingness of an event-based Twitter hashtag and its
associated sentiment. The proposed technique helps Twitter followers read
relevant and interesting hashtags.
Comment: 10 pages, 6 figures. Presented at the Third International Symposium
on Women in Computing and Informatics (WCI-2015)
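To make the idea of scoring a hashtag by the sentiment of its tweets concrete, here is a minimal sketch that groups tweets by hashtag and averages a lexicon-based sentiment score. The tiny LEXICON and the function names are hypothetical, and the paper's actual interestingness measure (which also draws on topic modeling) is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a full sentiment
# lexicon or a trained classifier.
LEXICON = {"great": 1, "love": 1, "win": 1, "bad": -1, "sad": -1, "awful": -1}
PUNCT = ".,!?"


def tweet_sentiment(text):
    """Sum the lexicon scores of the whitespace-separated tokens in a tweet."""
    return sum(LEXICON.get(tok.strip(PUNCT).lower(), 0) for tok in text.split())


def hashtag_sentiment(tweets):
    """Return the mean tweet sentiment for every hashtag seen in `tweets`."""
    scores = defaultdict(list)
    for text in tweets:
        score = tweet_sentiment(text)
        # A set, so that a hashtag repeated within one tweet is counted once.
        tags = {tok.strip(PUNCT).lower() for tok in text.split()
                if tok.startswith("#") and len(tok.strip(PUNCT)) > 1}
        for tag in tags:
            scores[tag].append(score)
    return {tag: sum(vals) / len(vals) for tag, vals in scores.items()}
```

For the tweets "Great match! #WorldCup", "Awful refereeing #worldcup" and "Love this #WorldCup win", the sketch returns a mean score of 2/3 for #worldcup, because hashtags are lower-cased before grouping.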
Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula
A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy. In recent years, large amounts of data have been made available on microblog platforms such as Twitter. However, it is difficult to filter and extract information and knowledge from such data because of its high volume and the noise it contains. On Twitter, members of the general public can report real-world events such as floods in real time, acting as social sensors. Consequently, a method that detects flood events automatically and in real time would help governmental bodies, such as crisis management authorities, to detect such events and make decisions during their early stages.
This thesis proposes a real-time flood detection system that mines Arabic tweets using machine learning and data mining techniques. The proposed system comprises the following main components: data collection, pre-processing, flood event extraction, location inference, location named entity linking, and flood event visualisation. An effective method for detecting floods from Arabic tweets is presented and evaluated using supervised learning techniques. Furthermore, this work presents a location named entity inference method based on the Learning to Search method; the results show that the proposed method outperformed existing systems, with significantly higher accuracy in inferring flood locations from tweets written in colloquial Arabic. For location named entity linking, a method was designed that uses Google API services as a knowledge base to extract accurate geocode coordinates for the location named entities mentioned in tweets. The results show that the proposed location linking method locates 56.8% of tweets within 0–10 km of the actual location. Further analysis showed that the accuracy of locating tweets within the actual city and region is 78.9% and 84.2%, respectively.
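The 0–10 km figure above is a distance criterion between a predicted geocode and the true location. The sketch below shows one standard way to evaluate such a criterion, using the haversine great-circle distance on a spherical Earth (mean radius 6371 km). It assumes the predicted coordinates have already been obtained (the thesis uses Google API services for that step, which is not shown), and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(pairs, max_km=10.0):
    """Share of (predicted, actual) coordinate pairs at most `max_km` apart.

    `pairs` is a list of ((pred_lat, pred_lon), (true_lat, true_lon)).
    """
    if not pairs:
        return 0.0
    hits = sum(1 for pred, true in pairs if haversine_km(*pred, *true) <= max_km)
    return hits / len(pairs)
```

A pair of identical points counts as a hit, whereas two points one degree of longitude apart on the equator are about 111.2 km apart and count as a miss.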
Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample
In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is important for detecting hate speech in web content. In particular, we propose to use the text produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by emotion analysis on three data sets: suspended, active and neutral. The first two contain hateful tweets from suspended accounts and active accounts, respectively, whereas the third contains only neutral tweets. The emotion analysis indicated that tweets from suspended accounts show more disgust, negative, fear and sadness emotions than those from active accounts, although tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers on the semantic meaning of tweets from suspended and active accounts, respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.
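The comparison above is reported in terms of F-score. As a reminder of how that metric is computed, here is a minimal sketch of the binary F1 score (the harmonic mean of precision and recall) from true-positive, false-positive and false-negative counts. The abstract does not state how the "overall" F-score is averaged across classes, so this single-class version is an assumption, and the function names are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Binary F1 score from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_gap(counts_a, counts_b):
    """Difference in F1 between two classifiers, each given as (tp, fp, fn)."""
    return f1_score(*counts_a) - f1_score(*counts_b)
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision = recall = 0.8 and hence F1 = 0.8.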
Deception Detection with Feature-Augmentation by soft Domain Transfer
In this era of information explosion, deceivers use different domains or
mediums of information, such as news, emails and tweets, to exploit users.
Although much research has been done on detecting deception in each of these
domains, the shortage of information about a new event makes it necessary to
associate these domains with each other to combat deception. To form this
association, we propose a feature augmentation method that harnesses the
intermediate-layer representations of neural models. Our approaches improve on
the self-domain baseline models by up to 6.60%. We find tweets to be the most
helpful information provider for fake news and phishing email detection,
whereas news helps most in tweet rumor detection. Our analysis provides useful
insight into domain knowledge transfer, which can help build a stronger
deception detection system than those in the existing literature.
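One minimal reading of "feature augmentation from intermediate-layer representations" is to pass a target-domain input through the hidden layer of a model trained on another (donor) domain and concatenate the resulting activations with the original features. The sketch below does this in plain Python with a single dense ReLU layer. It assumes both domains share the same input representation, and the weights and function names are hypothetical rather than the authors' architecture.

```python
def dense(weights, bias, x):
    """Compute W·x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, val) for val in v]


def augment(x_target, donor_weights, donor_bias):
    """Append the donor model's hidden-layer activations to the target features.

    The donor layer is treated as frozen: it is only evaluated, never updated.
    """
    if any(len(row) != len(x_target) for row in donor_weights):
        raise ValueError("donor layer input size must match the target features")
    hidden = relu(dense(donor_weights, donor_bias, x_target))
    return list(x_target) + hidden
```

The augmented vector would then be fed to the target-domain classifier in place of the original features.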
Image Analysis Enhanced Event Detection from Geo-tagged Tweet Streams
Events detected from social media streams often include early signs of
accidents, crimes or disasters. Therefore, they can be used by related parties
for timely and efficient response. Although significant progress has been made
on event detection from tweet streams, most existing methods have not
considered the images posted in tweets, which provide richer information than
the text and can potentially be a reliable indicator of whether an event
occurs or not. In this paper, we design an event detection algorithm that
combines textual, statistical and image information, following an unsupervised
machine learning approach. Specifically, the algorithm starts with semantic and
statistical analyses to obtain a list of tweet clusters, each of which
corresponds to an event candidate, and then performs image analysis to separate
events from non-events: a convolutional autoencoder is trained for each cluster
as an anomaly detector, where some of the images are used as training data and
the remaining images are used as test instances. Our
experiments on multiple datasets verify that when an event occurs, the mean
reconstruction errors of the training and test images are much closer, compared
with the case where the candidate is a non-event cluster. Based on this
finding, the algorithm rejects a candidate if the difference is larger than a
threshold. Experimental results over millions of tweets demonstrate that this
image-analysis-enhanced approach can significantly increase precision with
minimal impact on recall.
Comment: 12 pages, 4 figures
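The rejection rule described above reduces to a comparison of two means. The sketch below assumes the per-image reconstruction errors of a candidate cluster's training and held-out images have already been computed by its autoencoder (training is not shown); the function name and the threshold value are hypothetical.

```python
from statistics import mean


def is_event(train_errors, test_errors, threshold):
    """Keep a candidate cluster as an event when the mean reconstruction
    errors of its training and held-out images are close.

    A gap larger than `threshold` causes the candidate to be rejected
    as a non-event.
    """
    gap = abs(mean(train_errors) - mean(test_errors))
    return gap <= threshold
```

Intuitively, when a cluster really is an event its images tend to be similar, so an autoencoder trained on some of them reconstructs the rest about as well, which matches the finding reported in the abstract.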
Event Detection on Twitter
Detecting events by using social media has been an active research problem.
In this work, we investigate and compare the performance of two methods for
event detection on Twitter, using Apache Storm as the stream processing
infrastructure. The first event detection method is based on identifying
uncommonly common words inside tweet blocks, and the second one is based on
clustering tweets and treating a cluster as an event. Each of the methods has
its own characteristics. The uncommonly-common-word-based method relies on
bursts of words and hence is not affected by concurrency problems in a
distributed environment. The clustering-based method, on the other hand,
performs a finer-grained analysis, but it is sensitive to concurrent
processing. We investigate the effect of the stream processing and concurrency
handling support provided by Apache Storm on event detection with these methods.
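As an illustration of the burst-based idea (not the authors' exact definition of "uncommonly common" words), the sketch below flags words whose document frequency in the current block of tweets is at least `ratio` times their average frequency over previous blocks. The parameters `ratio` and `min_count` and all function names are hypothetical.

```python
from collections import Counter


def block_doc_freq(tweets):
    """Count how many tweets in a block contain each (lower-cased) word."""
    counts = Counter()
    for text in tweets:
        counts.update(set(text.lower().split()))
    return counts


def bursty_words(current_block, history_blocks, ratio=3.0, min_count=3):
    """Return words that are unusually frequent in the current block.

    A word qualifies if it appears in at least `min_count` tweets of the
    current block and that count is at least `ratio` times its average
    count per block over `history_blocks`.
    """
    current = block_doc_freq(current_block)
    history = Counter()
    for block in history_blocks:
        history.update(block_doc_freq(block))
    n_blocks = max(len(history_blocks), 1)
    return sorted(
        word for word, count in current.items()
        if count >= min_count and count >= ratio * history[word] / n_blocks
    )
```

Because the test depends only on per-word counts, and summing `Counter` objects gives the same result regardless of the order in which partial counts arrive, counts from parallel workers can be merged before the test is applied. This is consistent with the observation above that the word-based method is insensitive to concurrency.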
The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness
Twitter updates now represent an enormous stream of information originating
from a wide variety of formal and informal sources, much of which is relevant
to real-world events. In this paper we adapt existing bio-surveillance
algorithms to detect localised spikes in Twitter activity corresponding to real
events with a high level of confidence. We then develop a methodology to
automatically summarise these events, both by providing the tweets which fully
describe the event and by linking to highly relevant news articles. We apply
our methods to outbreaks of illness and events strongly affecting sentiment. In
both case studies we are able to detect events verifiable by third-party
sources and produce high-quality summaries.
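The abstract does not name the bio-surveillance algorithms it adapts. For orientation only, the sketch below shows a simple moving-baseline z-score detector, similar in spirit to the CDC's EARS family of detectors: a time step is flagged when its count exceeds the mean of the preceding `window` counts by more than `threshold` standard deviations. The parameter values and the standard-deviation floor are assumptions, not the authors' settings.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=7, threshold=3.0, min_sigma=1.0):
    """Return the indices t whose count is more than `threshold` standard
    deviations above the mean of counts[t - window:t].

    `min_sigma` (an assumption) floors the standard deviation so that a
    perfectly flat baseline does not cause a division by zero.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a deviation")
    alerts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        sigma = max(stdev(baseline), min_sigma)
        if (counts[t] - mean(baseline)) / sigma > threshold:
            alerts.append(t)
    return alerts
```

Applied separately to each location's time series of matching tweet counts, a detector of this kind would yield localised alerts.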