Event detection and user interest discovering in social media data streams
Social media plays an increasingly important role in people’s lives. Microblogging is a form of social media that allows people to share and disseminate real-life events. Broadcasting events in microblogging networks can be an effective way of raising awareness and spreading important information. However, many existing approaches to analysing this content focus on the event detection model and ignore the user interests that can be discovered as an event evolves. This makes it difficult to track the most important events as they evolve, including identifying the influential spreaders. A further complication is that the influential spreaders’ interests also change during event evolution. Influential spreaders play a key role in event evolution, and this has been largely ignored by traditional event detection methods. To this end, we propose an event evolution model based on a user-interest model, named the HEE (Hot Event Evolution) model. This model not only considers the user interest distribution, but also uses the short text data in the social network to model the posts and uses recommendation methods to discover user interests. This can resolve the data sparsity problem exhibited by many existing event detection methods and improve the accuracy of event detection. A hot event automatic filtering algorithm is first applied to remove the influence of general events, improving the quality and efficiency of event mining. Then an automatic topic clustering algorithm is applied to arrange the short texts into clusters with similar topics. An improved user-interest model is proposed that combines the short texts of each cluster into a long text document, simplifying the determination of the overall topic in relation to the interest distribution of each user during the evolution of important events.
Finally, a novel cosine-measure-based event similarity detection method is used to assess the correlation between events, thereby detecting the process of event evolution. The experimental results on a real Twitter dataset demonstrate the efficiency and accuracy of our proposed model for both event detection and user interest discovery during the evolution of hot events.
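The abstract does not spell out how the cosine measure is applied, so the following is only a minimal sketch of the general idea: each event is represented as a bag-of-words term-count vector, and two events are compared by the cosine of the angle between their vectors. The function names and the whitespace tokenisation are assumptions made for illustration, not the HEE model's actual procedure.

```python
import math
from collections import Counter


def event_vector(posts):
    """Build a term-count vector from the posts of one event (assumed tokenisation: lowercase + whitespace split)."""
    counts = Counter()
    for post in posts:
        counts.update(post.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty event is treated as unrelated to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    earlier = event_vector(["storm hits coast", "coast flooded after storm"])
    later = event_vector(["flooded coast rescue begins"])
    print(cosine_similarity(earlier, later))
```

A high score between an event at one time step and an event at the next would suggest that the later event is a continuation of the earlier one; where to set that cut-off is not given in the abstract.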
Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media
A public dataset with a variety of properties suitable for sentiment
analysis [1], event prediction, trend detection and other text mining
applications is needed in order to perform analysis studies successfully.
The vast majority of data on social media is text-based, and machine learning
cannot be applied directly to this raw data, since several preparation steps
are required before the algorithms can be run. For example, different
misspellings of the same word enlarge the word vector space unnecessarily,
which reduces the success of the algorithm and increases the computational
power required. This paper presents an improved Turkish dataset with an
effective spelling correction algorithm based on Hadoop [2]. The collected data
is stored on the Hadoop Distributed File System and the text-based data is
processed with the MapReduce programming model. This method is suitable for
storing and processing large text-based social media data. In this study, movie
reviews have been automatically recorded with Apache ManifoldCF (MCF) [3] and
data clusters have been created. Various methods, such as Levenshtein and Fuzzy
String Matching, have been compared in order to create a public dataset from
the collected data. Experimental results show that the proposed algorithm
successfully detects and corrects spelling errors, and that the resulting
dataset can be used as an open source dataset in sentiment analysis studies.
Comment: Presented at CMES201
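As an illustration of the Levenshtein approach mentioned in the abstract, here is a minimal single-machine sketch of edit-distance-based spelling correction. It is not the paper's Hadoop/MapReduce implementation; the `correct` helper, the `max_distance` parameter and the tiny vocabulary are hypothetical.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and substitutions turning s into t."""
    if len(s) < len(t):
        s, t = t, s  # keep the inner row as short as possible
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or the word itself if nothing is close enough (assumed policy)."""
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word


if __name__ == "__main__":
    vocabulary = ["güzel", "film", "kötü"]  # hypothetical dictionary
    print(correct("guzel", vocabulary))
```

Mapping misspelled variants back to one dictionary form in this way is what keeps the word vector space from growing with every misspelling.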
Image Analysis Enhanced Event Detection from Geo-tagged Tweet Streams
Events detected from social media streams often include early signs of
accidents, crimes or disasters. Therefore, they can be used by related parties
for timely and efficient response. Although significant progress has been made
on event detection from tweet streams, most existing methods have not
considered the posted images in tweets, which provide richer information than
the text, and potentially can be a reliable indicator of whether an event
occurs or not. In this paper, we design an event detection algorithm that
combines textual, statistical and image information, following an unsupervised
machine learning approach. Specifically, the algorithm starts with semantic and
statistical analyses to obtain a list of tweet clusters, each of which
corresponds to an event candidate, and then performs image analysis to separate
events from non-events: a convolutional autoencoder is trained for each
cluster as an anomaly detector, where a part of the images is used as the
training data and the remaining images are used as the test instances. Our
experiments on multiple datasets verify that when an event occurs, the mean
reconstruction errors of the training and test images are much closer, compared
with the case where the candidate is a non-event cluster. Based on this
finding, the algorithm rejects a candidate if the difference is larger than a
threshold. Experimental results over millions of tweets demonstrate that this
image analysis enhanced approach can significantly increase the precision with
minimum impact on the recall.
Comment: 12 pages, 4 figure
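The rejection rule described above can be sketched as follows, assuming the per-image reconstruction errors of a cluster's autoencoder have already been computed. The function name, the use of an absolute difference and the example numbers are assumptions for illustration; the abstract does not give the threshold value or say whether the difference is signed.

```python
from statistics import mean


def is_event(train_errors, test_errors, threshold):
    """Keep a candidate cluster when mean train and test reconstruction errors are close."""
    gap = abs(mean(test_errors) - mean(train_errors))
    return gap <= threshold  # a gap larger than the threshold rejects the candidate


if __name__ == "__main__":
    # Hypothetical errors: images of a real event resemble each other,
    # so held-out images reconstruct about as well as the training images.
    print(is_event([0.10, 0.12], [0.11, 0.13], threshold=0.05))
    # In a non-event cluster the held-out images look unlike the training ones.
    print(is_event([0.10, 0.12], [0.40, 0.50], threshold=0.05))
```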
Real-time Content Identification for Events and Sub-Events from Microblogs.
PhD
In an age when people are predisposed to report real-world events through their social
media accounts, many researchers value the advantages of mining such unstructured
and informal data from social media. Compared with the traditional news media, online
social media services, such as Twitter, can provide more comprehensive and timely
information about real-world events. Existing Twitter event monitoring systems analyse
partial event data and are unable to report the underlying stories or sub-events in real time.
To fill this gap, this research focuses on the automatic identification of content for
events and sub-events through the analysis of Twitter streams in real time.
To fulfil the need for real-time content identification for events and sub-events, this research
first proposes a novel adaptive crawling model that retrieves extra event content
from the Twitter Streaming API. The proposed model analyses the characteristics of
hashtags and tweets collected from live Twitter streams to automate the expansion of
subsequent queries. By investigating the characteristics of Twitter hashtags, this research
then proposes three Keyword Adaptation Algorithms (KwAAs) which are based
on the term frequency (TF-KwAA), the traffic pattern (TP-KwAA), and the text content
of associated tweets (CS-KwAA) of the emerging hashtags. Based on the comparison
between traditional keyword crawling and adaptive crawling with different KwAAs, this
thesis demonstrates that the KwAAs retrieve extra event content about sub-events in
real time for both planned and unplanned events.
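The abstract gives no algorithmic detail for the KwAAs, so the following is only a rough sketch of what a term-frequency-based keyword adaptation step might look like: hashtags that occur often enough in the tweets collected so far are added to the query for the next crawl. The function name, the `min_count` and `top_k` parameters and the selection rule are all assumptions, not the thesis's TF-KwAA.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def expand_keywords(keywords, tweets, min_count=2, top_k=3):
    """Append the most frequent new hashtags from the collected tweets to the query keywords."""
    counts = Counter(tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet))
    known = {k.lower() for k in keywords}
    emerging = [tag for tag, n in counts.most_common()
                if n >= min_count and tag not in known]
    return list(keywords) + emerging[:top_k]


if __name__ == "__main__":
    collected = [
        "Goal! #worldcup #final",
        "What a match #final",
        "#worldcup #final tonight",
        "#random",
    ]
    print(expand_keywords(["#worldcup"], collected))
```

Running such a step periodically over the live stream would let the crawler pick up hashtags that emerge with sub-events and that a fixed keyword list would miss.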
To examine the usefulness of extra event content for the event monitoring system, a
Twitter event monitoring solution is proposed. This "Detection of Sub-events by Twitter
Real-time Monitoring (DSTReaM)" framework concurrently runs multiple instances
of a statistics-based event detection algorithm over different stream components. By
evaluating the detection performance using detection accuracy and event entropy, this
research demonstrates that better event detection can be achieved with a broader coverage
of event content.
School of Electronic Engineering and Computer Science (EECS), Queen Mary University of London (QMUL)
China Scholarship Council (CSC)
Streaming first story detection with application to Twitter
With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this volume of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which is able to overcome the limitations of traditional approaches, while maintaining competitive results. In particular, a comparison with a state-of-the-art system on the first story detection task shows that we achieve over an order of magnitude speedup in processing time, while retaining comparable performance. Event detection experiments on a collection of 160 million Twitter posts show that celebrity deaths are the fastest spreading news on Twitter.
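To make the role of locality-sensitive hashing concrete, here is a minimal sketch using random-hyperplane hashing for cosine similarity: each post is hashed to a bucket, and its novelty is measured only against earlier posts in the same bucket instead of the whole stream. The class name, the single hash table, the `num_bits` setting and the whitespace tokenisation are simplifying assumptions; this is not the paper's algorithm.

```python
import math
import random
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class FirstStoryDetector:
    def __init__(self, num_bits=8, seed=0):
        self.num_bits = num_bits
        self.rng = random.Random(seed)
        self.planes = {}                  # term -> one random weight per hyperplane
        self.buckets = defaultdict(list)  # signature -> earlier posts with that signature

    def _signature(self, vec):
        """One bit per hyperplane: which side of the plane the vector falls on."""
        sums = [0.0] * self.num_bits
        for term, weight in vec.items():
            if term not in self.planes:
                self.planes[term] = [self.rng.gauss(0, 1) for _ in range(self.num_bits)]
            for i, r in enumerate(self.planes[term]):
                sums[i] += weight * r
        return tuple(s >= 0 for s in sums)

    def novelty(self, text):
        """1 minus the similarity to the nearest earlier post in the same bucket."""
        vec = Counter(text.lower().split())
        key = self._signature(vec)
        nearest = max((cosine(vec, other) for other in self.buckets[key]), default=0.0)
        self.buckets[key].append(vec)
        return 1.0 - nearest


if __name__ == "__main__":
    detector = FirstStoryDetector()
    for post in ["earthquake hits city", "earthquake hits city", "election results announced"]:
        print(post, detector.novelty(post))
```

A post whose novelty exceeds some chosen cut-off would be flagged as a first story. Because similar posts tend to share a signature, each lookup touches only one bucket, which is where the speedup over comparing against every earlier post comes from; a single table can miss near neighbours that fall on the other side of a hyperplane.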
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Social media is often viewed as a sensor into various societal events such as
disease outbreaks, protests, and elections. We describe the use of social media
as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
approach detects a broad range of cyber-attacks (e.g., distributed denial of
service (DDoS) attacks, data breaches, and account hijacking) in an
unsupervised manner using just a limited fixed set of seed event triggers. A
new query expansion strategy based on convolutional kernels and dependency
parses helps model reporting structure and aids in identifying key event
characteristics. Through a large-scale analysis over Twitter, we demonstrate
that our approach consistently identifies and encodes events, outperforming
existing methods.
Comment: 13 single column pages, 5 figures, submitted to KDD 201
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms, each using different information sources (textual, temporal, geographic or community features), have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet only results in a worse F1 measure than the baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.