Towards Real-Time, Country-Level Location Classification of Worldwide Tweets
In contrast to much previous work that has focused on location classification
of tweets restricted to a specific country, here we undertake the task in a
broader context by classifying global tweets at the country level, which is so
far unexplored in a real-time scenario. We analyse the extent to which a
tweet's country of origin can be determined by making use of eight
tweet-inherent features for classification. Furthermore, we use two datasets,
collected a year apart from each other, to analyse the extent to which a model
trained from historical tweets can still be leveraged for classification of new
tweets. With classification experiments on all 217 countries in our datasets,
as well as on the top 25 countries, we offer some insights into the best use of
tweet-inherent features for an accurate country-level classification of tweets.
We find that the use of a single feature, such as the use of tweet content
alone -- the most widely used feature in previous work -- leaves much to be
desired. Choosing an appropriate combination of both tweet content and metadata
can actually lead to substantial improvements of between 20% and 50%. We
observe that tweet content, the user's self-reported location and the user's
real name, all of which are inherent in a tweet and available in a real-time
scenario, are particularly useful to determine the country of origin. We also
experiment on the applicability of a model trained on historical tweets to
classify new tweets, finding that the choice of a particular combination of
features whose utility does not fade over time can actually lead to comparable
performance, avoiding the need to retrain. However, the difficulty of achieving
accurate classification increases slightly for countries with multiple
commonalities, especially for English- and Spanish-speaking countries.
Comment: Accepted for publication in IEEE Transactions on Knowledge and Data
Engineering (IEEE TKDE)
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
Determine the User Country of a Tweet
On the widely used messaging platform Twitter, about 2% of tweets contain
their geographical location as exact GPS coordinates (latitude and
longitude). Knowing the location of a tweet is useful for many data analytics
questions. This research addresses how to determine the location of
tweets that do not contain GPS coordinates. An accuracy of 82% was achieved
using a Naive Bayes model trained on features such as the user's timezone, the
user's language, and the parsed user location. The classifier performs well on
active Twitter countries such as the Netherlands and the United Kingdom. An
analysis of errors made by the classifier shows that mistakes were made due to
limited information and properties shared between countries, such as a common
timezone. A feature analysis was performed in order to see the effect of
different features. The timezone and the parsed user location were the
most informative features.
Comment: CTIT Technical Report, University of Twente
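A minimal sketch of how a Naive Bayes classifier over categorical user features of this kind might look. Everything in it is an assumption made for illustration: the class name `CategoricalNaiveBayes`, the feature keys `timezone`, `language` and `location`, the Laplace smoothing, and the four toy training rows are invented here and do not come from the report, which does not publish its implementation or data in this abstract.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Toy Naive Bayes over categorical features (hypothetical, for illustration)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed, not from the report)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[feature][label][value] -> number of training rows
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.vocab = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[feature][label][value] += 1
                self.vocab[feature].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / self.total)
            for feature, value in row.items():
                seen = self.value_counts[feature][label][value]
                size = len(self.vocab[feature]) + 1  # extra slot for unseen values
                score += math.log((seen + self.alpha) / (count + self.alpha * size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: each row stands for the metadata of one tweet's author.
rows = [
    {"timezone": "Amsterdam", "language": "nl", "location": "utrecht"},
    {"timezone": "Amsterdam", "language": "nl", "location": "enschede"},
    {"timezone": "London", "language": "en", "location": "london"},
    {"timezone": "London", "language": "en", "location": "manchester"},
]
labels = ["NL", "NL", "GB", "GB"]

model = CategoricalNaiveBayes().fit(rows, labels)
prediction_nl = model.predict(
    {"timezone": "Amsterdam", "language": "nl", "location": "unknown"}
)
prediction_gb = model.predict(
    {"timezone": "London", "language": "en", "location": "london"}
)
```

In a sketch like this, two countries that share a timezone and a language receive near-identical scores whenever the parsed location is missing, which is consistent with the kind of error the abstract attributes to shared properties between countries.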
Understanding Citizen Reactions and Ebola-Related Information Propagation on Social Media
In severe outbreaks such as Ebola, bird flu and SARS, people share news, as
well as their thoughts and responses regarding the outbreaks, on social media.
Understanding how people perceive severe outbreaks, what their responses
are, and what factors affect these responses becomes important. In this paper,
we conduct a comprehensive study to understand and mine the spread of
Ebola-related information on social media. In particular, we (i) conduct a
large-scale data-driven analysis of geotagged social media messages to
understand citizen reactions regarding Ebola; (ii) build information
propagation models which measure locality of information; and (iii) analyze
spatial, temporal and social properties of Ebola-related information. Our work
provides new insights into the Ebola outbreak by understanding citizen reactions
and topic-based information propagation, as well as providing a foundation for
analysis of and response to future public health crises.
Comment: 2016 IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining (ASONAM 2016)