A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure
On the Accuracy of Hyper-local Geotagging of Social Media Content
Social media users share billions of items per year, only a small fraction of
which is geotagged. We present a data-driven approach for identifying
non-geotagged content items that can be associated with a hyper-local
geographic area by modeling the location distributions of hyper-local n-grams
that appear in the text. We explore the trade-off between accuracy, precision
and coverage of this method. Further, we explore differences across content
received from multiple platforms and devices, and show, for example, that
content shared via different sources and applications produces significantly
different geographic distributions, and that it is best to model and predict
location for items according to their source. Our findings show the potential
and the bounds of a data-driven approach to geotag short social media texts,
and offer implications for all applications that use data-driven approaches to
locate content. Comment: 10 pages
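The abstract above describes modeling the location distributions of n-grams and weighing accuracy against coverage. A minimal sketch of that general idea follows; it is not the authors' method. The grid size, the thresholds `min_count` and `min_share`, and all function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a grid cell (cell size is an assumed parameter)."""
    return (round(lat / cell_deg), round(lon / cell_deg))


def ngrams(text, n_max=2):
    """Yield all word n-grams of length 1..n_max from whitespace tokens."""
    tokens = text.lower().split()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_model(geotagged, n_max=2):
    """Count, for each n-gram, how many geotagged items use it in each cell."""
    model = defaultdict(Counter)
    for text, lat, lon in geotagged:
        cell = grid_cell(lat, lon)
        for gram in set(ngrams(text, n_max)):
            model[gram][cell] += 1
    return model


def predict(model, text, min_count=2, min_share=0.8, n_max=2):
    """Return (cell, share) for the most concentrated n-gram, or None.

    An n-gram qualifies only if it was seen at least min_count times and
    at least min_share of those sightings fall in a single cell.
    """
    best = None
    for gram in set(ngrams(text, n_max)):
        counts = model.get(gram)
        if not counts:
            continue
        total = sum(counts.values())
        cell, top = counts.most_common(1)[0]
        share = top / total
        if total >= min_count and share >= min_share:
            if best is None or share > best[1]:
                best = (cell, share)
    return best


# Toy data, invented for illustration only.
train = [
    ("coffee at blue bottle mint plaza", 37.7825, -122.4078),
    ("blue bottle mint plaza again", 37.7826, -122.4079),
    ("coffee is great", 40.7128, -74.0060),
    ("coffee time", 51.5074, -0.1278),
]
model = build_model(train)
located = predict(model, "meet me at mint plaza")   # concentrated n-grams
unlocated = predict(model, "coffee time")           # too spread out
```

In this sketch, raising `min_share` or `min_count` should make predictions more reliable but leave more items without any prediction, which is one way to picture the accuracy and coverage trade-off the abstract mentions.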
Semantics-driven event clustering in Twitter feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms, each using different information sources (textual, temporal, geographic or community features), have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic information can also be used to drive the actual event detection, which is less covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over baseline. We find that assigning semantic information to every individual tweet results in worse performance in F1 measure compared to baseline. If, however, semantics are assigned at a coarser, hashtag level, the improvement over baseline is substantial and significant in both precision and recall.
Tweets and Facebook posts, the novelty techniques in the creation of origin-destination models
Social media and big data have emerged as useful sources of information that can be used for planning purposes, particularly transportation planning and trip-distribution studies. Cities in developing countries such as South Africa often struggle with outdated, unreliable and cumbersome techniques such as traffic counts and household surveys to conduct origin and destination studies. The emergence of ubiquitous crowdsourced data, big data, social media and geolocation-based services has shown huge potential in providing useful information for origin and destination studies. Perhaps such information can be utilised to determine the origin and destination of commuters using the Gautrain, a high-speed railway in Gauteng province, South Africa. To date, little is known about the origins and destinations of Gautrain commuters. Accordingly, this study assesses the viability of using geolocation-based services, namely Facebook and Twitter, in mapping out the network movements of Gautrain commuters. Explorative Spatial Data Analysis (ESDA), Echo-social and ArcGIS software were used to extract social media data, i.e. tweets and Facebook posts, as well as to visualize the concentration of Gautrain commuters. The results demonstrate that big data and geolocation-based services have significant potential to predict movement network patterns of commuters, and this information can thus be used to inform and improve transportation planning. Nevertheless, the use of crowdsourced data and big data raises privacy concerns that still need to be addressed.