Accurate Local Estimation of Geo-Coordinates for Social Media Posts
Associating geo-coordinates with the content of social media posts can
enhance many existing applications and services and enable a host of new ones.
Unfortunately, a majority of social media posts are not tagged with
geo-coordinates. Even when location data is available, it may be inaccurate,
very broad or sometimes fictitious. Contemporary location estimation approaches
based on analyzing the content of these posts can identify only broad areas
such as a city, which limits their usefulness. To address these shortcomings,
this paper proposes a methodology to narrowly estimate the geo-coordinates of
social media posts with high accuracy. The methodology relies solely on the
content of these posts and prior knowledge of the wide geographical region from
where the posts originate. An ensemble of language models, which are smoothed
over non-overlapping sub-regions of a wider region, lies at the heart of the
methodology. Experimental evaluation using a corpus of over half a million
tweets from New York City shows that the approach, on average, estimates
locations of tweets to within just 2.15 km of their actual positions.
Comment: In Proceedings of the 26th International Conference on Software
Engineering and Knowledge Engineering, pp. 642 - 647, 201
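The idea of scoring a post against language models tied to non-overlapping sub-regions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the square degree-based grid, the `cell_of`, `train`, and `predict` helpers, the use of a single unigram model per cell rather than an ensemble, and add-one smoothing within each cell in place of the paper's smoothing over sub-regions.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, size=0.01):
    """Map a coordinate to a grid cell index (hypothetical square grid, in degrees)."""
    return (math.floor(lat / size), math.floor(lon / size))


def train(posts, size=0.01):
    """Build per-cell word counts from (text, lat, lon) tuples; also return the vocabulary."""
    counts = defaultdict(Counter)
    vocab = set()
    for text, lat, lon in posts:
        words = text.lower().split()
        counts[cell_of(lat, lon, size)].update(words)
        vocab.update(words)
    return counts, vocab


def log_likelihood(words, cell_counts, vocab_size):
    """Unigram log-likelihood with add-one smoothing (an assumed smoothing scheme)."""
    total = sum(cell_counts.values())
    return sum(math.log((cell_counts[w] + 1) / (total + vocab_size)) for w in words)


def predict(text, counts, vocab, size=0.01):
    """Return the centre (lat, lon) of the cell whose model best explains the text."""
    words = text.lower().split()
    best = max(counts, key=lambda c: log_likelihood(words, counts[c], len(vocab)))
    return ((best[0] + 0.5) * size, (best[1] + 0.5) * size)
```

A finer grid gives tighter estimates but leaves fewer posts per cell, which is presumably why smoothing matters in the approach the abstract describes.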
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figur
Home Location Estimation Using Weather Observation Data
We can extract useful information from social media data by adding the user's
home location. However, since the user's home location is generally not
publicly available, many researchers have been attempting to develop a more
accurate home location estimation. In this study, we propose a method to
estimate a Twitter user's home location by using weather observation data from
AMeDAS. In our method, we first use the target user's tweets to estimate the
weather in the area from which each tweet was posted. Next, we check the estimated
weather against weather observation data to narrow down the area from which
the user posted. Finally, the user's home location is estimated as the areas the
user frequently posts from. In our experiments, the results indicate that our
method functions effectively and also demonstrate that accuracy improves under
certain conditions.
Comment: The 2017 International Conference On Advanced Informatics: Concepts,
Theory And Application (ICAICTA2017
Exploiting Text and Network Context for Geolocation of Social Media Users
Research on automatically geolocating social media users has conventionally
been based on the text content of posts from a given user or the social network
of the user, with very little crossover between the two, and no benchmarking
of the two approaches over comparable datasets. We bring the two threads of
research together in first proposing a text-based method based on adaptive
grids, followed by a hybrid network- and text-based method. Evaluating over
three Twitter datasets, we show that the empirical difference between text- and
network-based methods is not great, and that hybridisation of the two is
superior to the component methods, especially in contexts where the user graph
is not well connected. We achieve state-of-the-art results on all three
datasets.
Exploring Social Media for Event Attendance
Large popular events are nowadays well reflected in social media fora (e.g. Twitter), where people discuss their interest in participating in the events. In this paper we propose to exploit the content of non-geotagged posts in social media to build machine-learned classifiers able to infer users' attendance of large events in three temporal periods: before, during and after an event. The categories of features used to train the classifier reflect four different dimensions of social media: textual, temporal, social, and multimedia content. We detail the approach followed to design the feature space and report on experiments conducted on two large music festivals in the UK, namely the VFestival and Creamfields events. Our attendance classifier attains very high accuracy, with the highest result observed for the Creamfields dataset: ~87% accuracy in classifying users who will participate in the event.
Geolocation Predicting of Tweets Using BERT-Based Models
This research aims to solve the tweet/user geolocation prediction task
and provide a flexible methodology for the geotagging of textual big data. The
suggested approach implements neural networks for natural language processing
(NLP) to estimate the location as coordinate pairs (longitude, latitude) and
two-dimensional Gaussian Mixture Models (GMMs). The proposed models
have been fine-tuned on a Twitter dataset using pretrained Bidirectional Encoder
Representations from Transformers (BERT) as base models. Performance metrics
show a median error of less than 30 km on the worldwide-level, and less than 15
km on the US-level datasets for the models trained and evaluated on text
features of tweets' content and metadata context.
Comment: 27 pages, 6 tables, 7 figure
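The median error in kilometres quoted above is typically obtained by measuring the great-circle distance between each predicted and true coordinate pair and taking the median. The following is a minimal sketch of that calculation, not code from the paper. The helper names `haversine_km` and `median_error_km`, the spherical-Earth radius of 6371 km, and the (latitude, longitude) argument order are assumptions; the abstract itself lists pairs as (longitude, latitude).

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(predicted, actual):
    """Median distance between paired (lat, lon) predictions and ground truth."""
    return median(
        haversine_km(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)
    )
```

The median is less sensitive than the mean to a few predictions that land on the wrong continent, which is one reason it is a common choice for worldwide datasets.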