4,171 research outputs found
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
We propose a method for embedding two-dimensional locations in a continuous
vector space using a neural network-based model incorporating mixtures of
Gaussian distributions, presenting two model variants for text-based
geolocation and lexical dialectology. Evaluated over Twitter data, the proposed
model outperforms conventional regression-based geolocation and provides a
better estimate of uncertainty. We also show the effectiveness of the
representation for predicting words from location in lexical dialectology, and
evaluate it using the DARE dataset.Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP
2017) September 2017, Copenhagen, Denmar
Exploiting Text and Network Context for Geolocation of Social Media Users
Research on automatically geolocating social media users has conventionally
been based on the text content of posts from a given user or the social network
of the user, with very little crossover between the two, and no bench-marking
of the two approaches over compara- ble datasets. We bring the two threads of
research together in first proposing a text-based method based on adaptive
grids, followed by a hybrid network- and text-based method. Evaluating over
three Twitter datasets, we show that the empirical difference between text- and
network-based methods is not great, and that hybridisation of the two is
superior to the component methods, especially in contexts where the user graph
is not well connected. We achieve state-of-the-art results on all three
datasets
Confounds and Consequences in Geotagged Twitter Data
Twitter is often used in quantitative studies that identify
geographically-preferred topics, writing styles, and entities. These studies
rely on either GPS coordinates attached to individual messages, or on the
user-supplied location field in each profile. In this paper, we compare these
data acquisition techniques and quantify the biases that they introduce; we
also measure their effects on linguistic analysis and text-based geolocation.
GPS-tagging and self-reported locations yield measurably different corpora, and
these linguistic differences are partially attributable to differences in
dataset composition by age and gender. Using a latent variable model to induce
age and gender, we show how these demographic variables interact with geography
to affect language use. We also show that the accuracy of text-based
geolocation varies with population demographics, giving the best results for
men above the age of 40.Comment: final version for EMNLP 201
Traffic event detection framework using social media
This is an accepted manuscript of an article published by IEEE in 2017 IEEE International Conference on Smart Grid and Smart Cities (ICSGSC) on 18/09/2017, available online: https://ieeexplore.ieee.org/document/8038595
The accepted version of the publication may differ from the final published version.© 2017 IEEE. Traffic incidents are one of the leading causes of non-recurrent traffic congestions. By detecting these incidents on time, traffic management agencies can activate strategies to ease congestion and travelers can plan their trip by taking into consideration these factors. In recent years, there has been an increasing interest in Twitter because of the real-time nature of its data. Twitter has been used as a way of predicting revenues, accidents, natural disasters, and traffic. This paper proposes a framework for the real-time detection of traffic events using Twitter data. The methodology consists of a text classification algorithm to identify traffic related tweets. These traffic messages are then geolocated and further classified into positive, negative, or neutral class using sentiment analysis. In addition, stress and relaxation strength detection is performed, with the purpose of further analyzing user emotions within the tweet. Future work will be carried out to implement the proposed framework in the West Midlands area, United Kingdom.Published versio
- …