Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Geographically annotated social media is extremely valuable for modern
information retrieval. However, researchers who can access only
publicly visible data quickly find that social media users rarely publish
location information. In this work, we provide a method that can geolocate the
overwhelming majority of active Twitter users, regardless of their
location-sharing preferences, using only publicly visible Twitter data.
Our method infers an unknown user's location by examining their friends'
locations. We frame the geotagging problem as an optimization over a social
network with a total variation-based objective and provide a scalable and
distributed algorithm for its solution. Furthermore, we show how a robust
estimate of the geographic dispersion of each user's ego network can be used as
a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for
101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag
over 80% of public tweets.
Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan,
David Jurgens, and David Allen. "Geotagging one hundred million twitter
accounts with total variation minimization." Big Data (Big Data), 2014 IEEE
International Conference on. IEEE, 201
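The abstract describes the total variation objective only at a high level. One common way to minimize a sum of distances between connected users, while holding users of known location fixed, is to repeatedly move each unlocated user to the geometric median of its located friends. The Python sketch below illustrates that idea under simplifying assumptions: (lat, lon) is treated as planar coordinates, the geometric median is approximated with Weiszfeld's iteration, and the per-user dispersion score is assumed to be the median distance from a user's estimate to its located friends. The function names (`geometric_median`, `infer_locations`, `dispersion`) are hypothetical, and this is not the authors' distributed implementation.

```python
import math
import statistics
from typing import Dict, List, Optional, Tuple

# (lat, lon) treated as planar coordinates: a simplifying assumption for this sketch
Point = Tuple[float, float]


def geometric_median(points: List[Point], iters: int = 200, eps: float = 1e-9) -> Point:
    """Weiszfeld iteration: approximately minimizes the sum of Euclidean distances to `points`."""
    x = sum(p[0] for p in points) / len(points)  # start at the centroid
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                # Estimate sits on an input point; stop there (a common simplification).
                return (x, y)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        x, y = num_x / denom, num_y / denom
    return (x, y)


def infer_locations(friends: Dict[str, List[str]], known: Dict[str, Point],
                    rounds: int = 10) -> Dict[str, Point]:
    """Move each unlocated user to the geometric median of its located friends.

    Known locations stay fixed. Every update in a round reads the previous
    round's estimates (Jacobi-style), so a round could be computed in parallel.
    """
    est: Dict[str, Point] = dict(known)
    for _ in range(rounds):
        updates: Dict[str, Point] = {}
        for user, nbrs in friends.items():
            if user in known:
                continue
            located = [est[n] for n in nbrs if n in est]
            if located:
                updates[user] = geometric_median(located)
        est.update(updates)
    return est


def dispersion(user: str, friends: Dict[str, List[str]],
               est: Dict[str, Point]) -> Optional[float]:
    """Median distance from a user's estimate to its located friends (lower = more trustworthy)."""
    if user not in est:
        return None
    cx, cy = est[user]
    dists = [math.hypot(est[n][0] - cx, est[n][1] - cy)
             for n in friends.get(user, []) if n in est]
    return statistics.median(dists) if dists else None
```

Users whose `dispersion` exceeds a chosen cutoff could then be discarded, trading coverage for accuracy; the cutoff is a free parameter in this sketch. A production version would also replace the planar distance with a geodesic one such as the haversine distance.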
On the Accuracy of Hyper-local Geotagging of Social Media Content
Social media users share billions of items per year, only a small fraction of
which is geotagged. We present a data-driven approach for identifying
non-geotagged content items that can be associated with a hyper-local
geographic area by modeling the location distributions of hyper-local n-grams
that appear in the text. We explore the trade-off between accuracy, precision
and coverage of this method. Further, we explore differences across content
received from multiple platforms and devices, and show, for example, that
content shared via different sources and applications produces significantly
different geographic distributions, and that it is best to model and predict
location for items according to their source. Our findings show the potential
and the bounds of a data-driven approach to geotag short social media texts,
and offer implications for all applications that use data-driven approaches to
locate content.
Comment: 10 pages
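To make the n-gram idea concrete, here is a minimal Python sketch, not the authors' model: an n-gram counts as hyper-local if at least `min_frac` of its geotagged occurrences fall within `radius_km` of their centroid, and a non-geotagged item is placed at the centroid of its most concentrated hyper-local n-gram, or left unlocated if it has none. All thresholds and function names (`fit_hyperlocal`, `predict`, `ngrams`) are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def ngrams(text: str, n_max: int = 2) -> Set[str]:
    """All word n-grams of length 1..n_max, lowercased."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n])
            for n in range(1, n_max + 1) for i in range(len(toks) - n + 1)}


def fit_hyperlocal(items: Iterable[Tuple[str, Point]], radius_km: float = 1.0,
                   min_count: int = 3, min_frac: float = 0.8) -> Dict[str, Tuple[Point, float]]:
    """Keep n-grams whose geotagged occurrences cluster within radius_km of their centroid."""
    locs: Dict[str, List[Point]] = defaultdict(list)
    for text, pt in items:
        for g in ngrams(text):
            locs[g].append(pt)
    model: Dict[str, Tuple[Point, float]] = {}
    for g, pts in locs.items():
        if len(pts) < min_count:
            continue  # too little evidence to model this n-gram's distribution
        # Plain lat/lon mean: adequate only for tightly clustered points (assumption).
        c = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        frac = sum(haversine_km(p, c) <= radius_km for p in pts) / len(pts)
        if frac >= min_frac:
            model[g] = (c, frac)
    return model


def predict(text: str, model: Dict[str, Tuple[Point, float]]) -> Optional[Point]:
    """Centroid of the most concentrated hyper-local n-gram in `text`, or None to abstain."""
    hits = [model[g] for g in ngrams(text) if g in model]
    if not hits:
        return None  # abstaining trades coverage for precision
    return max(hits, key=lambda h: h[1])[0]
```

Raising `min_frac` or lowering `radius_km` makes `predict` abstain more often, which is one way to see the accuracy/coverage trade-off the abstract describes. Following the abstract's finding about sources, one could fit a separate model per platform or application rather than a single pooled model.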
Social Sensing of Floods in the UK
"Social sensing" is a form of crowd-sourcing that involves systematic
analysis of digital communications to detect real-world events. Here we
consider the use of social sensing for observing natural hazards. In
particular, we present a case study that uses data from a popular social media
platform (Twitter) to detect and locate flood events in the UK. In order to
improve data quality we apply a number of filters (timezone, simple text
filters and a naive Bayes 'relevance' filter) to the data. We then use place
names in the user profile and message text to infer the location of the tweets.
These two steps remove most of the irrelevant tweets and yield orders of
magnitude more located tweets than we obtain by relying on geotagged data alone. We
demonstrate that high-resolution social sensing of floods is feasible and that we
can produce high-quality historical and real-time maps of floods using Twitter.
Comment: 24 pages, 6 figures
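The filtering and geolocation steps can be sketched as a small pipeline. This Python sketch assumes a two-class multinomial naive Bayes with add-one smoothing as the relevance filter, a keyword check for "flood" as the simple text filter, a hypothetical timezone whitelist, and a toy three-entry gazetteer matched token by token against the message text and then the profile location. None of these specific choices come from the paper, and the names (`NaiveBayesRelevance`, `GAZETTEER`, `UK_TIMEZONES`, `locate`, `process`) are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().replace(",", " ").replace(".", " ").split()


class NaiveBayesRelevance:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[str], labels: List[bool]) -> "NaiveBayesRelevance":
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, doc: str, label: bool) -> float:
        total = sum(self.word_counts[label].values())
        # log prior, then add one log-likelihood term per token
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokenize(doc):
            score += math.log((self.word_counts[label][tok] + 1) / (total + len(self.vocab)))
        return score

    def is_relevant(self, doc: str) -> bool:
        return self.log_score(doc, True) > self.log_score(doc, False)


# Hypothetical stand-ins for a real gazetteer and timezone whitelist.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "york": (53.96, -1.08), "hull": (53.74, -0.33), "carlisle": (54.89, -2.93)}
UK_TIMEZONES = {"London", "Edinburgh"}


def locate(text: str, profile_location: str) -> Optional[Tuple[float, float]]:
    """First gazetteer hit in the message text, then in the profile location."""
    for source in (text, profile_location):
        for tok in tokenize(source):
            if tok in GAZETTEER:
                return GAZETTEER[tok]
    return None


def process(tweets: List[dict], nb: NaiveBayesRelevance) -> List[Tuple[str, Tuple[float, float]]]:
    located = []
    for t in tweets:
        if t.get("time_zone") not in UK_TIMEZONES:
            continue  # timezone filter
        if "flood" not in t["text"].lower():
            continue  # simple text filter
        if not nb.is_relevant(t["text"]):
            continue  # naive Bayes relevance filter
        loc = locate(t["text"], t.get("profile_location", ""))
        if loc is not None:
            located.append((t["text"], loc))
    return located
```

Single-token matching misses multi-word place names and ambiguous names; a real gazetteer lookup would need phrase matching and disambiguation.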
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and the
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts have focused on the
new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim to offer an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
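Studies of home-location and tweet-location prediction commonly report the error distance between predicted and true coordinates, summarized as mean and median error in kilometres and as the fraction of predictions within 161 km (100 miles), often written Acc@161. The Python sketch below computes these with the haversine formula; the exact metric definitions reviewed in the survey may differ, and `evaluate` is a hypothetical helper.

```python
import math
import statistics
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def evaluate(preds: List[Point], golds: List[Point],
             threshold_km: float = 161.0) -> Dict[str, float]:
    """Mean and median error distance, plus the share of predictions within threshold_km."""
    errors = [haversine_km(p, g) for p, g in zip(preds, golds)]
    return {
        "mean_km": statistics.mean(errors),
        "median_km": statistics.median(errors),
        "acc_at_threshold": sum(e <= threshold_km for e in errors) / len(errors),
    }
```

Reporting the median alongside the mean matters here because a few predictions on the wrong continent can dominate the mean error.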
Implementation of Classification of Geolocation of Country from Worldwide Tweets
Social media are increasingly being used within the scientific community as a key source of information for understanding various natural and social phenomena, and this has prompted the development of a wide range of data processing tools that can extract knowledge from social media for both post-hoc and real-time analysis. The growing interest in using social media as a source for research has motivated efforts to tackle the challenge of automatically geolocating tweets, given the lack of explicit location information in the majority of tweets. In contrast to much previous work, which has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying worldwide tweets at the country level, which has so far been unexplored in a real-time scenario. We analyze the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification.
- …