
808 research outputs found

Inferring the Origin Locations of Tweets with Quantitative Confidence

Author: Brown L. D.
Eisenstein J.
J.
Mahmud J.
McClendon S.
McLachlan G.
Neal R. M.
Paradesi S.
Roller S.
Schulz A.
Wing B.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/11/2013

Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for good estimates (roughly 30,000 tweets) and that models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with small geographic footprint provide the most useful location signals.

Comment: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new references, various other presentation improvements. Version 3: Various presentation improvements, accepted at ACM CSCW 201

arXiv.org e-Print Archive

CiteSeerX

Crossref
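
The abstract above describes estimating tweet locations with a variant of Gaussian mixture models. The following is only a minimal sketch of that general idea, not the authors' method: it assumes each token has its own mixture over (latitude, longitude), averages the mixtures of the known tokens in a tweet, and picks the candidate place with the highest density. The isotropic covariance, the planar treatment of coordinates, the equal token weighting, and all names and numbers (`token_models`, `candidates`, the toy tokens) are hypothetical.

```python
import math

def gaussian_pdf(point, mean, sigma):
    """Isotropic 2-D Gaussian density (planar approximation of lat/lon; an assumption)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def mixture_pdf(point, components):
    """Density of a mixture given as a list of (weight, mean, sigma); weights sum to 1."""
    return sum(w * gaussian_pdf(point, mean, sigma) for w, mean, sigma in components)

def tweet_pdf(point, tokens, token_models):
    """Average the mixtures of all tokens that have a model (equal weights; an assumption)."""
    known = [t for t in tokens if t in token_models]
    if not known:
        return 0.0
    return sum(mixture_pdf(point, token_models[t]) for t in known) / len(known)

def best_candidate(tokens, token_models, candidates):
    """Return the name of the candidate location with the highest estimated density."""
    return max(candidates, key=lambda name: tweet_pdf(candidates[name], tokens, token_models))

# Invented toy models: one mixture per token, in degrees.
token_models = {
    "gracht": [(1.0, (52.37, 4.90), 0.5)],
    "tube": [(0.8, (51.51, -0.13), 0.5), (0.2, (40.71, -74.00), 0.5)],
}
candidates = {
    "Amsterdam": (52.37, 4.90),
    "London": (51.51, -0.13),
    "New York": (40.71, -74.00),
}

print(best_candidate(["gracht", "boot"], token_models, candidates))
print(best_candidate(["tube"], token_models, candidates))
```

Because the estimate is a full density rather than a single point, the same functions could in principle be used to report how much probability mass falls near the chosen location, which is the kind of quantified uncertainty the abstract emphasizes.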

Determine the User Country of a Tweet

Author: Broek Tijs van den
Ehrenhard Michel
Hiemstra Djoerd
Need Ariana
van der Veen Han
Publication date: 01/01/2015

On the widely used messaging platform Twitter, about 2% of tweets contain a geographical location in the form of exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research addresses determining the location of tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user's time zone, the user's language, and the parsed user location. The classifier performs well on countries that are active on Twitter, such as the Netherlands and the United Kingdom. An analysis of the classifier's errors shows that mistakes were caused by limited information and by properties shared between countries, such as a shared time zone. A feature analysis was performed to see the effect of different features. Time zone and parsed user location were the most informative features.

Comment: CTIT Technical Report, University of Twente

arXiv.org e-Print Archive

Radboud Repository

University of Twente Research Information
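
The abstract above reports a Naive Bayes model over features such as time zone, language, and parsed user location. As a rough illustration only, and not the report's actual implementation, here is a small categorical Naive Bayes with Laplace smoothing. The feature names, the smoothing constant `alpha`, and the toy training rows are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict

def train(examples, alpha=1.0):
    """Count classes and per-class feature values from (features_dict, country) pairs."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (country, feature name) -> Counter of values
    vocab = defaultdict(set)             # feature name -> set of values seen in training
    for feats, country in examples:
        class_counts[country] += 1
        for name, value in feats.items():
            value_counts[(country, name)][value] += 1
            vocab[name].add(value)
    return class_counts, value_counts, vocab, alpha

def predict(model, feats):
    """Return the country with the highest log posterior under the independence assumption."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for country, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for name, value in feats.items():
            count = value_counts[(country, name)][value]
            # The extra +1 in the denominator reserves probability mass for unseen values.
            score += math.log((count + alpha) / (n + alpha * (len(vocab[name]) + 1)))
        if score > best_score:
            best, best_score = country, score
    return best

# Invented toy rows; real profile fields would be far noisier.
toy = [
    ({"timezone": "Amsterdam", "lang": "nl", "location": "Enschede"}, "NL"),
    ({"timezone": "Amsterdam", "lang": "nl", "location": "Utrecht"}, "NL"),
    ({"timezone": "London", "lang": "en", "location": "Leeds"}, "GB"),
    ({"timezone": "London", "lang": "en", "location": "London"}, "GB"),
]
model = train(toy)
print(predict(model, {"timezone": "Amsterdam", "lang": "nl", "location": "Zwolle"}))
```

The sketch also hints at the error source the abstract mentions: two countries that share a time zone and a language contribute identical likelihood terms for those features, so the decision falls back on the remaining features and the class prior.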

A Survey of Location Prediction on Twitter

Author: Han Jialong
Sun Aixin
Zheng Xin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.

Comment: Accepted to TKDE. 30 pages, 1 figure

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

On the Accuracy of Hyper-local Geotagging of Social Media Content

Author: Flatow David
Kanza Yaron
Naaman Mor
Volkovich Yana
Xie Ke Eddie
Publication date: 01/02/2015

Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision, and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotag short social media texts, and offer implications for all applications that use data-driven approaches to locate content.

Comment: 10 pages

arXiv.org e-Print Archive

CiteSeerX

Immigrant community integration in world cities

Author: A Ager
A Amini
B Gonçalves
B Hawelka
Bruno Gonçalves
C Hamnett
D Arribas-Bel
D Butler
D Mocanu
DS Massey
E Bokányi
EW Burgess
Fabio Lamanna
FD Bean
GJ Abel
Gustavo Romanillos
H Dijstelbloem
H Entzinger
J Beaverstock
J Friedmann
J Reades
José J. Ramasco
JW Berry
K Phalet
L Anselin
L Sloan
M Batty
M Lenormand
M Oka
M Samers
M Tizzoni
María Henar Salas-Olmedo
Maxime Lenormand
MC González
MJ White
MM Gordon
P Bajardi
P Deville
R Jurdak
Renaud Lambiotte
S Grauwin
S Musterd
S Ronen
S Sassen
T Gonul
T Louail
T Pei
V Blondel
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

As a consequence of the accelerated globalization process, today's major cities all over the world are characterized by increasing multiculturalism. The integration of immigrant communities may be affected by social polarization and spatial segregation. How are these dynamics evolving over time? To what extent are the different policies launched to tackle these problems working? These are critical questions traditionally addressed by studies based on surveys and census data. Such sources are robust against spurious biases, but collecting the data is intensive and rather expensive work. Here, we conduct a comprehensive study on immigrant integration in 53 world cities by introducing an innovative approach: an analysis of the spatio-temporal communication patterns of immigrant and local communities based on language detection in Twitter and on novel metrics of spatial integration. We quantify the "Power of Integration" of cities (their capacity to spatially integrate diverse cultures) and characterize the relations between different cultures when acting as hosts or immigrants.

Comment: 13 pages, 5 figures + Appendix

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

HAL Descartes

Digital.CSIC

FigShare

Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

Author: Baldwin Timothy
Cohn Trevor
Rahimi Afshin
Publication date: 01/01/2017

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset.

Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), September 2017, Copenhagen, Denmark

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

Mapping auroral activity with Twitter

Author: A. H. Tapia
E. A. MacDonald
Earle P. S.
M. Heavner
N. A. Case
N. Lalone
Priedhorsky R.
Sugiura M.
Sutton J.
Tapia A. H.
Publication venue: Wiley
Publication date: 04/05/2015

Twitter is a popular, publicly accessible social media service that has proven useful in mapping large-scale events in real time. In this study, for the first time, the use of Twitter as a measure of auroral activity is investigated. Peaks in the number of aurora-related tweets are found to frequently coincide with geomagnetic disturbances (detection rate of 91%). Additionally, the number of daily aurora-related tweets is found to strongly correlate with several auroral strength proxies (r_avg ≈ 0.7). An examination is made of the bias for location and time of day within Twitter data, and a first-order correction of these effects is presented. Overall, the results suggest that Twitter can provide both specific details about an individual aurora and an accurate real-time indication of when, and even from where, an aurora is visible.

Crossref

Lancaster E-Prints
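
The abstract above reports a correlation between daily aurora-related tweet counts and auroral strength proxies. The abstract does not say which coefficient was used; the sketch below assumes a plain Pearson correlation, and the two series are invented numbers that only show the calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Invented daily values: tweet counts and a hypothetical strength proxy.
daily_tweets = [12, 30, 8, 55, 20]
strength_proxy = [2, 4, 1, 7, 3]
print(round(pearson_r(daily_tweets, strength_proxy), 3))
```

An averaged value such as the r_avg quoted in the abstract could be obtained by computing this coefficient against each proxy separately and taking the mean, although the abstract does not spell out that procedure.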

Location Inference for Non-geotagged Tweets in User Timelines

Author: Kanhabua Nattiya
Li Pengfei
Lu Hua
Pan Gang
Zhao Sha
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

VBN