Crowdsourcing Dialect Characterization through Twitter
We perform a large-scale analysis of diatopic language variation using
geotagged microblogging datasets. By collecting all Twitter messages written in
Spanish over more than two years, we build a corpus from which a carefully
selected list of concepts allows us to characterize Spanish varieties on a
global scale. A cluster analysis reveals well-defined macroregions sharing
common lexical properties. Remarkably, we find that the Spanish language splits
into two superdialects: an urban speech used across major American and Spanish
cities, and a diverse form that encompasses rural areas and small towns. The
latter can be further clustered into smaller varieties with a stronger regional
character.
Comment: 10 pages, 5 figures
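The cluster-analysis step can be illustrated with a toy k-means over lexical-frequency vectors. This is a sketch only: the region names and frequency values below are invented, and the paper's corpus and clustering procedure differ.

```python
# Toy sketch (not the paper's method): cluster regions by the relative
# frequency with which they use competing lexical variants.
# Region names and frequency vectors are invented for illustration.
regions = {
    "Madrid":       [0.9, 0.1, 0.8],   # urban-variant heavy
    "Buenos Aires": [0.8, 0.2, 0.7],
    "rural_ES":     [0.2, 0.9, 0.1],   # rural-variant heavy
    "rural_MX":     [0.1, 0.8, 0.2],
}

def kmeans(points, centroids, iters=10):
    """Plain k-means with fixed initial centroids (deterministic)."""
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Recompute each centroid as the mean of its assigned points.
        centroids = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

pts = list(regions.values())
cents = kmeans(pts, [pts[0], pts[2]])  # seed with one urban, one rural point

def cluster_of(p):
    dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in cents]
    return dists.index(min(dists))

assignment = {name: cluster_of(vec) for name, vec in regions.items()}
```

With these toy vectors, the two urban regions land in one cluster and the two rural regions in the other, mirroring the urban/rural superdialect split the abstract describes.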
Exploiting Text and Network Context for Geolocation of Social Media Users
Research on automatically geolocating social media users has conventionally
been based on either the text content of a given user's posts or the user's
social network, with very little crossover between the two and no benchmarking
of the two approaches over comparable datasets. We bring the two threads of
research together, first proposing a text-based method based on adaptive
grids, followed by a hybrid network- and text-based method. Evaluating over
three Twitter datasets, we show that the empirical difference between text- and
network-based methods is not great, and that hybridisation of the two is
superior to the component methods, especially in contexts where the user graph
is not well connected. We achieve state-of-the-art results on all three
datasets.
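The adaptive-grid idea can be sketched as a recursive subdivision of lat/lon cells until no cell holds more than a fixed number of training users; each leaf cell then serves as a class label for a text classifier. This is a minimal illustration, not the paper's implementation, and the point data and `max_users` threshold are invented.

```python
# Minimal adaptive-grid sketch (illustrative, not the paper's code):
# recursively quarter a (lat, lon) cell until each leaf holds at most
# `max_users` training points. Cells are half-open, so points sitting
# exactly on the outer max edge are ignored in this sketch.
def build_grid(points, bounds, max_users=2):
    """Return leaf cells as (bounds, points) pairs.
    bounds = (lat_min, lat_max, lon_min, lon_max)."""
    lat0, lat1, lon0, lon1 = bounds
    if len(points) <= max_users:
        return [(bounds, points)]
    mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    leaves = []
    for b in [(lat0, mlat, lon0, mlon), (lat0, mlat, mlon, lon1),
              (mlat, lat1, lon0, mlon), (mlat, lat1, mlon, lon1)]:
        sub = [p for p in points
               if b[0] <= p[0] < b[1] and b[2] <= p[1] < b[3]]
        if sub:
            leaves += build_grid(sub, b, max_users)
    return leaves

# Invented training points: a dense cluster plus two sparse outliers.
pts = [(10, 10), (11, 11), (12, 12), (13, 13), (50, 100), (80, 170)]
leaves = build_grid(pts, (0, 90, 0, 180), max_users=2)
```

The dense cluster is split into small cells while the sparse region stays as one large cell, which is the point of an adaptive grid: label granularity tracks data density.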
Jumping Finite Automata for Tweet Comprehension
Every day, over one billion social media text messages are generated worldwide, providing abundant information that can improve people's lives through evidence-based decision making. Twitter is rich in such data, but there are a number of technical challenges in comprehending tweets, including the ambiguity of the language used in tweets, which is exacerbated in under-resourced languages. This paper presents an approach based on Jumping Finite Automata for automatic comprehension of tweets. We construct a WordNet for the language of Kenya (WoLK) based on an analysis of tweet structure, formalize the space of tweet variation, and abstract that space as a finite automaton. In addition, we present a software tool called the Automata-Aided Tweet Comprehension (ATC) tool, which takes raw tweets as input, preprocesses them, recognises the syntax, and extracts semantic information with an 86% success rate.
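A jumping finite automaton differs from an ordinary DFA in that it may consume the input symbols in any order, which loosely models the scrambled token order seen in informal tweets. The toy acceptance check below is illustrative only (it is not the paper's ATC tool); the transition table is an invented example.

```python
from collections import Counter

# Toy jumping-finite-automaton acceptance check (illustrative only):
# accept the input if SOME permutation of its symbols is accepted by
# the underlying DFA. We search over (state, multiset-of-remaining).
def jfa_accepts(delta, start, finals, word):
    """delta: dict mapping (state, symbol) -> next state."""
    def key(counter):
        return tuple(sorted(counter.items()))
    seen = set()
    stack = [(start, Counter(word))]
    while stack:
        state, remaining = stack.pop()
        if not remaining:                  # all symbols consumed
            if state in finals:
                return True
            continue
        k = (state, key(remaining))
        if k in seen:
            continue
        seen.add(k)
        for sym in list(remaining):        # "jump" to any remaining symbol
            nxt = delta.get((state, sym))
            if nxt is not None:
                r = remaining.copy()
                r[sym] -= 1
                if r[sym] == 0:
                    del r[sym]
                stack.append((nxt, r))
    return False

# Invented example: the DFA accepts exactly "ab" (q0 -a-> q1 -b-> q2),
# but the JFA also accepts the scrambled order "ba".
delta = {("q0", "a"): "q1", ("q1", "b"): "q2"}
```

`jfa_accepts(delta, "q0", {"q2"}, "ba")` holds even though a plain DFA would reject that order, which is the property that makes JFAs attractive for free-word-order text.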
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Geographically annotated social media is extremely valuable for modern
information retrieval. However, when researchers can only access
publicly-visible data, one quickly finds that social media users rarely publish
location information. In this work, we provide a method which can geolocate the
overwhelming majority of active Twitter users, independent of their location
sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user's location by examining their friends'
locations. We frame the geotagging problem as an optimization over a social
network with a total variation-based objective and provide a scalable and
distributed algorithm for its solution. Furthermore, we show how a robust
estimate of the geographic dispersion of each user's ego network can be used as
a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for
101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag
over 80% of public tweets.
Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan,
David Jurgens, and David Allen. "Geotagging one hundred million twitter
accounts with total variation minimization." Big Data (Big Data), 2014 IEEE
International Conference on. IEEE, 2014.
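The friend-based inference idea can be sketched as an iterated neighbor median: repeatedly set each unknown user's location to the coordinate-wise median of their friends' current estimates. This is only an analogy to the paper's approach; the actual method solves a total-variation objective with a distributed algorithm, and the graph and coordinates below are invented.

```python
import statistics

# Simplified sketch of friend-based geolocation (NOT the paper's
# total-variation solver): iterate a coordinate-wise median of each
# unknown user's friends' current location estimates.
def infer_locations(edges, known, iters=10):
    """edges: dict user -> list of friends; known: user -> (lat, lon)."""
    est = dict(known)
    for _ in range(iters):
        for user, friends in edges.items():
            if user in known:
                continue  # never overwrite ground-truth locations
            pts = [est[f] for f in friends if f in est]
            if pts:
                est[user] = (statistics.median(p[0] for p in pts),
                             statistics.median(p[1] for p in pts))
    return est

# Invented example: user "X" is unlabeled; three friends are geotagged.
edges = {"X": ["A", "B", "C"], "A": ["X"], "B": ["X"], "C": ["X"]}
known = {"A": (40.7, -74.0), "B": (41.0, -73.5), "C": (39.9, -75.2)}
est = infer_locations(edges, known)
```

Using a median rather than a mean keeps the estimate robust to a single far-away friend, which echoes the abstract's use of ego-network geographic dispersion as a per-user accuracy signal.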