2,233 research outputs found

Geo-temporal Twitter demographics

Author: Adnan Muhammad
Longley Paul A
Publication venue: TAYLOR & FRANCIS LTD
Publication date: 01/02/2016

This paper seeks and uses highly disaggregate social media sources to characterize Greater London in terms of flows of people with modelled individual characteristics, as well as conventional measures of land use morphology and night-time residence. We conduct three analyses. First, we use the Shannon Entropy measure to characterize the geography of information creation across the city. Second, we create a geo-temporal demographic classification of Twitter users in London. Third, we begin to use Twitter data to characterize the links between different locations across the city. We see all three elements as data-rich, highly disaggregate geo-temporal analysis of urban form and function, albeit one that pertains to no clearly defined population. Our conclusions reflect upon this severe shortcoming in analysis using social media data, and its implications for progressing our understanding of socio-spatial distributions within cities.
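
The abstract names Shannon entropy but does not say which distribution it is computed over. As a minimal sketch, assuming the quantity of interest is how evenly tweets in one grid cell are spread across users, the entropy of that empirical distribution could be computed as follows. The function name `shannon_entropy` and the per-cell user-ID input are illustrative assumptions, not details from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy (in bits) of the empirical distribution of labels.

    `labels` is any iterable of hashable items, e.g. the user ID attached to
    each tweet observed in one grid cell (an assumption for illustration).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations: define entropy as zero
    # H = -sum(p * log2(p)) over the observed categories
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# One user posting every tweet gives zero entropy; an even split gives the maximum.
single_user_cell = ["u1", "u1", "u1", "u1"]
mixed_cell = ["u1", "u2", "u3", "u4"]
print(shannon_entropy(single_user_cell), shannon_entropy(mixed_cell))
```

Under this reading, cells with higher entropy are those where content comes from many users in similar amounts rather than from a few prolific accounts.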

Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions

Author: Ardehaly Ehsan Mohammady
Culotta Aron
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/08/2017

Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since it is costly and time-consuming to annotate data for traditional supervised classification, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference using U.S. Census, national and state political polls, and Cook partisan voting index as population-level data. In LLP classification settings, the training data is divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement of instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization to support weights for samples inside bags, which is applicable in this setting where bags are arranged hierarchically (e.g., county-level bags are nested inside state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, resulting in error reductions of 28-44% over baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities which are typically not feasible using traditional polling methods.
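
To make the bag-level idea concrete, here is a hedged sketch of a generic label-proportion penalty for a single bag in the binary case: the weighted mean of the per-instance predictions is compared with the known bag proportion using a KL divergence. This is not the authors' WLR formulation, which the abstract does not spell out; the function name `bag_proportion_loss`, the choice of KL divergence, and the clipping constant `eps` are all assumptions made for illustration.

```python
import math


def bag_proportion_loss(pred_probs, weights, target_proportion, eps=1e-12):
    """KL divergence between a bag's known positive-label proportion and the
    weighted mean of the model's per-instance positive probabilities.

    pred_probs        -- predicted P(label = 1) for each instance in the bag
    weights           -- non-negative per-instance weights (hypothetical input)
    target_proportion -- known fraction of positive labels in the bag
    """
    if len(pred_probs) != len(weights):
        raise ValueError("pred_probs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted average prediction for the bag, clipped away from 0 and 1.
    q = sum(w * p for w, p in zip(weights, pred_probs)) / total_weight
    q = min(max(q, eps), 1.0 - eps)

    p = target_proportion
    loss = 0.0
    # KL(p || q) for a Bernoulli distribution; terms with zero mass contribute 0.
    for true_mass, pred_mass in ((p, q), (1.0 - p, 1.0 - q)):
        if true_mass > 0:
            loss += true_mass * math.log(true_mass / pred_mass)
    return loss


# A bag whose predictions average to the known proportion incurs no penalty.
print(bag_proportion_loss([0.2, 0.8], [1.0, 1.0], 0.5))
```

Training on such a penalty needs only bag-level proportions (for example, poll or census figures for a county), never a label for an individual tweet or user.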

arXiv.org e-Print Archive

Geo-located Twitter as the proxy for global mobility patterns

Author: Beinat Euro
Hawelka Bartosz
Kazakopoulos Pavlos
Ratti Carlo
Sitko Izabela
Sobolevsky Stanislav
Publication venue: 'Informa UK Limited'
Publication date: 01/10/2013

With the advent of a pervasive presence of location sharing services, researchers gained unprecedented access to the direct records of human activity in space and time. This paper analyses geo-located Twitter messages in order to uncover global patterns of human mobility. Based on a dataset of almost a billion tweets recorded in 2012, we estimate volumes of international travelers with respect to their country of residence. We examine mobility profiles of different nations, looking at characteristics such as mobility rate, radius of gyration, diversity of destinations and the balance of inflows and outflows. The temporal patterns disclose the universal seasons of increased international mobility and the peculiar national nature of overseen travels. Our analysis of the community structure of the Twitter mobility network, obtained with iterative network partitioning, reveals spatially cohesive regions that follow the regional division of the world. Finally, we validate our result with the global tourism statistics and mobility models provided by other authors, and argue that Twitter is a viable source to understand and quantify global mobility patterns. Comment: 17 pages, 13 figures
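
Radius of gyration, one of the mobility characteristics listed above, is commonly defined as the root-mean-square distance of a user's recorded positions from their centroid. The sketch below assumes positions are already projected to planar coordinates (for example, kilometres east and north of some origin); real geo-located tweets would need a projection or great-circle distances first. The function name and input format are illustrative assumptions, not taken from the paper.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of (x, y) positions from their centroid.

    `points` is a non-empty sequence of planar coordinates in a common unit
    (an assumption; latitude/longitude pairs should be projected first).
    """
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    # Centroid (centre of mass) of the visited positions.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance from the centroid, then its square root.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# A user who only ever tweets from one place has radius 0;
# one who alternates between two places 2 km apart has radius 1 km.
print(radius_of_gyration([(3.0, 4.0), (3.0, 4.0)]))
print(radius_of_gyration([(0.0, 0.0), (2.0, 0.0)]))
```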

arXiv.org e-Print Archive

Emotions, Demographics and Sociability in Twitter Interactions

Author: Arora Megha
Gallegos Luciano
Garcia David
Kumaraguru Ponnurangam
Lerman Kristina
Publication date: 10/03/2016

The social connections people form online affect the quality of information they receive and their online experience. Although a host of socioeconomic and cognitive factors were implicated in the formation of offline social ties, few of them have been empirically validated, particularly in an online setting. In this study, we analyze a large corpus of geo-referenced messages, or tweets, posted by social media users from a major US metropolitan area. We linked these tweets to US Census data through their locations. This allowed us to measure emotions expressed in the tweets posted from an area, the structure of social connections, and also use that area's socioeconomic characteristics in analysis. We extracted the structure of online social interactions from the people mentioned in tweets from that area. We find that at an aggregate level, places where social media users engage more deeply with less diverse social contacts are those where they express more negative emotions, like sadness and anger. Demographics also have an impact: these places have residents with lower household income and education levels. Conversely, places where people engage less frequently but with diverse contacts have happier, more positive messages posted from them and also have better educated, younger, more affluent residents. Results suggest that cognitive factors and offline characteristics affect the quality of online interactions. Our work highlights the value of linking social media data to traditional data sources, such as US Census, to drive novel analysis of online behavior. Comment: International Conference on the Web and Social Media (ICWSM2016)

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Diffusion of Lexical Change in Social Media

Author: Eisenstein Jacob
O'Connor Brendan
Smith Noah A.
Xing Eric P.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 19/11/2014

Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity -- especially with regard to race -- plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified "netspeak" dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English. Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare