Geo-temporal Twitter demographics
This paper seeks and uses highly disaggregate social media sources to characterize Greater London in terms of flows of people with modelled individual characteristics, as well as conventional measures of land use morphology and night-time residence. We conduct three analyses. First, we use the Shannon Entropy measure to characterize the geography of information creation across the city. Second, we create a geo-temporal demographic classification of Twitter users in London. Third, we begin to use Twitter data to characterize the links between different locations across the city. We see all three elements as data-rich, highly disaggregate geo-temporal analysis of urban form and function, albeit one that pertains to no clearly defined population. Our conclusions reflect upon this severe shortcoming in analysis using social media data, and its implications for progressing our understanding of socio-spatial distributions within cities.
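As a rough illustration of the Shannon entropy measure mentioned above, the following minimal sketch computes the entropy of an empirical distribution. The abstract does not say what the entropy is computed over; treating the input as, for example, the user IDs of tweets falling in one grid cell is an assumption made here for illustration, and `shannon_entropy` is a hypothetical helper name, not something taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`.

    `labels` is any iterable of hashable items (assumed here to be, e.g.,
    the user IDs of the tweets observed in a single grid cell).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations: define the entropy as zero
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical cells: one dominated by a single user, one evenly shared.
single_source = ["u1", "u1", "u1", "u1"]
evenly_shared = ["u1", "u2", "u3", "u4"]
print(shannon_entropy(single_source), shannon_entropy(evenly_shared))
```

Under this reading, a cell where one account produces all the content scores zero, while a cell with many equally active accounts scores high.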
Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions
Opinion mining and demographic attribute inference have many applications in
social science. In this paper, we propose models to infer daily joint
probabilities of multiple latent attributes from Twitter data, such as
political sentiment and demographic attributes. Since it is costly and
time-consuming to annotate data for traditional supervised classification, we
instead propose scalable Learning from Label Proportions (LLP) models for
demographic and opinion inference using U.S. Census, national and state
political polls, and Cook partisan voting index as population level data. In
LLP classification settings, the training data is divided into a set of
unlabeled bags, where only the label distribution of each bag is known,
removing the requirement of instance-level annotations. Our proposed LLP model,
Weighted Label Regularization (WLR), provides a scalable generalization of
prior work on label regularization to support weights for samples inside bags,
which is applicable in this setting where bags are arranged hierarchically
(e.g., county-level bags are nested inside of state-level bags). We apply our
model to Twitter data collected in the year leading up to the 2016 U.S.
presidential election, producing estimates of the relationships among political
sentiment and demographics over time and place. We find that our approach
closely tracks traditional polling data stratified by demographic category,
resulting in error reductions of 28-44% over baseline approaches. We also
provide descriptive evaluations showing how the model may be used to estimate
interactions among many variables and to identify linguistic temporal
variation, capabilities which are typically not feasible using traditional
polling methods.
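The Learning from Label Proportions setting described above can be sketched with a generic bag-level loss: the known label proportions of a bag are compared with the weighted average of the model's per-instance predictions. This is not the authors' Weighted Label Regularization model, whose exact objective is not given in the abstract; the KL-divergence form, the function name `bag_proportion_loss`, and the example numbers are all assumptions for illustration.

```python
import math


def bag_proportion_loss(probs, weights, target, eps=1e-12):
    """KL(target || weighted mean prediction) for a single bag.

    probs   : list of per-instance class-probability lists
    weights : non-negative per-instance weights (not all zero)
    target  : known label proportions for the bag (sums to 1)
    """
    total_w = sum(weights)
    n_classes = len(target)
    # Weighted average of the predicted class probabilities in the bag.
    mean_pred = [
        sum(w * p[k] for w, p in zip(weights, probs)) / total_w
        for k in range(n_classes)
    ]
    # Terms with target proportion 0 contribute nothing to the KL divergence.
    return sum(
        t * math.log(t / max(q, eps))
        for t, q in zip(target, mean_pred)
        if t > 0
    )


# Hypothetical bag with two instances and two classes.
probs = [[0.9, 0.1], [0.1, 0.9]]
print(bag_proportion_loss(probs, [1.0, 1.0], [0.5, 0.5]))  # predictions match the bag
print(bag_proportion_loss(probs, [3.0, 1.0], [0.5, 0.5]))  # weighting shifts the mean
```

No instance-level labels appear anywhere in the loss; only the bag's aggregate proportions constrain the model, which is what removes the annotation requirement.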
Geo-located Twitter as the proxy for global mobility patterns
With the advent of pervasive location-sharing services, researchers have
gained unprecedented access to direct records of human
activity in space and time. This paper analyses geo-located Twitter messages in
order to uncover global patterns of human mobility. Based on a dataset of
almost a billion tweets recorded in 2012, we estimate volumes of international
travelers with respect to their country of residence. We examine the mobility
profiles of different nations, looking at characteristics such as mobility
rate, radius of gyration, diversity of destinations and the balance of
inflows and outflows. The temporal patterns disclose the universal seasons of
increased international mobility and the peculiar national nature of overseas
travels. Our analysis of the community structure of the Twitter mobility
network, obtained with iterative network partitioning, reveals spatially
cohesive regions that follow the regional division of the world. Finally, we
validate our results against global tourism statistics and mobility models
provided by other authors, and argue that Twitter is a viable source to
understand and quantify global mobility patterns. Comment: 17 pages, 13 figures
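Of the mobility characteristics listed above, radius of gyration has a compact standard definition: the root-mean-square distance of a set of visited points from their centroid. The sketch below assumes points already projected to planar coordinates (for example, kilometres east and north of some origin); how the paper handles geographic coordinates is not stated in the abstract, and `radius_of_gyration` is a hypothetical helper name.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of `points` from their centroid.

    `points` is a non-empty list of (x, y) pairs in planar coordinates
    (an assumption; latitude/longitude would need projecting first).
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Hypothetical traces: a user who stays put versus one who travels.
print(radius_of_gyration([(0.0, 0.0), (0.0, 0.0)]))
print(radius_of_gyration([(0.0, 0.0), (2.0, 0.0)]))
```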
Emotions, Demographics and Sociability in Twitter Interactions
The social connections people form online affect the quality of information
they receive and their online experience. Although a host of socioeconomic and
cognitive factors have been implicated in the formation of offline social ties, few
of them have been empirically validated, particularly in an online setting. In
this study, we analyze a large corpus of geo-referenced messages, or tweets,
posted by social media users from a major US metropolitan area. We linked these
tweets to US Census data through their locations. This allowed us to measure
emotions expressed in the tweets posted from an area, the structure of social
connections, and also use that area's socioeconomic characteristics in
analysis. We extracted the structure of online social interactions from the
people mentioned in tweets from that area. We find that at an aggregate level,
places where social media users engage more deeply with less diverse social
contacts are those where they express more negative emotions, like sadness and
anger. Demographics also have an impact: these places have residents with lower
household income and education levels. Conversely, places where people engage
less frequently but with diverse contacts have happier, more positive messages
posted from them and also have better educated, younger, more affluent
residents. Results suggest that cognitive factors and offline characteristics
affect the quality of online interactions. Our work highlights the value of
linking social media data to traditional data sources, such as the US Census, to
drive novel analysis of online behavior. Comment: International Conference on the Web and Social Media (ICWSM2016)
Diffusion of Lexical Change in Social Media
Computer-mediated communication is driving fundamental changes in the nature
of written language. We investigate these changes by statistical analysis of a
dataset comprising 107 million Twitter messages (authored by 2.7 million unique
user accounts). Using a latent vector autoregressive model to aggregate across
thousands of words, we identify high-level patterns in diffusion of linguistic
change over the United States. Our model is robust to unpredictable changes in
Twitter's sampling rate, and provides a probabilistic characterization of the
relationship of macro-scale linguistic influence to a set of demographic and
geographic predictors. The results of this analysis offer support for prior
arguments that focus on geographical proximity and population size. However,
demographic similarity -- especially with regard to race -- plays an even more
central role, as cities with similar racial demographics are far more likely to
share linguistic influence. Rather than moving towards a single unified
"netspeak" dialect, language evolution in computer-mediated communication
reproduces existing fault lines in spoken American English. Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311