21 research outputs found

    Scaling in Words on Twitter

    Scaling properties of language are a useful tool for understanding generative processes in texts. We investigate the scaling relations in citywise Twitter corpora coming from the Metropolitan and Micropolitan Statistical Areas of the United States. We observe a slightly superlinear urban scaling with the city population for the total volume of the tweets and words created in a city. We then find that a certain core vocabulary follows the same scaling relationship as the bulk text, but most words are sensitive to city size, exhibiting a super- or a sublinear urban scaling. For both regimes we can offer a plausible explanation based on the meaning of the words. We also show that the parameters for Zipf's law and Heaps' law differ on Twitter from those of other texts, and that the exponent of Zipf's law changes with city size.
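The urban scaling mentioned in this abstract is commonly written as Y = Y0 * N^beta, where N is the city population and Y is a quantity such as tweet volume; beta > 1 is superlinear and beta < 1 is sublinear. The sketch below estimates beta by ordinary least squares on log-transformed values. It is only an illustration under stated assumptions: the function name `fit_scaling_exponent` and the synthetic data are hypothetical, and the authors' own fitting procedure may differ.

```python
import math


def fit_scaling_exponent(populations, volumes):
    """Fit log(volume) = log(y0) + beta * log(population) by least squares.

    Hypothetical helper for illustration; returns (beta, y0).
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in volumes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the regression line in log-log space is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    y0 = math.exp(mean_y - beta * mean_x)
    return beta, y0


# Synthetic, noise-free data generated with beta = 1.1 (not from the paper).
populations = [1e4, 5e4, 1e5, 5e5, 1e6, 5e6]
volumes = [2.0 * n ** 1.1 for n in populations]
beta, y0 = fit_scaling_exponent(populations, volumes)
print(f"beta = {beta:.3f}, y0 = {y0:.3f}")
```

Applying the same fit to the per-city counts of a single word, rather than to the total volume, would give that word's own exponent for comparison with the bulk text.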

    Understanding Inequalities in Ride-Hailing Services Through Simulations

    Despite the potential of online sharing economy platforms such as Uber, Lyft, or Foodora to democratize the labor market, these services are often accused of fostering unfair working conditions and low wages. These problems have been recognized by researchers and regulators, but the size and complexity of these socio-technical systems, combined with the lack of transparency about algorithmic practices, make it difficult to understand system dynamics and large-scale behavior. This paper combines approaches from complex systems and algorithmic fairness to investigate the effect of algorithm design decisions on wage inequality in ride-hailing markets. We first present a computational model that includes conditions about locations of drivers and passengers, traffic, the layout of the city, and the algorithm that matches requests with drivers. We calibrate the model with parameters derived from empirical data. Our simulations show that small changes in the system parameters can cause large deviations in the income distributions of drivers, leading to a highly unpredictable system which often distributes vastly different incomes to identically performing drivers. As suggested by recent studies about feedback loops in algorithmic systems, these initial income differences can result in enforced and long-term wage gaps. Comment: Code for the simulation can be found at https://github.com/bokae/tax

    The anatomy of a population-scale social network

    Large-scale human social network structure is typically inferred from digital trace samples of online social media platforms or mobile communication data. Instead, here we investigate the social network structure of a complete population, where people are connected by high-quality links sourced from administrative registers of family, household, work, school, and next-door neighbors. We examine this multilayer social opportunity structure through three common concepts in network analysis: degree, closure, and distance. Our findings show how particular network layers contribute to presumably universal scale-free and small-world properties of networks. Furthermore, we suggest a novel measure of excess closure and apply this in a life-course perspective to show how the social opportunity structure of individuals varies along age, socio-economic status, and education level. Our work provides new entry points to understand individual socio-economic failure and success as well as persistent societal problems of inequality and segregation.

    Urban hierarchy and spatial diffusion over the innovation life cycle

    Successful innovations achieve large geographical coverage by spreading across settlements and distances. For decades, spatial diffusion has been argued to take place along the urban hierarchy such that the innovation first spreads from large to medium cities, then later from medium to small cities. Yet, the role of geographical distance, the other major factor of spatial diffusion, was difficult to identify in hierarchical diffusion due to missing data on spreading events. In this paper, we exploit spatial patterns of individual invitations on a social media platform sent from registered users to new users over the entire life cycle of the platform. This enables us to disentangle the role of urban hierarchy and the role of distance by observing the source and target locations of flows over an unprecedented timescale. We demonstrate that hierarchical diffusion greatly overlaps with diffusion to close distances and these factors co-evolve over the life cycle; thus, their joint analysis is necessary. Then, a regression framework is applied to estimate the number of invitations sent between pairs of towns by years in the life cycle with the population sizes of the source and target towns, their combinations, and the distance between them. We confirm that hierarchical diffusion prevails initially across large towns only but emerges in the full spectrum of settlements in the middle of the life cycle when adoption accelerates. Unlike in previous gravity estimations, we find that after an intensifying role of distance in the middle of the life cycle, a surprisingly weak distance effect characterizes the last years of diffusion. Our results stress the dominance of urban hierarchy in spatial diffusion and inform future predictions of innovation adoption at local scales.

    Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States

    Recently, numerous approaches have emerged in the social sciences to exploit the opportunities made possible by the vast amounts of data generated by online social networks (OSNs). Having access to information about users on such a scale opens up a range of possibilities, all without the limitations associated with often slow and expensive paper-based polls. A question that remains to be satisfactorily addressed, however, is how demography is represented in OSN content. Here, we study language use in the US using a corpus of text compiled from over half a billion geo-tagged messages from the online microblogging platform Twitter. Our intention is to reveal the most important spatial patterns in language use in an unsupervised manner and relate them to demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented with the Robust Principal Component Analysis (RPCA) methodology. We find spatially correlated patterns that can be interpreted based on the words associated with them. The main language features can be related to slang use, urbanization, travel, religion and ethnicity, the patterns of which are shown to correlate plausibly with traditional census data. Our findings thus validate the concept of demography being represented in OSN language use and show that the traits observed are inherently present in the word frequencies without any previous assumptions about the dataset. Thus, they could form the basis of further research focusing on the evaluation of demographic data estimation from other big data sources, or on the dynamical processes that result in the patterns found here.

    Real-time estimation of the effective reproduction number of COVID-19 from behavioral data

    Near-real-time estimations of the effective reproduction number are among the most important tools to track the progression of a pandemic and to inform policy makers and the general public. However, these estimations rely on reported case numbers, commonly recorded with significant biases. The epidemic outcome is strongly influenced by the dynamics of social contacts, which are neglected in conventional surveillance systems as their real-time observation is challenging. Here, we propose a concept using online and offline behavioral data, recording age-stratified contact matrices at a daily rate. Modeling the epidemic using the reconstructed matrices, we dynamically estimate the effective reproduction number during the first two waves of the COVID-19 pandemic in Hungary. Our results demonstrate how behavioral data can be used to build alternative monitoring systems complementing the established public health surveillance. They can identify and provide better signals during periods when official estimates appear unreliable due to observational biases.

    The role of geography in the complex diffusion of innovations

    The urban-rural divide is increasing in modern societies, calling for geographical extensions of social influence modelling. Improved understanding of innovation diffusion across locations and through social connections can provide us with new insights into the spread of information, technological progress and economic development. In this work, we analyze the spatial adoption dynamics of iWiW, an Online Social Network (OSN) in Hungary, and uncover empirical features about the spatial adoption in social networks. During its entire life cycle from 2002 to 2012, iWiW reached up to 300 million friendship ties of 3 million users. We find that the number of adopters as a function of town population follows a scaling law that reveals a strongly concentrated early adoption in large towns and a less concentrated late adoption. We also discover a strengthening distance decay of spread over the life cycle, indicating a high fraction of distant diffusion in early stages but the dominance of local diffusion in late stages. The spreading process is modelled within the Bass diffusion framework, which enables us to compare the differential equation version with an agent-based version of the model run on the empirical network. Although both models can capture the macro trend of adoption, they have limited capacity to describe the observed trends of urban scaling and distance decay. We find, however, that incorporating adoption thresholds, defined by the fraction of social connections that adopt a technology before the individual adopts, improves the network model fit to the urban scaling of early adopters. Controlling for the threshold distribution enables us to eliminate the bias induced by local network structure on predicting local adoption peaks. Finally, we show that geographical features such as distance from the innovation origin and town size influence prediction of adoption peak at local scales. Comment: 21 pages, 11 figures, 4 tables
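For orientation, the differential-equation form of the Bass model referred to in this abstract is usually written as dF/dt = (p + q*F) * (1 - F), where F is the cumulative fraction of adopters, p the innovation (external influence) rate and q the imitation (social influence) rate. The following minimal sketch integrates this textbook equation with a forward Euler step. The function name and the parameter values are illustrative assumptions, not values taken from the paper, and the paper's agent-based and threshold variants are not reproduced here.

```python
def bass_adoption_curve(p, q, steps, dt=0.1):
    """Forward Euler integration of dF/dt = (p + q*F) * (1 - F), with F(0) = 0.

    Hypothetical helper for illustration; returns F at each time step.
    """
    fraction = 0.0
    curve = [fraction]
    for _ in range(steps):
        # Adoption is driven by external influence (p) and by imitation (q * F),
        # acting on the fraction of the population that has not yet adopted.
        fraction += dt * (p + q * fraction) * (1.0 - fraction)
        curve.append(fraction)
    return curve


# Illustrative parameters only (not calibrated to any dataset).
curve = bass_adoption_curve(p=0.01, q=0.4, steps=500)
print(f"adopter fraction at the end: {curve[-1]:.4f}")
```

The resulting curve is S-shaped: slow initial uptake driven by p, followed by acceleration once imitation dominates, then saturation as the pool of non-adopters shrinks.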

    Urban hierarchy and spatial diffusion over the innovation life cycle

    Successful innovations achieve large geographical coverage by spreading across settlements and distances. For decades, spatial diffusion has been argued to take place along the urban hierarchy. Yet, the role of geographical distance was difficult to identify in hierarchical diffusion due to missing data on spreading events. In this paper, we exploit spatial patterns of individual invitations sent from registered users to new users over the entire life cycle of a social media platform. We demonstrate that hierarchical diffusion overlaps with diffusion to close distances and these factors co-evolve over the life cycle. Therefore, we disentangle them in a regression framework that estimates the yearly number of invitations sent between pairs of towns. We confirm that hierarchical diffusion prevails initially across large towns only but emerges in the full spectrum of settlements in the middle of the life cycle when adoption accelerates. Unlike in previous gravity estimations, we find that after an intensifying role of distance in the middle of the life cycle, a surprisingly weak distance effect characterizes the last years of diffusion. Our results stress the dominance of urban hierarchy in spatial diffusion and inform future predictions of innovation adoption at local scales.