15 research outputs found

    Comparison of Feature Extraction Methods and Predictors for Income Inference

    Patterns of mobile phone communications, coupled with the information of the social network graph and financial behavior, allow us to make inferences of users’ socio-economic attributes such as their income level. We present here several methods to extract features from mobile phone usage (calls and messages), and compare different combinations of supervised machine learning techniques and sets of features used as input for the inference of users’ income. Our experimental results show that the Bayesian method based on the communication graph outperforms standard machine learning algorithms using node-based features. Sociedad Argentina de Informática e Investigación Operativa (SADIO)

    Comparison of Feature Extraction Methods and Predictors for Income Inference

    Patterns of mobile phone communications, coupled with the information of the social network graph and financial behavior, allow us to make inferences of users' socio-economic attributes such as their income level. We present here several methods to extract features from mobile phone usage (calls and messages), and compare different combinations of supervised machine learning techniques and sets of features used as input for the inference of users' income. Our experimental results show that the Bayesian method based on the communication graph outperforms standard machine learning algorithms using node-based features. Comment: Argentine Symposium on Big Data (AGRANDA), September 5, 201

    Clapping for carers: reproducing inequality during COVID-19


    Interpreting wealth distribution via poverty map inference using multimodal data

    Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services in places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions for all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonic correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models to help governments and NGOs make informed decisions based on data availability, urbanization level, and poverty thresholds. Comment: 12 pages. In Proceedings of the ACM Web Conference 2023 (WWW'23)

    Structural inequalities emerging from a large wire transfers network

    We aim to explore the connections between structural network inequalities and bank customers’ spending behaviours within an entire national ecosystem made of natural persons (i.e., individual human beings) and legal entities (i.e., private or public organisations), different business sectors, and supply chains that span distinct geographical regions. We focus on Italy, which is among the wealthiest nations in the world and an example of a complex economic system. In particular, we had access to a large subset of anonymised and GDPR-compliant wire transfer data recorded from Jan 2016 to Dec 2017 by Intesa Sanpaolo, a leading banking group in the Eurozone and the most important one in Italy. The Intesa Sanpaolo wire transfer network exhibits strongly heavy-tailed behaviour and a giant component that grows continuously around the same core of the 1% highest-degree nodes; it also shows a general disassortative pattern, even if some ranges of degree values stand out from the trend. Structural heterogeneity is explored further by means of a bow-tie analysis, which clearly shows that the majority of the relevant transactions, in terms of transferred amount, are settled between a smaller set of nodes that are associated with legal entities and that mostly belong to the strongly connected component. This observation leads to a more comprehensive inspection of differences between Italian regions and business sectors, which could support the detection and understanding of the interplay between supply chains. Our results suggest that there is a general flow of money that seems to stream down from higher-degree legal entities to lower-degree natural persons, crossing Italian regions and connecting different business sectors, and that is finally redistributed through expense sharing within families and smaller communities. We also describe a reference dataset and an empirical contribution to the study of financial networks, focusing on finer-grained information about spending behaviour through wire transfers.

    Differences in the spatial landscape of urban mobility: gender and socioeconomic perspectives

    In society, many of our routines and activities are linked to our ability to move, be it commuting to work, shopping for groceries, or meeting friends. Yet factors that limit individuals' ability to fully realise their mobility needs will ultimately affect the opportunities they can access (e.g. cultural activities, professional interactions). One important aspect frequently overlooked in human mobility studies is how gender-centred issues can amplify other sources of mobility disadvantage (e.g. socioeconomic inequalities), unevenly affecting the pool of opportunities men and women have access to. In this work, we leverage a combination of computational, statistical and information-theoretical approaches to investigate the existence of systematic discrepancies in the mobility diversity (i.e. the diversity of travel destinations) of (1) men and women from different socioeconomic backgrounds, and (2) work and non-work travels. Our analysis is based on datasets containing multiple instances of large-scale, official travel surveys carried out in three major metropolitan areas in South America: Medellín and Bogotá in Colombia, and São Paulo in Brazil. Our results indicate the presence of general discrepancies in urban mobility diversity related to the gender and socioeconomic characteristics of the individuals. Lastly, this paper sheds new light on the possible origins of gender-level human mobility inequalities, contributing to the general understanding of disaggregated patterns in human mobility.

    Data Management Law for the 2020s: The Lost Origins and the New Needs

    In the data analytics society, each individual’s disclosure of personal information imposes costs on others. This disclosure enables companies, deploying novel forms of data analytics, to infer new knowledge about other people and to use this knowledge to engage in potentially harmful activities. These harms go beyond privacy and include difficult-to-detect price discrimination, preference manipulation, and even social exclusion. Existing individual-focused data protection regimes leave the law unable to account for these social costs or to manage them. This Article suggests a way out by proposing to re-conceptualize the problem of the social costs of data analytics through the new frame of “data management law.” It offers a critical comparison of the two existing models of data governance: the American “notice and choice” approach and the European “personal data protection” regime (currently expressed in the General Data Protection Regulation). Tracing their origin to a single report issued in 1973, the Article demonstrates how they developed differently under the influence of different ideologies (market-centered liberalism and human rights, respectively). It also shows how both ultimately failed to address the challenges already outlined forty-five years ago. To tackle these challenges, this Article argues for three normative shifts. First, it proposes to go beyond “privacy” and towards “social costs of data management” as the framework for conceptualizing and mitigating the negative effects of corporations’ data usage. Second, it argues for going beyond individual interests to account for collective ones, and for replacing contracts with regulation as the means of creating norms governing data management. Third, it argues that the nature of the decisions about these norms is political, and so political means, in place of technocratic solutions, need to be employed.


    Understanding dynamic interactions between human activities and land-use structure in a city is a key lens for exploring the city as a complex system. This dissertation contributes to understanding the complexity of urban dynamics by studying the interactions between human activities and city land-use structures, utilizing freely accessible socially sensed data sources and building upon recent research trends and technologies in geographical information science, urban studies, and computer science. This dissertation addresses three main questions related to human dynamics: 1) how human activities in an urban environment are shaped by socioeconomic status and the intra-city land-use structure, and how, in turn, the knowledge of socioeconomic status-activity relationships can contribute to understanding the social landscape of a city; 2) how different types of activities are located in space and time in three U.S. cities, and how the spatiotemporal activity patterns in these cities characterize the activity profile of different neighborhoods; and 3) how recent socially sensed information on human activities can be integrated with widely used remotely sensed geographical data to create a novel approach for discovering patterns of land use in cities that otherwise lack up-to-date land-use information. This dissertation models the associations between socioeconomics and mobility in the Washington, D.C. metropolitan area as a case study and applies the learned associations to infer geographical patterns of socioeconomic status (SES) solely from the socially sensed data. This dissertation also implements a semi-automated workflow to retrieve activity details from socially sensed Twitter data in Washington, D.C., the City of Baltimore, and New York City. The dissertation integrates remotely sensed imagery and socially sensed data to model the dynamics associated with changing land-use types in the Washington, D.C.-Baltimore metropolitan area over time.