15 research outputs found
Comparison of Feature Extraction Methods and Predictors for Income Inference
Abstract: Patterns of mobile phone communications, coupled with information from the social network graph and financial behavior, allow us to infer users’ socio-economic attributes such as their income level. We present several methods to extract features from mobile phone usage (calls and messages), and compare different combinations of supervised machine learning techniques and sets of features used as input for the inference of users’ income. Our experimental results show that the Bayesian method based on the communication graph outperforms standard machine learning algorithms using node-based features.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Comparison of Feature Extraction Methods and Predictors for Income Inference
Patterns of mobile phone communications, coupled with the information of the
social network graph and financial behavior, allow us to make inferences of
users' socio-economic attributes such as their income level. We present here
several methods to extract features from mobile phone usage (calls and
messages), and compare different combinations of supervised machine learning
techniques and sets of features used as input for the inference of users'
income. Our experimental results show that the Bayesian method based on the
communication graph outperforms standard machine learning algorithms using
node-based features.
Comment: Argentine Symposium on Big Data (AGRANDA), September 5, 201
Interpreting wealth distribution via poverty map inference using multimodal data
Poverty maps are essential tools for governments and NGOs to track
socioeconomic changes and adequately allocate infrastructure and services in
places in need. Sensor and online crowd-sourced data combined with machine
learning methods have provided a recent breakthrough in poverty map inference.
However, these methods do not capture local wealth fluctuations, and are not
optimized to produce accountable results that guarantee accurate predictions to
all sub-populations. Here, we propose a pipeline of machine learning models to
infer the mean and standard deviation of wealth across multiple geographically
clustered populated places, and illustrate their performance in Sierra Leone
and Uganda. These models leverage seven independent and freely available
feature sources based on satellite images, and metadata collected via online
crowd-sourcing and social media. Our models show that combined metadata
features are the best predictors of wealth in rural areas, outperforming
image-based models, which are the best for predicting the highest wealth
quintiles. Our results recover the local mean and variation of wealth, and
correctly capture the positive yet non-monotonic correlation between them. We
further demonstrate the capabilities and limitations of model transfer across
countries and the effects of data recency and other biases. Our methodology
provides open tools to build towards more transparent and interpretable models
to help governments and NGOs to make informed decisions based on data
availability, urbanization level, and poverty thresholds.
Comment: 12 pages. In Proceedings of the ACM Web Conference 2023 (WWW'23)
Structural inequalities emerging from a large wire transfers network
We aim to explore the connections between structural network inequalities and bank customers’ spending behaviours within an entire national ecosystem made of natural persons (i.e., individual human beings) and legal entities (i.e., private or public organisations), different business sectors, and supply chains that span distinct geographical regions. We focus on Italy, which is among the wealthiest nations in the world and also an example of a complex economic system. In particular, we had access to a large subset of anonymised and GDPR-compliant wire transfer data recorded from Jan 2016 to Dec 2017 by Intesa Sanpaolo, a leading banking group in the Eurozone and the most important one in Italy.
The Intesa Sanpaolo wire transfers network exhibits strong heavy-tailed behaviour and a giant component that grows continuously around the same core of the 1% highest-degree nodes. It also shows a general disassortative pattern, even if some ranges of degree values stand out from the trend. Structural heterogeneity is explored further by means of a bow-tie analysis, which shows clearly that the majority of the transactions that are relevant in terms of transferred amount are settled between a smaller set of nodes that are associated with legal entities and that mostly belong to the strongly connected component. This observation leads to a more comprehensive inspection of differences between Italian regions and business sectors, which could support the detection and understanding of the interplay between supply chains.
Our results suggest that there is a general flow of money that seems to stream down from higher-degree legal entities to lower-degree natural persons, crossing Italian regions and connecting different business sectors, and that is finally redistributed through expense sharing within families and smaller communities.
We also describe a reference dataset and an empirical contribution to the study of financial networks, focusing on finer-grained information about spending behaviour through wire transfers.
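The bow-tie analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `reachable` and `bow_tie`, the edge-list input format, and the brute-force search for the largest strongly connected component are all assumptions made for illustration, and the sketch ignores transferred amounts entirely.

```python
from collections import defaultdict, deque


def reachable(start_nodes, adjacency):
    """Return every node reachable from start_nodes (breadth-first search)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges):
    """Split a directed graph into core, in, out and other components.

    edges is an iterable of (payer, payee) pairs; this format is a
    hypothetical choice for the sketch, not the paper's data schema.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
        nodes.update((src, dst))

    # Largest strongly connected component, found by brute force:
    # a node's component is the set it can reach and be reached from.
    core, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        component = reachable([node], succ) & reachable([node], pred)
        assigned |= component
        if len(component) > len(core):
            core = component

    downstream = reachable(core, succ) - core  # fed by the core
    upstream = reachable(core, pred) - core    # feeds the core
    other = nodes - core - upstream - downstream
    return {"core": core, "in": upstream, "out": downstream, "other": other}


toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("x", "a"), ("c", "y"), ("p", "q")]
parts = bow_tie(toy_edges)
```

On the toy graph, nodes a, b and c form the core, x only sends into it, y only receives from it, and p and q are disconnected from it. The brute-force component search is quadratic in the worst case, so a network of national scale would need a linear-time algorithm such as Tarjan's.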
Differences in the spatial landscape of urban mobility: gender and socioeconomic perspectives
In society, many of our routines and activities are linked to our ability to
move; be it commuting to work, shopping for groceries, or meeting friends. Yet,
factors that limit the individuals' ability to fully realise their mobility
needs will ultimately affect the opportunities they can have access to (e.g.
cultural activities, professional interactions). One important aspect
frequently overlooked in human mobility studies is how gender-centred issues
can amplify other sources of mobility disadvantages (e.g. socioeconomic
inequalities), unevenly affecting the pool of opportunities men and women have
access to. In this work, we leverage a combination of computational,
statistical and information-theoretical approaches to investigate the existence
of systematic discrepancies in the mobility diversity (i.e. the diversity of
travel destinations) of (1) men and women from different socioeconomic
backgrounds, and (2) work and non-work travels. Our analysis is based on
datasets containing multiple instances of large-scale, official, travel surveys
carried out in three major metropolitan areas in South America: Medellín and
Bogotá in Colombia, and São Paulo in Brazil. Our results indicate the
presence of general discrepancies in the urban mobility diversities related to
the gender and socioeconomic characteristics of the individuals. Lastly, this
paper sheds new light on the possible origins of gender-level human mobility
inequalities, contributing to the general understanding of disaggregated
patterns in human mobility.
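One common information-theoretic way to quantify the diversity of travel destinations is a normalised Shannon entropy over the zones a person visits. The sketch below assumes that measure; the abstract does not state the exact formula the authors use, and the function name `mobility_diversity` and the zone labels are hypothetical.

```python
import math
from collections import Counter


def mobility_diversity(destinations):
    """Normalised Shannon entropy of a list of visited destination labels.

    Returns 0.0 when all trips go to one place and 1.0 when trips are
    spread evenly over the distinct places visited. Normalising by the
    number of visited places (rather than all possible zones) is an
    assumption of this sketch.
    """
    counts = Counter(destinations)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# Hypothetical trip records: one traveller concentrates on a single zone,
# the other spreads trips evenly over two zones.
concentrated = mobility_diversity(["zone_1", "zone_1", "zone_1"])
balanced = mobility_diversity(["zone_1", "zone_2"])
skewed = mobility_diversity(["zone_1", "zone_1", "zone_1", "zone_2"])
```

Comparing such scores across groups (for example by gender and socioeconomic stratum, or by work versus non-work trips) is the kind of disaggregated comparison the abstract describes.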
Data Management Law for the 2020s: The Lost Origins and the New Needs
In the data analytics society, each individual’s disclosure of personal information imposes costs on others. This disclosure enables companies, deploying novel forms of data analytics, to infer new knowledge about other people and to use this knowledge to engage in potentially harmful activities. These harms go beyond privacy and include difficult-to-detect price discrimination, preference manipulation, and even social exclusion. Existing individual-focused data protection regimes leave the law unable to account for these social costs or to manage them.
This Article suggests a way out, by proposing to re-conceptualize the problem of social costs of data analytics through the new frame of “data management law.” It offers a critical comparison of the two existing models of data governance: the American “notice and choice” approach and the European “personal data protection” regime (currently expressed in the General Data Protection Regulation). Tracing their origin to a single report issued in 1973, the Article demonstrates how they developed differently under the influence of different ideologies (market-centered liberalism, and human rights, respectively). It also shows how both ultimately failed at addressing the challenges outlined already forty-five years ago.
To tackle these challenges, this Article argues for three normative shifts. First, it proposes to go beyond “privacy” and towards “social costs of data management” as the framework for conceptualizing and mitigating negative effects of corporations’ data usage. Second, it argues for going beyond individual interests to account for collective ones, and for replacing contracts with regulation as the means of creating norms governing data management. Third, it argues that the nature of the decisions about these norms is political, and so political means, in place of technocratic solutions, need to be employed.
USING SOCIALLY SENSED BIG DATA TO MODEL PATTERNS AND GEOGRAPHIC CONTEXT OF HUMAN ACTIVITIES IN CITIES
Understanding dynamic interactions between human activities and land-use structure in a city is a key lens to explore the city as a complex system. This dissertation contributes to understanding the complexity of urban dynamics by studying the interactions between human activities and city land-use structures using freely accessible socially sensed data sources, and by building upon recent research trends and technologies in geographical information science, urban studies, and computer science. This dissertation addresses three main questions related to human dynamics: 1) how human activities in an urban environment are shaped by socioeconomic status and the intra-city land-use structure, and how, in turn, knowledge of socioeconomic status-activity relationships can contribute to understanding the social landscape of a city; 2) how different types of activities are located in space and time in three U.S. cities and how the spatiotemporal activity patterns in these cities characterize the activity profile of different neighborhoods in the cities; and 3) how recent socially sensed information on human activities can be integrated with widely used remotely sensed geographical data to create a novel approach for discovering patterns of land use in cities that otherwise lack up-to-date land-use information. This dissertation models the associations between socioeconomics and mobility in the Washington, D.C. metropolitan area as a case study and applies the learned associations to infer geographical patterns of socioeconomic status (SES) using only the socially sensed data. This dissertation also implements a semi-automated workflow to retrieve activity details from socially sensed Twitter data in Washington, D.C., the City of Baltimore, and New York City.
The dissertation integrates remotely sensed imagery and socially sensed data to model the dynamics associated with changing land-use types in the Washington, D.C.-Baltimore metropolitan area over time.