46 research outputs found
MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction
Moral rhetoric plays a fundamental role in how we perceive and interpret the
information we receive, greatly influencing our decision-making process.
Especially when it comes to controversial social and political issues, our
opinions and attitudes are hardly ever based on evidence alone. The Moral
Foundations Dictionary (MFD) was developed to operationalize moral values in
the text. In this study, we present MoralStrength, a lexicon of approximately
1,000 lemmas, obtained as an extension of the Moral Foundations Dictionary,
based on WordNet synsets. Moreover, for each lemma it provides a
crowdsourced numeric assessment of Moral Valence, indicating the strength with
which a lemma expresses the specific value. We evaluated the predictive
potential of this moral lexicon, defining three utilization approaches of
increasing complexity, ranging from lemmas' statistical properties to a deep
learning approach of word embeddings based on semantic similarity. Logistic
regression models trained on the features extracted from MoralStrength
significantly outperformed the current state-of-the-art, reaching an F1-score
of 87.6% over the previous 62.4% (p-value<0.01), and an average F1-Score of
86.25% over six different datasets. Such findings pave the way for further
research, allowing for an in-depth understanding of moral narratives in text
for a wide range of social issues.
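As a rough illustration of the simplest of the three approaches mentioned above (features built from lemmas' statistical properties), the following sketch computes summary statistics of lexicon scores for one moral foundation. The toy lexicon, its numeric scale, and the function name `lexicon_features` are assumptions made for this example; they are not taken from MoralStrength itself.

```python
from statistics import mean

# Hypothetical toy lexicon: lemma -> (moral foundation, valence score).
# The entries and the numeric scale are invented for illustration only.
TOY_LEXICON = {
    "protect": ("care", 8.0),
    "harm": ("care", 2.0),
    "cheat": ("fairness", 1.5),
    "justice": ("fairness", 8.5),
}


def lexicon_features(tokens, foundation, lexicon=TOY_LEXICON):
    """Summarise the valence scores of tokens that match one foundation."""
    scores = [
        lexicon[t][1]
        for t in tokens
        if t in lexicon and lexicon[t][0] == foundation
    ]
    if not scores:
        # No lexicon hit: return neutral placeholders (an arbitrary choice).
        return {"count": 0, "mean": 0.0, "min": 0.0, "max": 0.0}
    return {
        "count": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


tokens = "we must protect children from harm".split()
features = lexicon_features(tokens, "care")
```

Feature dictionaries of this kind could then be turned into vectors and passed to a classifier such as logistic regression; the abstract does not specify the exact feature set, so this is only one plausible reading.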
Monitoring Gender Gaps via LinkedIn Advertising Estimates: the case study of Italy
Women remain underrepresented in the labour market. Although significant
advancements are being made to increase female participation in the workforce,
the gender gap is still far from being bridged. We contribute to the growing
literature on gender inequalities in the labour market, evaluating the
potential of the LinkedIn estimates to monitor the evolution of the gender gaps
sustainably, complementing the official data sources. In particular, we assess
labour market patterns at a subnational level in Italy. Our findings show
that the LinkedIn estimates accurately capture the gender disparities in Italy
regarding sociodemographic attributes such as gender, age, geographic location,
seniority, and industry category. At the same time, we assess data biases such
as the digitalisation gap, which impacts the representativity of the workforce
in an imbalanced manner, confirming that women are under-represented in
Southern Italy. In addition to confirming the gender disparities found in the
official census, LinkedIn estimates are a valuable tool to provide dynamic
insights; we showed an immigration flow of highly skilled women, predominantly
from the South. Digital surveillance of gender inequalities with detailed and
timely data is particularly significant to enable policymakers to tailor
impactful campaigns.
Comment: 10 page
Leave no Place Behind: Improved Geolocation in Humanitarian Documents
Geographical location is a crucial element of humanitarian response,
outlining vulnerable populations, ongoing events, and available resources.
Latest developments in Natural Language Processing may help in extracting vital
information from the deluge of reports and documents produced by the
humanitarian sector. However, the performance and biases of existing
state-of-the-art information extraction tools are unknown. In this work, we
develop annotated resources to fine-tune the popular Named Entity Recognition
(NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We
then propose a geocoding method, FeatureRank, which links the candidate locations
to the GeoNames database. We find that not only does the humanitarian-domain
data improve the performance of the classifiers (up to F1 = 0.92), but it also
alleviates some of the bias of the existing tools, which erroneously favor
locations in Western countries. Thus, we conclude that more resources from
non-Western documents are necessary to ensure that off-the-shelf NER systems
are suitable for deployment in the humanitarian sector.
Traditional versus Facebook-based surveys: Evaluation of biases in self-reported demographic and psychometric information
Background: Social media in scientific research offers a unique digital observatory of human behaviours and hence great opportunities to conduct research at large scale, answering complex sociodemographic questions. We focus on the identification and assessment of biases in social-media-administered surveys.
Objective: This study aims to shed light on population, self-selection, and behavioural biases, empirically comparing the consistency of self-reported information collected through traditionally administered versus social-media-administered questionnaires, including demographic and psychometric attributes.
Methods: We engaged a demographically representative cohort of young adults in Italy (approximately 4,000 participants) in taking a traditionally administered online survey and then, after one year, we invited them to use our ad hoc Facebook application (988 accepted), where they filled in part of the initial survey. We assess the statistically significant differences indicating population, self-selection, and behavioural biases due to the different context in which the questionnaire is administered.
Results: Our findings suggest that surveys administered on Facebook do not exhibit major biases with respect to traditionally administered surveys in terms of either demographics or personality traits. Loyalty, authority, and social binding values were higher on the Facebook platform, probably due to the platform's intrinsic social character.
Conclusions: We conclude that Facebook apps are valid research tools for administering demographic and psychometric surveys, provided that the entailed biases are taken into consideration.
Contribution: We contribute to the characterisation of Facebook apps as a valid scientific tool to administer demographic and psychometric surveys, and to the assessment of population, self-selection, and behavioural biases in the collected data.
Affiliation: Kalimeri, Kyriaki. Institute for Scientific Interchange Foundation; Italy
Affiliation: Beiro, Mariano Gastón. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Tecnologías y Ciencias de la Ingeniería "Hilario Fernández Long". Universidad de Buenos Aires. Facultad de Ingeniería. Instituto de Tecnologías y Ciencias de la Ingeniería "Hilario Fernández Long"; Argentina
Affiliation: Bonanomi, Andrea. Università Cattolica del Sacro Cuore; Italy
Affiliation: Rosina, Alessandro. Università Cattolica del Sacro Cuore; Italy
Affiliation: Cattuto, Ciro. Isi Foundation; Itali
Facebook Ads as a Demographic Tool to Measure the Urban-Rural Divide
In the global move toward urbanization, making sure the people remaining in
rural areas are not left behind in terms of development and policy
considerations is a priority for governments worldwide. However, it is
increasingly challenging to track important statistics concerning this sparse,
geographically dispersed population, resulting in a lack of reliable,
up-to-date data. In this study, we examine the usefulness of the Facebook
Advertising platform, which offers a digital "census" of over two billion of
its users, in measuring potential rural-urban inequalities. We focus on Italy,
a country where about 30% of the population lives in rural areas. First, we
show that the population statistics that Facebook produces suffer from
instability across time and incomplete coverage of sparsely populated
municipalities. To overcome this limitation, we propose an alternative
methodology for estimating Facebook Ads audiences that nearly triples the
coverage of the rural municipalities from 19% to 55% and makes
fine-grained sub-population analysis feasible. Using official national census data, we
evaluate our approach and confirm known significant urban-rural divides in
terms of educational attainment and income. Extending the analysis to
Facebook-specific user "interests" and behaviors, we provide further insights
on the divide, for instance, finding that rural areas show a higher interest in
gambling. Notably, we find that the most predictive features of income in rural
areas differ from those for urban centres, suggesting researchers need to
consider a broader range of attributes when examining rural wellbeing. The
findings of this study illustrate the necessity of improving existing tools and
methodologies to include under-represented populations in digital demographic
studies -- the failure to do so could result in misleading observations,
conclusions, and most importantly, policies.
Comment: To be published in the Proceedings of The Web Conference 2020 (WWW '20