46 research outputs found

    MoralStrength: Exploiting a Moral Lexicon and Embedding Similarity for Moral Foundations Prediction

    Moral rhetoric plays a fundamental role in how we perceive and interpret the information we receive, greatly influencing our decision-making process. Especially when it comes to controversial social and political issues, our opinions and attitudes are hardly ever based on evidence alone. The Moral Foundations Dictionary (MFD) was developed to operationalize moral values in text. In this study, we present MoralStrength, a lexicon of approximately 1,000 lemmas, obtained as an extension of the Moral Foundations Dictionary based on WordNet synsets. Moreover, for each lemma it provides a crowdsourced numeric assessment of Moral Valence, indicating the strength with which the lemma expresses the specific value. We evaluated the predictive potential of this moral lexicon, defining three utilization approaches of increasing complexity, ranging from the lemmas' statistical properties to a deep learning approach of word embeddings based on semantic similarity. Logistic regression models trained on the features extracted from MoralStrength significantly outperformed the current state-of-the-art, reaching an F1-score of 87.6% over the previous 62.4% (p-value < 0.01), and an average F1-score of 86.25% over six different datasets. Such findings pave the way for further research, allowing for an in-depth understanding of moral narratives in text for a wide range of social issues.

    Monitoring Gender Gaps via LinkedIn Advertising Estimates: the case study of Italy

    Women remain underrepresented in the labour market. Although significant advancements are being made to increase female participation in the workforce, the gender gap is still far from being bridged. We contribute to the growing literature on gender inequalities in the labour market by evaluating the potential of LinkedIn estimates to monitor the evolution of gender gaps sustainably, complementing the official data sources. In particular, we assess labour market patterns at a subnational level in Italy. Our findings show that the LinkedIn estimates accurately capture the gender disparities in Italy regarding sociodemographic attributes such as gender, age, geographic location, seniority, and industry category. At the same time, we assess data biases such as the digitalisation gap, which impacts the representativeness of the workforce in an imbalanced manner, confirming that women are under-represented in Southern Italy. In addition to confirming the gender disparities reported in the official census, LinkedIn estimates are a valuable tool for providing dynamic insights; we show an immigration flow of highly skilled women, predominantly from the South. Digital surveillance of gender inequalities with detailed and timely data is particularly significant in enabling policymakers to tailor impactful campaigns. Comment: 10 pages

    Leave no Place Behind: Improved Geolocation in Humanitarian Documents

    Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. The latest developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method, FeatureRank, which links the candidate locations to the GeoNames database. We find that the humanitarian-domain data not only improves the performance of the classifiers (up to F1 = 0.92), but also alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.

    Traditional versus Facebook-based surveys: Evaluation of biases in self-reported demographic and psychometric information

    Background: Social media in scientific research offers a unique digital observatory of human behaviours and hence great opportunities to conduct research at large scale, answering complex sociodemographic questions. We focus on the identification and assessment of biases in social-media-administered surveys. Objective: This study aims to shed light on population, self-selection, and behavioural biases, empirically comparing the consistency of self-reported information, including demographic and psychometric attributes, collected through traditionally administered versus social-media-administered questionnaires. Methods: We engaged a demographically representative cohort of young adults in Italy (approximately 4,000 participants) in taking a traditionally administered online survey and then, after one year, we invited them to use our ad hoc Facebook application (988 accepted), where they filled in part of the initial survey. We assess the statistically significant differences indicating population, self-selection, and behavioural biases due to the different context in which the questionnaire is administered. Results: Our findings suggest that surveys administered on Facebook do not exhibit major biases with respect to traditionally administered surveys in terms of either demographics or personality traits. Loyalty, authority, and social binding values were higher on the Facebook platform, probably due to the platform's intrinsic social character. Conclusions: We conclude that Facebook apps are valid research tools for administering demographic and psychometric surveys, provided that the entailed biases are taken into consideration. Contribution: We contribute to the characterisation of Facebook apps as a valid scientific tool to administer demographic and psychometric surveys, and to the assessment of population, self-selection, and behavioural biases in the collected data.
    Affiliation: Kalimeri, Kyriaki. Institute for Scientific Interchange Foundation; Italy
    Affiliation: Beiro, Mariano Gastón. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Tecnologías y Ciencias de la Ingeniería "Hilario Fernández Long". Universidad de Buenos Aires. Facultad de Ingeniería. Instituto de Tecnologías y Ciencias de la Ingeniería "Hilario Fernández Long"; Argentina
    Affiliation: Bonanomi, Andrea. Università Cattolica del Sacro Cuore; Italy
    Affiliation: Rosina, Alessandro. Università Cattolica del Sacro Cuore; Italy
    Affiliation: Cattuto, Ciro. Isi Foundation; Italy

    Facebook Ads as a Demographic Tool to Measure the Urban-Rural Divide

    In the global move toward urbanization, making sure the people remaining in rural areas are not left behind in terms of development and policy considerations is a priority for governments worldwide. However, it is increasingly challenging to track important statistics concerning this sparse, geographically dispersed population, resulting in a lack of reliable, up-to-date data. In this study, we examine the usefulness of the Facebook Advertising platform, which offers a digital "census" of over two billion of its users, in measuring potential rural-urban inequalities. We focus on Italy, a country where about 30% of the population lives in rural areas. First, we show that the population statistics that Facebook produces suffer from instability across time and incomplete coverage of sparsely populated municipalities. To overcome these limitations, we propose an alternative methodology for estimating Facebook Ads audiences that nearly triples the coverage of rural municipalities, from 19% to 55%, and makes fine-grained sub-population analysis feasible. Using official national census data, we evaluate our approach and confirm known significant urban-rural divides in terms of educational attainment and income. Extending the analysis to Facebook-specific user "interests" and behaviors, we provide further insights on the divide, for instance, finding that rural areas show a higher interest in gambling. Notably, we find that the most predictive features of income in rural areas differ from those for urban centres, suggesting researchers need to consider a broader range of attributes when examining rural wellbeing. The findings of this study illustrate the necessity of improving existing tools and methodologies to include under-represented populations in digital demographic studies; the failure to do so could result in misleading observations, conclusions, and most importantly, policies. Comment: To be published in the Proceedings of The Web Conference 2020 (WWW '20)