Using Noisy Self-Reports to Predict Twitter User Demographics
Computational social science studies often contextualize content analysis
within standard demographics. Since demographics are unavailable on many social
media platforms (e.g., Twitter), numerous studies have inferred demographics
automatically. Although many studies have presented proof-of-concept inference of
race and ethnicity, training practical systems remains elusive since there
are few annotated datasets. Existing datasets are small, inaccurate, or fail to
cover the four most common racial and ethnic groups in the United States. We
present a method to identify self-reports of race and ethnicity from Twitter
profile descriptions. Despite errors inherent in automated supervision, we
produce models with good performance when measured on gold standard self-report
survey data. The result is a reproducible method for creating large-scale
training resources for race and ethnicity.
Comment: The first two authors had an equal contribution. Accepted to SocialNLP @ NAACL 202
On the State of Social Media Data for Mental Health Research
Data-driven methods for mental health treatment and surveillance have become
a major focus in computational science research in the last decade. However,
progress in the domain, in terms of both medical understanding and system
performance, remains bounded by the availability of adequate data. Prior
systematic reviews have not necessarily made it possible to measure the degree
to which data-related challenges have affected research progress. In this
paper, we offer an analysis specifically on the state of social media data that
exists for conducting mental health research. We do so by introducing an
open-source directory of mental health datasets, annotated using a standardized
schema to facilitate meta-analysis.
Comment: Originally submitted to ICWSM in January 2020. Updated November 2020.
Supplementary material at
https://github.com/kharrigian/mental-health-dataset
Gender and Racial Fairness in Depression Research using Social Media
Multiple studies have demonstrated that behavior on internet-based social
media platforms can be indicative of an individual's mental health status. The
widespread availability of such data has spurred interest in mental health
research from a computational lens. While previous research has raised concerns
about possible biases in models produced from this data, no study has
quantified how these biases actually manifest with respect to
different demographic groups, such as gender and racial/ethnic groups. Here, we
analyze the fairness of depression classifiers trained on Twitter data with
respect to gender and racial demographic groups. We find that model performance
systematically differs for underrepresented groups and that these discrepancies
cannot be fully explained by trivial data representation issues. Our study
concludes with recommendations on how to avoid these biases in future research.
Comment: Accepted to EACL 202