Using Noisy Self-Reports to Predict Twitter User Demographics
Computational social science studies often contextualize content analysis
within standard demographics. Since demographics are unavailable on many social
media platforms (e.g., Twitter), numerous studies have inferred demographics
automatically. Although many studies have presented proof-of-concept inference of
race and ethnicity, training practical systems remains elusive since there
are few annotated datasets. Existing datasets are small, inaccurate, or fail to
cover the four most common racial and ethnic groups in the United States. We
present a method to identify self-reports of race and ethnicity from Twitter
profile descriptions. Despite errors inherent in automated supervision, we
produce models with good performance when measured on gold standard self-report
survey data. The result is a reproducible method for creating large-scale
training resources for race and ethnicity.
Comment: The first two authors had an equal contribution. Accepted to SocialNLP @ NAACL 202
On the State of Social Media Data for Mental Health Research
Data-driven methods for mental health treatment and surveillance have become
a major focus in computational science research in the last decade. However,
progress in the domain, in terms of both medical understanding and system
performance, remains bounded by the availability of adequate data. Prior
systematic reviews have not necessarily made it possible to measure the degree
to which data-related challenges have affected research progress. In this
paper, we offer an analysis specifically on the state of social media data that
exists for conducting mental health research. We do so by introducing an
open-source directory of mental health datasets, annotated using a standardized
schema to facilitate meta-analysis.
Comment: Originally submitted to ICWSM in January 2020. Updated November 2020.
Supplementary material at
https://github.com/kharrigian/mental-health-dataset
Gender and Racial Fairness in Depression Research using Social Media
Multiple studies have demonstrated that behavior on internet-based social
media platforms can be indicative of an individual's mental health status. The
widespread availability of such data has spurred interest in mental health
research from a computational lens. While previous research has raised concerns
about possible biases in models produced from this data, no study has
quantified how these biases actually manifest with respect to
different demographic groups, such as gender and racial/ethnic groups. Here, we
analyze the fairness of depression classifiers trained on Twitter data with
respect to gender and racial demographic groups. We find that model performance
systematically differs for underrepresented groups and that these discrepancies
cannot be fully explained by trivial data representation issues. Our study
concludes with recommendations on how to avoid these biases in future research.
Comment: Accepted to EACL 202