Listening between the Lines: Learning Personal Attributes from Conversations
Open-domain dialogue agents must be able to converse about many topics while
incorporating knowledge about the user into the conversation. In this work we
address the acquisition of such knowledge, for personalization in downstream
Web applications, by extracting personal attributes from conversations. This
problem is more challenging than the established task of information extraction
from scientific publications or Wikipedia articles, because dialogues often
give merely implicit cues about the speaker. We propose methods for inferring
personal attributes, such as profession, age or family status, from
conversations using deep learning. Specifically, we propose several Hidden
Attribute Models, which are neural networks leveraging attention mechanisms and
embeddings. Our methods are trained on a per-predicate basis to output rankings
of object values for a given subject-predicate combination (e.g., ranking the
doctor and nurse professions high when speakers talk about patients, emergency
rooms, etc). Experiments with various conversational texts including Reddit
discussions, movie scripts and a collection of crowdsourced personal dialogues
demonstrate the viability of our methods and their superior performance
compared to state-of-the-art baselines. Comment: published in WWW'1
Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Existing research on fairness evaluation of document classification models
mainly uses synthetic monolingual data without ground truth for author
demographic attributes. In this work, we assemble and publish a multilingual
Twitter corpus for the task of hate speech detection with four inferred author
demographic factors: age, country, gender and race/ethnicity. The corpus covers
five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate
the inferred demographic labels with a crowdsourcing platform, Figure Eight. To
examine factors that can cause biases, we conduct an empirical analysis of
demographic predictability on the English corpus. We measure the performance of
four popular document classifiers and evaluate the fairness and bias of the
baseline classifiers on the author-level demographic attributes. Comment: Accepted at LREC 202
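One possible way to quantify group-level bias of a classifier is to compare each demographic group's false positive rate with the overall rate. The sketch below is illustrative only: the abstract does not state which fairness metric is used, and the function names, labels and toy data are hypothetical.

```python
def false_positive_rate(y_true, y_pred):
    """Share of true negatives (label 0) that were predicted positive (label 1)."""
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        return 0.0
    return sum(1 for _, p in negatives if p == 1) / len(negatives)


def fpr_gap_by_group(y_true, y_pred, groups):
    """Absolute difference between each group's FPR and the overall FPR."""
    overall = false_positive_rate(y_true, y_pred)
    gaps = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        group_fpr = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        gaps[g] = abs(group_fpr - overall)
    return gaps


# Hypothetical toy data: 1 = hate speech, 0 = not hate speech.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 1, 0]
groups = ["a", "a", "b", "b", "a", "b"]  # assumed author demographic labels

gaps = fpr_gap_by_group(y_true, y_pred, groups)
print(gaps)
```

A gap of zero for every group would mean the classifier wrongly flags non-hateful posts at the same rate regardless of the author's inferred group; larger gaps indicate uneven treatment under this particular metric.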
Correcting Sociodemographic Selection Biases for Population Prediction from Social Media
Social media is increasingly used for large-scale population predictions,
such as estimating community health statistics. However, social media users are
not typically a representative sample of the intended population -- a
"selection bias". Within the social sciences, such a bias is typically
addressed with restratification techniques, where observations are reweighted
according to how under- or over-sampled their socio-demographic groups are.
Yet, restratification is rarely evaluated for improving prediction. Across four
tasks of predicting U.S. county population health statistics from Twitter, we
find standard restratification techniques provide no improvement and often
degrade prediction accuracies. The core reasons for this seem to be both
shrunken estimates (reduced variance of model predicted values) and sparse
estimates of each population's socio-demographics. We thus develop and evaluate
three methods to address these problems: estimator redistribution to account
for shrinking, and adaptive binning and informed smoothing to handle sparse
socio-demographic estimates. We show that each of these methods significantly
outperforms the standard restratification approaches. Combining approaches, we
find substantial improvements over non-restratified models, yielding a 53.0%
increase in predictive accuracy (R^2) in the case of surveyed life
satisfaction, and a 17.8% average increase across all tasks.
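The standard restratification idea described in this abstract, reweighting observations by how under- or over-sampled their socio-demographic group is, can be sketched as follows. This is a minimal illustration of plain post-stratification weighting, not the paper's proposed estimator redistribution, adaptive binning or informed smoothing; the group names, scores and population shares are invented for the example.

```python
from collections import Counter


def restratification_weights(sample_groups, population_shares):
    """Weight each observation by (population share) / (sample share) of its group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical county: young users are over-sampled relative to the population.
groups = ["young", "young", "young", "old"]
scores = [1.0, 1.0, 1.0, 5.0]             # assumed per-user language-based scores
population = {"young": 0.5, "old": 0.5}    # assumed true population shares

weights = restratification_weights(groups, population)
estimate = weighted_mean(scores, weights)
unweighted = sum(scores) / len(scores)
print(unweighted, estimate)
```

In this toy case the single "old" user receives a weight of 2.0, which pulls the county estimate away from the over-sampled group. When a group has very few users, such weights rest on sparse estimates, which is one of the problems the abstract identifies.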
Profile Update: The Effects of Identity Disclosure on Network Connections and Language
Our social identities determine how we interact and engage with the world
surrounding us. In online settings, individuals can make these identities
explicit by including them in their public biography, possibly signaling a
change to what is important to them and how they should be viewed. Here, we
perform the first large-scale study on Twitter that examines behavioral changes
following identity signal addition on Twitter profiles. Combining social
networks with NLP and quasi-experimental analyses, we discover that after
disclosing an identity on their profiles, users (1) generate more tweets
containing language that aligns with their identity and (2) connect more to
same-identity users. We also examine whether adding an identity signal
increases the number of offensive replies and find that (3) the combined effect
of disclosing identity via both tweets and profiles is associated with a
reduced number of offensive replies from others.