A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
Existing argumentation datasets have succeeded in allowing researchers to
develop computational methods for analyzing the content, structure and
linguistic features of argumentative text. They have been much less successful
in fostering studies of the effect of "user" traits -- characteristics and
beliefs of the participants -- on the debate/argument outcome as this type of
user information is generally not available. This paper presents a dataset of
78,376 debates generated over a 10-year period along with surprisingly
comprehensive participant profiles. We also present an example study that uses the
dataset to analyze the effect of selected user traits on the debate outcome, in
comparison to the linguistic features typically employed in studies of this
kind.
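As a rough illustration of the kind of study the abstract describes, the sketch below compares a user-trait predictor of debate outcome against a linguistic-feature baseline. The file name and record fields (debates.json, debater_text, voter_ideology, winner) are hypothetical, and this is not the paper's actual pipeline.

```python
# Minimal sketch: user traits vs. linguistic features for predicting
# debate outcome. File name and record fields are hypothetical.
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

with open("debates.json") as f:  # assumed export of the corpus
    debates = json.load(f)

texts = [d["debater_text"] for d in debates]
outcomes = [d["winner"] for d in debates]

# Linguistic-feature baseline: tf-idf bag-of-words over the debate text.
X_lang = TfidfVectorizer(max_features=5000).fit_transform(texts)
lang_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X_lang, outcomes, cv=5).mean()

# User-trait model: one-hot encoded participant traits from the profiles.
X_user = OneHotEncoder(handle_unknown="ignore").fit_transform(
    [[d["voter_ideology"]] for d in debates])
user_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X_user, outcomes, cv=5).mean()

print(f"linguistic features: {lang_acc:.3f}  user traits: {user_acc:.3f}")
```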
Exploring the Role of Prior Beliefs for Argument Persuasion
Public debate forums provide a common platform for exchanging opinions on a
topic of interest. While recent studies in natural language processing (NLP)
have provided empirical evidence that the language of the debaters and their
patterns of interaction play a key role in changing the mind of a reader,
research in psychology has shown that prior beliefs can affect our
interpretation of an argument and could therefore constitute a competing
alternative explanation for resistance to changing one's stance. To study the
actual effect of language use vs. prior beliefs on persuasion, we provide a new
dataset and propose a controlled setting that takes into consideration two
reader-level factors: political and religious ideology. We find that prior
beliefs shaped by these reader-level factors play a more important role than
language-use effects, and argue that it is important to account for them in NLP
studies of persuasion.
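A minimal sketch of the controlled setting, with hypothetical record fields (political, religious, prior_agrees, persuaded): stratify reader/argument pairs by the two reader-level factors and the reader's prior stance, then compare persuasion rates within each cell, so that variation across cells reflects prior beliefs while residual variation within a cell is what language use has to explain.

```python
# Minimal sketch of the stratified comparison; record fields are
# hypothetical placeholders, not the dataset's actual schema.
from collections import defaultdict

records = [
    {"political": "liberal", "religious": "secular",
     "prior_agrees": True, "persuaded": False},
    {"political": "conservative", "religious": "religious",
     "prior_agrees": False, "persuaded": True},
    # ... more reader/argument pairs from the dataset ...
]

# Persuasion rate within each (political, religious, prior-stance) cell.
cells = defaultdict(lambda: [0, 0])
for r in records:
    key = (r["political"], r["religious"], r["prior_agrees"])
    cells[key][0] += r["persuaded"]
    cells[key][1] += 1

for key, (hits, n) in sorted(cells.items()):
    print(key, f"persuasion rate = {hits / n:.2f} (n={n})")
```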
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
To recognize and mitigate harms from large language models (LLMs), we need to
understand the prevalence and nuances of stereotypes in LLM outputs. Toward
this end, we present Marked Personas, a prompt-based method to measure
stereotypes in LLMs for intersectional demographic groups without any lexicon
or data labeling. Grounded in the sociolinguistic concept of markedness (which
characterizes explicitly linguistically marked categories versus unmarked
defaults), our proposed method is twofold: 1) prompting an LLM to generate
personas, i.e., natural language descriptions, of the target demographic group
alongside personas of unmarked, default groups; 2) identifying the words that
significantly distinguish personas of the target group from corresponding
unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4
contain higher rates of racial stereotypes than human-written portrayals using
the same prompts. The words distinguishing personas of marked (non-white,
non-male) groups reflect patterns of othering and exoticizing these
demographics. An intersectional lens further reveals tropes that dominate
portrayals of marginalized groups, such as tropicalism and the
hypersexualization of minoritized women. These representational harms have
concerning implications for downstream applications like story generation.
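To make the two-step method concrete, here is a small sketch of step 2: scoring the words that distinguish marked-group personas from unmarked ones. It uses the weighted log-odds ratio with an informative Dirichlet prior (Monroe et al., 2008), a standard statistic for this kind of lexical comparison; treating it as the paper's exact estimator, and the persona strings below, are assumptions.

```python
# Sketch of step 2: find words whose usage significantly distinguishes
# marked-group personas from unmarked-default personas.
import math
import re
from collections import Counter

def tokenize(texts):
    return Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))

def log_odds_z(marked_texts, unmarked_texts):
    """Z-scored log-odds (informative Dirichlet prior) per word."""
    y_i, y_j = tokenize(marked_texts), tokenize(unmarked_texts)
    prior = y_i + y_j  # pooled counts serve as the prior
    n_i, n_j, a0 = sum(y_i.values()), sum(y_j.values()), sum(prior.values())
    scores = {}
    for w in prior:
        a_w = prior[w]
        d_i = math.log((y_i[w] + a_w) / (n_i + a0 - y_i[w] - a_w))
        d_j = math.log((y_j[w] + a_w) / (n_j + a0 - y_j[w] - a_w))
        var = 1 / (y_i[w] + a_w) + 1 / (y_j[w] + a_w)
        scores[w] = (d_i - d_j) / math.sqrt(var)
    return scores

# Placeholder persona text; in the method these are LLM generations for
# the target (marked) group and the unmarked default group.
marked = ["She is a strong, resilient woman who loves her community."]
unmarked = ["He is a person who enjoys reading and spending time outdoors."]
top = sorted(log_odds_z(marked, unmarked).items(), key=lambda kv: -kv[1])[:20]
print(top)
```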
Contrastive Error Attribution for Finetuned Language Models
Recent work has identified noisy and misannotated data as a core cause of
hallucinations and unfaithful outputs in Natural Language Generation (NLG)
tasks. Consequently, identifying and removing these examples is a key open
challenge in creating reliable NLG systems. In this work, we introduce a
framework to identify and remove low-quality training instances that lead to
undesirable outputs, such as faithfulness errors in text summarization. We show
that existing approaches for error tracing, such as gradient-based influence
measures, do not perform reliably for detecting faithfulness errors in NLG
datasets. We overcome the drawbacks of existing error tracing methods through a
new, contrast-based estimate that compares undesired generations to
human-corrected outputs. Our proposed method can achieve a mean average
precision of 0.93 at detecting known data errors across synthetic tasks with
known ground truth, substantially outperforming existing approaches. Using this
approach and re-training models on cleaned data leads to a 70% reduction in
entity hallucinations on the NYT dataset and a 55% reduction in semantic errors
on the E2E dataset.
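As a simplified sketch of the contrast-based idea (not the paper's exact estimator), one can score each training example by how strongly its gradient aligns with the direction that favors an undesired generation over its human-corrected version; the model name, scoring rule, and example strings below are placeholders.

```python
# Simplified gradient-contrast sketch: training examples whose gradients
# push the model toward the undesired output over the corrected one get
# high scores and become candidate error sources.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")   # placeholder model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def grad_vector(src, tgt):
    """Flattened gradient of the sequence loss for one (src, tgt) pair."""
    model.zero_grad()
    batch = tok(src, return_tensors="pt")
    labels = tok(tgt, return_tensors="pt").input_ids
    model(**batch, labels=labels).loss.backward()
    return torch.cat([p.grad.flatten()
                      for p in model.parameters() if p.grad is not None])

# Contrast direction: gradient for the undesired generation minus the
# gradient for its human-corrected version on the same source.
source = "summarize: the city council met on Tuesday ..."  # placeholder
g_contrast = (grad_vector(source, "undesired summary")
              - grad_vector(source, "corrected summary"))

# Score each training example by alignment with the contrast direction.
train = [("summarize: some article ...", "noisy target"),
         ("summarize: another article ...", "clean target")]
scores = [torch.dot(grad_vector(s, t), g_contrast).item() for s, t in train]
print(scores)
```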
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Machine learning models are now able to convert user-written text
descriptions into naturalistic images. These models are available to anyone
online and are being used to generate millions of images a day. We investigate
these models and find that they amplify dangerous and complex stereotypes.
Moreover, we find that the amplified stereotypes are difficult to predict and
not easily mitigated by users or model owners. The extent to which these
image-generation models perpetuate and amplify stereotypes, together with their
mass deployment, is cause for serious concern.