    Generative Language Models Exhibit Social Identity Biases

    The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. In this study, we investigate whether ingroup solidarity and outgroup hostility, fundamental social biases known from social science, are present in 51 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative biases when prompted to complete sentences (e.g., "We are..."). A comparison of LLM-generated sentences with human-written sentences on the internet reveals that these models exhibit similar, if not greater, levels of bias than human text. To investigate where these biases stem from, we experimentally varied the amount of ingroup-positive or outgroup-negative sentences the model was exposed to during fine-tuning, in the context of the United States Democrat-Republican divide. Doing so resulted in a marked increase in the models' ingroup solidarity and an even greater increase in outgroup hostility. Furthermore, removing either ingroup-positive or outgroup-negative sentences (or both) from the fine-tuning data leads to a significant reduction in both ingroup solidarity and outgroup hostility, suggesting that these biases can be reduced by removing biased training data. Our findings suggest that modern language models exhibit fundamental social identity biases and that such biases can be mitigated by curating training data. Our results have practical implications for creating less biased large language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans. Comment: for supplementary material, data, and code, see https://osf.io/9ht32/?view_only=f0ab4b23325f4c31ad3e12a7353b55f
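
    The bias-probing setup described above can be illustrated with a minimal sketch: sample completions of an ingroup prompt ("We are...") and an outgroup prompt ("They are...") and score their sentiment. The GPT-2 checkpoint and the default sentiment classifier below are illustrative stand-ins, not the 51 models or the scoring pipeline used in the study (see the OSF link for the actual data and code).

        from transformers import pipeline

        # Illustrative stand-ins; the study evaluated 51 models with its own scoring pipeline.
        generator = pipeline("text-generation", model="gpt2")
        sentiment = pipeline("sentiment-analysis")

        def positive_share(prompt, n=20):
            # Sample n short completions of the prompt and return the fraction scored positive.
            completions = generator(
                prompt, max_new_tokens=20, num_return_sequences=n,
                do_sample=True, pad_token_id=50256,
            )
            labels = sentiment([c["generated_text"] for c in completions])
            return sum(label["label"] == "POSITIVE" for label in labels) / n

        ingroup = positive_share("We are")     # ingroup prompt
        outgroup = positive_share("They are")  # outgroup prompt
        print(f"positive completions: ingroup {ingroup:.2f}, outgroup {outgroup:.2f}")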

    Misinformation interventions decay rapidly without an immediate posttest

    In recent years, many kinds of interventions have been developed that seek to reduce susceptibility to misinformation. In two preregistered longitudinal studies (N1 = 503, N2 = 673), we leverage two previously validated “inoculation” interventions (a video and a game) to address two important questions in misinformation interventions research: (1) whether displaying additional stimuli (such as videos unrelated to misinformation) alongside an intervention interferes with its effectiveness, and (2) whether administering an immediate posttest (in the form of a social media post evaluation task after the intervention) plays a role in the longevity of the intervention. We find no evidence that other stimuli interfere with intervention efficacy, but strong evidence that an immediate posttest strengthens what is learned from the intervention. In study 1, we find that 48 h after watching a video, participants who received an immediate posttest continued to be significantly better at discerning untrustworthy social media posts from neutral ones than the control group (d = 0.416, p = .007), whereas participants who only received a posttest 48 h later showed no difference from the control group (d = 0.010, p = .854). In study 2, we observe highly similar results for a gamified intervention, and provide evidence for a causal mechanism: immediate posttests help strengthen people's memory of the lessons learned in the intervention. We argue that the active rehearsal and application of relevant information are therefore requirements for the longevity of learning-based misinformation interventions, which has substantial implications for their scalability.
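
    For readers unfamiliar with the reported statistics, the sketch below shows how a between-groups effect size such as d = 0.416 is conventionally computed from discernment scores (Cohen's d with a pooled standard deviation, alongside an independent-samples t-test). The data are simulated placeholders, not the study's.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        # Simulated discernment scores (placeholders, not the study's data).
        immediate_posttest = rng.normal(0.4, 1.0, 250)  # intervention + immediate posttest
        control = rng.normal(0.0, 1.0, 250)             # control group

        def cohens_d(a, b):
            # Cohen's d using the pooled standard deviation of both groups.
            pooled_var = (((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                          / (len(a) + len(b) - 2))
            return (a.mean() - b.mean()) / np.sqrt(pooled_var)

        t_stat, p_val = stats.ttest_ind(immediate_posttest, control)
        print(f"d = {cohens_d(immediate_posttest, control):.3f}, p = {p_val:.3f}")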

    Active inoculation boosts attitudinal resistance against extremist persuasion techniques: a novel approach towards the prevention of violent extremism

    The Internet is gaining relevance as a platform where extremist organizations seek to recruit new members. For this preregistered study, we developed and tested a novel online game, Radicalise, which aims to counter the effectiveness of online recruitment strategies used by extremist organizations, based on the principles of active psychological inoculation. The game “inoculates” players by exposing them to severely weakened doses of the key techniques and methods used to recruit and radicalize individuals via social media platforms: identifying vulnerable individuals, gaining their trust, isolating them from their community, and pressuring them into committing a criminal act in the name of the extremist organization. To test the game's effectiveness, we conducted a preregistered 2 × 2 mixed (pre–post) randomized controlled experiment (n = 291) with two outcome measures. The first measured participants’ ability and confidence in assessing the manipulativeness of fictitious WhatsApp messages that employ an extremist manipulation technique, before and after playing. The second measured participants’ ability to identify the factors that make an individual vulnerable to extremist recruitment, using 10 profile vignettes, also before and after playing. We find that playing Radicalise significantly improves participants’ ability and confidence in spotting manipulative messages and the characteristics associated with vulnerability.

    How Accurate Are Accuracy-Nudge Interventions? A Preregistered Direct Replication of Pennycook et al. (2020).

    Funders: Defense Advanced Research Projects Agency (FundRef: https://doi.org/10.13039/100000185); Winton Centre for Risk & Evidence Communication; David & Claudia Harding Foundation.
    As part of the Systematizing Confidence in Open Research and Evidence (SCORE) program, the present study consisted of a two-stage replication test of a central finding by Pennycook et al. (2020), namely that asking people to think about the accuracy of a single headline improves "truth discernment" of intentions to share news headlines about COVID-19. The first stage of the replication test (n = 701) was unsuccessful (p = .67). After collecting a second round of data (additional n = 882, pooled N = 1,583), we found a small but significant interaction between treatment condition and truth discernment (uncorrected p = .017; treatment: d = 0.14, control: d = 0.10). As in the target study, perceived headline accuracy correlated with treatment impact, so that treatment-group participants were less willing to share headlines that were perceived as less accurate. We discuss potential explanations for these findings and an unreported change in the hypothesis (but not the analysis plan) from the preregistration in the original study.
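
    "Truth discernment" in this line of work is typically scored as the difference between willingness to share true headlines and willingness to share false ones. The sketch below illustrates that calculation with made-up ratings; it is not the replication's data or its exact analysis, which tested the treatment-by-veracity interaction.

        import numpy as np

        rng = np.random.default_rng(1)

        def discernment(true_ratings, false_ratings):
            # Discernment = mean sharing intent for true headlines minus false headlines.
            return np.mean(true_ratings) - np.mean(false_ratings)

        # Made-up sharing intentions on a 1-6 scale for 15 true and 15 false headlines per person.
        treatment = [discernment(rng.integers(1, 7, 15), rng.integers(1, 7, 15)) for _ in range(100)]
        control = [discernment(rng.integers(1, 7, 15), rng.integers(1, 7, 15)) for _ in range(100)]
        print(f"mean discernment: treatment {np.mean(treatment):.2f}, control {np.mean(control):.2f}")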