
    Fine-tuning language models to find agreement among humans with diverse preferences

    Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
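
    A minimal sketch of the ranking step described above: predicted per-person approvals are aggregated by a social welfare function to score each candidate statement. The `predict_approval` callable and `rank_candidates` helper are hypothetical stand-ins, not the authors' implementation, and the welfare function shown (max-min) is just one of the aggregation choices the abstract alludes to.

```python
# Illustrative sketch of ranking candidate consensus statements with a
# social welfare function. `predict_approval` is a hypothetical stand-in
# for a reward model that predicts one participant's approval of a
# candidate statement, given that participant's written opinion.
from typing import Callable, List


def rank_candidates(
    candidates: List[str],
    opinions: List[str],
    predict_approval: Callable[[str, str], float],
    welfare: Callable[[List[float]], float] = min,  # Rawlsian (max-min) aggregation
) -> List[str]:
    """Return candidate statements sorted by aggregated group approval."""

    def group_score(statement: str) -> float:
        # Predicted approval of each group member for this statement.
        scores = [predict_approval(opinion, statement) for opinion in opinions]
        # Collapse individual approvals into a single group-level welfare value.
        return welfare(scores)

    return sorted(candidates, key=group_score, reverse=True)


if __name__ == "__main__":
    # Toy scoring function (word overlap) purely for demonstration.
    toy_model = lambda opinion, statement: float(
        len(set(opinion.split()) & set(statement.split()))
    )
    ranked = rank_candidates(
        candidates=["Raise taxes moderately on top earners.", "Abolish all taxes."],
        opinions=["taxes on the rich should rise", "keep taxes low for everyone"],
        predict_approval=toy_model,
    )
    print(ranked[0])  # highest-welfare candidate under the toy model
```

    Swapping `welfare` for `statistics.mean` would give a utilitarian aggregation instead of the max-min shown here; the ranking machinery itself stays unchanged.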

    Scaffolding cooperation in human groups with deep reinforcement learning

    Altruism and selfishness are highly transmissible. Either can easily cascade through human communities. Effective approaches to encouraging group cooperation—while also mitigating the risk of spreading defection—are still an open challenge. Here, we apply recent advances in deep reinforcement learning to structure networks of human participants playing a group cooperation game. We leverage deep reinforcement learning and simulation methods to train a "social planner" capable of making recommendations to create or break connections between group members. This social planner learns a strategy from scratch, through repeated trial and error. The strategy that it develops succeeds at encouraging prosociality in networks of human participants (N = 208 participants in 13 groups) playing the group cooperation game for real monetary stakes. Under the social planner, groups finished the game with an average cooperation rate of 77.7%, compared to 42.8% in static networks (N = 176 participants in 11 groups). In contrast to prior strategies that separate defectors from cooperators (tested here with N = 384 participants in 24 groups), the social planner learns to take a conciliatory approach to defectors, encouraging them to act prosocially by moving them to small, highly cooperative neighborhoods. Three follow-up studies (N = 624 participants in 39 groups) validate that this encouraging approach can scaffold group cooperation absent the "black box" of deep learning. Deep learning can help explore and identify new approaches to support human cooperation and prosociality.
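
    The sketch below illustrates the kind of "conciliatory" rewiring recommendation the abstract describes, as a simple hand-written heuristic rather than the paper's trained reinforcement-learning planner. The `recommend_rewiring` function and its data structures are assumptions made for illustration only.

```python
# Illustrative heuristic (not the paper's RL planner): connect recent
# defectors to small, highly cooperative neighborhoods instead of
# isolating them, and break ties between pairs of defectors.
import networkx as nx


def recommend_rewiring(graph: nx.Graph, cooperated: dict) -> list:
    """Return (action, u, v) edge recommendations for one game round.

    `cooperated` maps each node to True/False for its last-round choice.
    """
    recommendations = []
    cooperators = [n for n, c in cooperated.items() if c]
    for defector in (n for n, c in cooperated.items() if not c):
        # Break one tie to a fellow defector, if any exists.
        defecting_neighbors = [n for n in graph.neighbors(defector) if not cooperated[n]]
        if defecting_neighbors:
            recommendations.append(("break", defector, defecting_neighbors[0]))
        # Connect the defector to a low-degree cooperator, i.e. a small,
        # cooperative neighborhood, if one is not already a neighbor.
        candidates = [n for n in cooperators if not graph.has_edge(defector, n)]
        if candidates:
            target = min(candidates, key=graph.degree)
            recommendations.append(("create", defector, target))
    return recommendations
```

    The point of the heuristic form is that, as the follow-up studies in the abstract suggest, the conciliatory strategy can be stated and applied without the "black box" of a learned policy.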