5 research outputs found
Moral Foundations of Large Language Models
Moral foundations theory (MFT) is a psychological framework that decomposes
human moral reasoning into five factors, including care/harm,
liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary
in the weight they place on these dimensions when making moral decisions, in
part due to their cultural upbringing and political ideology. As large language
models (LLMs) are trained on datasets collected from the internet, they may
reflect the biases that are present in such corpora. This paper uses MFT as a
lens to analyze whether popular LLMs have acquired a bias towards a particular
set of moral values. We analyze known LLMs and find they exhibit particular
moral foundations, and show how these relate to human moral foundations and
political affiliations. We also measure the consistency of these biases, that
is, whether they vary strongly depending on how the model is prompted.
Finally, we show that we can adversarially select prompts that encourage the
model to exhibit a particular set of moral foundations, and that this can
affect the model's behavior on downstream tasks. These findings help
illustrate the potential risks and unintended consequences of LLMs assuming a
particular moral stance.
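As a rough illustration of how such an analysis can be run, the sketch below
administers questionnaire-style items to a model and averages the scores per
foundation; the item texts, scale wording, and query_llm helper are
illustrative assumptions, not the paper's actual protocol.

    # Minimal sketch: score an LLM on moral-relevance items, one mean per foundation.
    # query_llm is a hypothetical stand-in for whatever model API is used.
    from collections import defaultdict

    ITEMS = [
        ("care", "Whether or not someone suffered emotionally."),
        ("sanctity", "Whether or not someone did something disgusting."),
        # ... remaining items, each tagged with its foundation
    ]

    PROMPT = ("Rate how relevant this consideration is when deciding whether "
              "something is right or wrong, on a scale from 0 (not at all "
              "relevant) to 5 (extremely relevant). Answer with a single "
              "number.\n\n{item}")

    def query_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API call here")

    def foundation_scores(items=ITEMS):
        totals, counts = defaultdict(float), defaultdict(int)
        for foundation, text in items:
            reply = query_llm(PROMPT.format(item=text))
            digits = [c for c in reply if c.isdigit()]
            if digits:                      # keep only parsable numeric answers
                totals[foundation] += float(digits[0])
                counts[foundation] += 1
        return {f: totals[f] / counts[f] for f in totals}

Repeating this scoring under different prompting contexts is one simple way to
probe the consistency question raised above.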
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Large language models (LLMs) provide excellent text-generation capabilities,
but standard prompting and generation methods generally do not lead to
intentional or goal-directed agents and might necessitate considerable prompt
tuning. This becomes particularly apparent in multi-turn conversations: even
the best current LLMs rarely ask clarifying questions, engage in explicit
information gathering, or take actions now that lead to better decisions after
multiple turns. Reinforcement learning has the potential to leverage the
powerful modeling capabilities of LLMs, as well as their internal
representation of textual interactions, to create capable goal-directed
language agents. This can enable intentional and temporally extended
interactions, for example with humans through coordinated persuasion and
carefully crafted questions, or in goal-directed play through text games, to
bring about desired final outcomes. However, enabling this requires the
community to
develop stable and reliable reinforcement learning algorithms that can
effectively train LLMs. Developing such algorithms requires tasks that can
gauge progress on algorithm design, provide accessible and reproducible
evaluations for multi-turn interactions, and cover a range of task properties
and challenges in improving reinforcement learning algorithms. Our paper
introduces the LMRL-Gym benchmark for evaluating multi-turn RL for LLMs,
together with an open-source research framework containing a basic toolkit for
getting started on multi-turn RL with offline value-based and policy-based RL
methods. Our benchmark consists of 8 different language tasks, which require
multiple rounds of language interaction and cover a range of challenges in
open-ended dialogue and text games.
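To make the multi-turn setting concrete, the sketch below collects one
trajectory from a text-based task with an LLM policy; the env.reset()/env.step()
interface and the policy.generate() call are assumed placeholders, not the
actual LMRL-Gym API.

    # Illustrative multi-turn rollout for a text task: each step records
    # (observation, utterance, reward) for later offline RL training.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Trajectory:
        steps: List[Tuple[str, str, float]] = field(default_factory=list)

    def rollout(env, policy, max_turns: int = 10) -> Trajectory:
        traj = Trajectory()
        obs = env.reset()                          # text observation from the task
        for _ in range(max_turns):
            action = policy.generate(obs)          # the LLM policy emits an utterance
            obs, reward, done = env.step(action)   # environment replies and scores it
            traj.steps.append((obs, action, reward))
            if done:
                break
        return traj

Trajectories gathered this way can then be handed to an offline value-based or
policy-based learner that treats each utterance (or token) as an action.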
Personality Traits in Large Language Models
The advent of large language models (LLMs) has revolutionized natural
language processing, enabling the generation of coherent and contextually
relevant human-like text. As LLMs increasingly power conversational agents used
by the general public world-wide, the synthetic personality embedded in these
models, by virtue of training on large amounts of human data, is becoming
increasingly important. Since personality is a key factor determining the
effectiveness of communication, we present a comprehensive method for
administering and validating personality tests on widely-used LLMs, as well as
for shaping personality in the generated text of such LLMs. Applying this
method, we found: 1) personality measurements in the outputs of some LLMs under
specific prompting configurations are reliable and valid; 2) evidence of
reliability and validity of synthetic LLM personality is stronger for larger
and instruction fine-tuned models; and 3) personality in LLM outputs can be
shaped along desired dimensions to mimic specific human personality profiles.
We discuss the application and ethical implications of this measurement and
shaping method, in particular regarding responsible AI.
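As a hedged illustration of the shaping step, the sketch below prepends a
persona prefix to each test item before querying a model; the trait markers,
scale, and query_llm callable are assumptions for illustration, not the paper's
actual instrument or prompts.

    # Sketch of trait shaping via a persona prefix prepended to each test item.
    TRAIT_MARKERS = {
        "extraversion": {"high": "outgoing, talkative, and energetic",
                         "low": "reserved, quiet, and solitary"},
        # ... markers for the remaining Big Five dimensions
    }

    def persona_prefix(targets: dict) -> str:
        # targets maps a trait name to "high" or "low"
        return "\n".join(f"You are someone who is {TRAIT_MARKERS[t][level]}."
                         for t, level in targets.items())

    def administer_item(item: str, targets: dict, query_llm) -> str:
        prompt = (persona_prefix(targets)
                  + "\n\nRate the following statement about yourself on a scale "
                    "from 1 (strongly disagree) to 5 (strongly agree): " + item)
        return query_llm(prompt)

Scoring the responses with a standard inventory key would then indicate whether
the shaped outputs actually move along the targeted dimensions.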
A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
A fundamental challenge in multiagent reinforcement learning is to learn
beneficial behaviors in a shared environment with other simultaneously learning
agents. In particular, each agent perceives the environment as effectively
non-stationary due to the changing policies of other agents. Moreover, each
agent is itself constantly learning, leading to natural non-stationarity in the
distribution of experiences encountered. In this paper, we propose a novel
meta-multiagent policy gradient theorem that directly accounts for the
non-stationary policy dynamics inherent to multiagent learning settings. This
is achieved by modeling our gradient updates to consider both an agent's own
non-stationary policy dynamics and the non-stationary policy dynamics of other
agents in the environment. We show that our theoretically grounded approach
provides a general solution to the multiagent learning problem, which
inherently comprises all key aspects of previous state-of-the-art approaches
on this topic. We test our method on a diverse suite of multiagent benchmarks
and demonstrate that it adapts to new learning agents more efficiently than
baseline methods across the full spectrum of mixed-incentive, competitive, and
cooperative domains.
Comment: Accepted to ICML 2021. Code at https://github.com/dkkim93/meta-mapg
and videos at https://sites.google.com/view/meta-mapg/hom
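One generic way to write down such a learning-aware objective (a sketch of the
idea under assumed notation, not necessarily the paper's exact theorem) is to
let agent i optimize its return over a chain of policy updates performed by all
n agents:

    J^i(\theta_0) = \mathbb{E}\Big[\sum_{\ell=0}^{L} V^i(\theta^1_\ell, \dots, \theta^n_\ell)\Big],
    \qquad
    \theta^j_{\ell+1} = \theta^j_\ell + \alpha \, \nabla_{\theta^j_\ell} \hat{V}^j(\theta^1_\ell, \dots, \theta^n_\ell)
    \quad \text{for all agents } j.

Differentiating J^i through this update chain produces gradient terms that
account both for agent i's own learning dynamics and for the anticipated policy
updates of the other agents, which is the intuition behind the
meta-multiagent policy gradient described above.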
Factored State Abstraction for Option Learning
Hierarchical reinforcement learning has focused on discovering temporally
extended actions (options) to provide efficient solutions for long-horizon
decision-making problems with sparse rewards. One promising approach that
learns these options end-to-end in this setting is the option-critic (OC)
framework. However, this method has several practical limitations, including a
lack of diversity between the learned sub-policies and sample inefficiency.
This thesis shows that the OC framework does not decompose problems into
smaller and largely independent components, but instead increases the problem
complexity with each option by considering the entire state space during
learning. To address this issue, we introduce state abstracted option-critic
(SOC), a new framework that combines temporal and state abstraction to
effectively reduce the problem complexity in sparse reward settings. Our
contribution includes learning a factored state space so that each option maps
to a sub-section of the state space. We test our method against hierarchical,
non-hierarchical, and state abstraction baselines and demonstrate better
sample efficiency and higher overall performance in both image and large
vector-state representations under sparse reward settings.
M.Eng. thesis.
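The sketch below illustrates the core idea of option-level state abstraction
with a learned soft mask over state factors; the architecture and names are
assumptions for illustration, not the thesis's exact SOC implementation.

    # Each option attends only to a learned soft subset of state factors, so its
    # intra-option policy and termination operate on an abstracted state.
    import torch
    import torch.nn as nn

    class MaskedOption(nn.Module):
        def __init__(self, state_dim: int, action_dim: int):
            super().__init__()
            self.mask_logits = nn.Parameter(torch.zeros(state_dim))  # per-option mask
            self.policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                        nn.Linear(64, action_dim))   # intra-option policy logits
            self.termination = nn.Sequential(nn.Linear(state_dim, 1), nn.Sigmoid())

        def forward(self, state: torch.Tensor):
            mask = torch.sigmoid(self.mask_logits)   # soft state abstraction in [0, 1]
            abstracted = state * mask                # option only "sees" its factors
            return self.policy(abstracted), self.termination(abstracted)

Training one such module per option, alongside a policy over options, is one
way each option can come to specialize on a sub-section of the state space.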