Characterizing Manipulation from AI Systems
Manipulation is a common concern in many domains, such as social media,
advertising, and chatbots. As AI systems mediate more of our interactions with
the world, it is important to understand the degree to which AI systems might
manipulate humans without the intent of the system designers. Our work
clarifies challenges in defining and measuring manipulation in the context of
AI systems. First, we build upon prior literature on manipulation from other
fields and characterize the space of possible notions of manipulation, which we
find to depend upon the concepts of incentives, intent, harm, and covertness.
We review proposals on how to operationalize each factor. Second, we propose a
definition of manipulation based on our characterization: a system is
manipulative if it acts as if it were pursuing an incentive to change a human
(or another agent) intentionally and covertly. Third, we discuss the
connections between manipulation and related concepts, such as deception and
coercion. Finally, we contextualize our operationalization of manipulation in
some applications. Our overall assessment is that while some progress has been
made in defining and measuring manipulation from AI systems, many gaps remain.
In the absence of a consensus definition and reliable tools for measurement, we
cannot rule out the possibility that AI systems learn to manipulate humans
without the intent of the system designers. We argue that such manipulation
poses a significant threat to human autonomy, suggesting that precautionary
actions to mitigate it are warranted.
Comment: Presented at EAAMO 2023; the first two authors contributed equally; author order was decided with a coin flip.
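The paper's conjunctive definition lends itself to a simple operational check. Below is a minimal, purely illustrative Python sketch assuming hypothetical per-factor evidence scores in [0, 1]; the three factors come from the paper, but the scoring, threshold, and names are assumptions made here for illustration.

```python
from dataclasses import dataclass

@dataclass
class BehaviorAssessment:
    """Hypothetical per-factor evidence scores in [0, 1] for a system's behavior."""
    incentive: float   # acts as if pursuing an incentive to change the human/agent
    intent: float      # the change appears intentional (instrumental to its objective)
    covertness: float  # the influence is hidden from the target

def is_manipulative(a: BehaviorAssessment, threshold: float = 0.5) -> bool:
    """Flag behavior as manipulative only when all three factors are present,
    mirroring the conjunctive definition (incentive AND intent AND covertness)."""
    return min(a.incentive, a.intent, a.covertness) > threshold

# Weak evidence of intent keeps the behavior below the bar, despite high covertness.
print(is_manipulative(BehaviorAssessment(incentive=0.9, intent=0.3, covertness=0.8)))  # False
```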
Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media
In a pre-registered randomized experiment, we found that, relative to a
reverse-chronological baseline, Twitter's engagement-based ranking algorithm
may amplify emotionally charged, out-group hostile content and contribute to
affective polarization. Furthermore, we critically examine the claim that the
algorithm shows users what they want to see, discovering that users do *not*
prefer the political tweets selected by the algorithm. Finally, we explore the
implications of an alternative approach to ranking content based on users'
stated preferences and find a reduction in angry, partisan, and out-group
hostile content but also a potential reinforcement of echo chambers. The
evidence underscores the necessity for a more nuanced approach to content
ranking that balances engagement, users' stated preferences, and sociopolitical
outcomes.
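To make the three ranking policies compared in the experiment concrete, here is a hedged Python sketch; the `engagement_score` and `stated_preference` fields are hypothetical stand-ins for Twitter's predicted-engagement signals and the survey-elicited preferences studied in the paper.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    timestamp: float          # seconds since epoch
    engagement_score: float   # hypothetical predicted engagement (replies, retweets, ...)
    stated_preference: float  # hypothetical "do you want to see this?" survey score

def rank_chronological(tweets: list[Tweet]) -> list[Tweet]:
    """Reverse-chronological baseline: newest first."""
    return sorted(tweets, key=lambda t: t.timestamp, reverse=True)

def rank_by_engagement(tweets: list[Tweet]) -> list[Tweet]:
    """Engagement-based ranking, as in personalized algorithmic feeds."""
    return sorted(tweets, key=lambda t: t.engagement_score, reverse=True)

def rank_by_stated_preference(tweets: list[Tweet]) -> list[Tweet]:
    """The alternative explored in the paper: order by users' stated preferences."""
    return sorted(tweets, key=lambda t: t.stated_preference, reverse=True)
```

The study's central finding is that these orderings diverge: engagement-based ranking surfaces political content users report not preferring, while stated-preference ranking reduces hostile content but may reinforce echo chambers.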
On the Utility of Learning about Humans for Human-AI Coordination
While we would like agents that can coordinate with humans, current
algorithms such as self-play and population-based training create agents that
can coordinate with themselves. Agents that assume their partner to be optimal
or similar to themselves can converge to coordination protocols that fail to
understand and be understood by humans. To demonstrate this, we introduce a
simple environment that requires challenging coordination, based on the popular
game Overcooked, and learn a simple model that mimics human play. We evaluate
the performance of agents trained via self-play and population-based training.
These agents perform very well when paired with themselves, but when paired
with our human model, they are significantly worse than agents designed to play
with the human model. An experiment with a planning algorithm yields the same
conclusion, though only when the human-aware planner is given the exact human
model that it is playing with. A user study with real humans shows this pattern
as well, though less strongly. Qualitatively, we find that the gains come from
having the agent adapt to the human's gameplay. Given this result, we suggest
several approaches for designing agents that learn about humans in order to
better coordinate with them. Code is available at
https://github.com/HumanCompatibleAI/overcooked_ai.
Comment: Published at NeurIPS 2019 (http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination).
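The evaluation protocol, pairing an agent either with a copy of itself or with a learned human model, can be sketched in a few lines. Everything below is a toy stand-in (the environment, the rigid self-play convention, and the noisy "human model"); it is not the Overcooked benchmark from the repository above.

```python
import random

def make_toy_env():
    """Two-player toy environment that rewards matching actions (a crude
    stand-in for the coordination demands of Overcooked)."""
    state = {"t": 0}
    def reset():
        state["t"] = 0
        return state["t"]
    def step(joint_action):
        state["t"] += 1
        reward = 1.0 if joint_action[0] == joint_action[1] else 0.0
        return state["t"], reward, state["t"] >= 50
    return reset, step

reset, step = make_toy_env()
self_play_agent = lambda obs: 0                   # rigid convention learned in self-play
human_model = lambda obs: random.choice([0, 1])   # noisy stand-in for learned human play

def evaluate(agent, partner, episodes=20):
    """Average episode return of `agent` when paired with `partner`."""
    total = 0.0
    for _ in range(episodes):
        obs, done = reset(), False
        while not done:
            obs, r, done = step((agent(obs), partner(obs)))
            total += r
    return total / episodes

print(evaluate(self_play_agent, self_play_agent))  # high: the convention matches itself
print(evaluate(self_play_agent, human_model))      # lower: the convention fails off-distribution
```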
UniMASK: Unified Inference in Sequential Decision Problems
Randomly masking and predicting word tokens has been a successful approach in
pre-training language models for a variety of downstream tasks. In this work,
we observe that the same idea also applies naturally to sequential
decision-making, where many well-studied tasks like behavior cloning, offline
reinforcement learning, inverse dynamics, and waypoint conditioning correspond
to different sequence maskings over a sequence of states, actions, and returns.
We introduce the UniMASK framework, which provides a unified way to specify
models which can be trained on many different sequential decision-making tasks.
We show that a single UniMASK model is often capable of carrying out many tasks
with performance similar to or better than single-task models. Additionally,
after fine-tuning, our UniMASK models consistently outperform comparable
single-task models. Our code is publicly available at
https://github.com/micahcarroll/uniMASK.
Comment: NeurIPS 2022 (Oral). A prior version was published at an ICML Workshop, available at arXiv:2204.1332
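To illustrate how one masking scheme subsumes these tasks, here is a hedged NumPy sketch; the 0/1 layout over (state, action, return-to-go) tokens is an illustrative simplification, not UniMASK's exact tokenization or training setup.

```python
import numpy as np

T = 4  # timesteps; each contributes a (state, action, return-to-go) token triple.
# mask[t, c] = 1 means that token is observed as input; 0 means it must be predicted.
# Columns: 0 = state, 1 = action, 2 = return-to-go.

def behavior_cloning_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[:, 0] = 1      # observe all states...
    m[:-1, 1] = 1    # ...and past actions; predict the current action
    return m

def offline_rl_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[:, 0] = 1      # observe states...
    m[:, 2] = 1      # ...and returns-to-go; predict return-conditioned actions
    return m

def inverse_dynamics_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[:, 0] = 1      # observe the full state sequence; predict the actions in between
    return m

def waypoint_conditioning_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[0, 0] = m[-1, 0] = 1  # observe only start and goal states; infer the rest
    return m
```

A single model trained on randomly sampled masks of this form can then be queried with any of these task-specific masks at inference time.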
Issues Related to the Frequency of Exploratory Analyses by Evidence Review Groups in the NICE Single Technology Appraisal Process
BACKGROUND: Evidence Review Groups (ERGs) critically appraise company submissions as part of the National Institute for Health and Care Excellence (NICE) Single Technology Appraisal (STA) process. As part of their critique of the evidence submitted by companies, the ERGs undertake exploratory analyses to explore uncertainties in the company's model.
OBJECTIVE: The aim of this study was to explore predefined factors that might influence or predict the extent of ERG exploratory analyses.
METHODS: We undertook content analysis of over 400 documents, including ERG reports and related documentation for the 100 most recent STAs (2009-2014) for which guidance has been published. Relevant data were extracted from the documents and narrative synthesis was used to summarise the extracted data. All data were extracted and checked by two researchers.
RESULTS: Forty different companies submitted documents as part of the NICE STA process. The most common disease area covered by the STAs was cancer (44%), and most ERG reports (n = 93) contained at least one exploratory analysis. The incidence and frequency of ERG exploratory analyses do not appear to be related to any developments in the appraisal process, the disease area covered by the STA, or the company's base-case incremental cost-effectiveness ratio (ICER). However, there does appear to be a pattern in the mean number of analyses conducted by particular ERGs, but the reasons for this are unclear and potentially complex.
CONCLUSIONS: No clear patterns were identified regarding the presence or frequency of exploratory analyses, apart from the mean number conducted by individual ERGs. More research is needed to understand this relationship.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) is a technique for training
AI systems to align with human goals. RLHF has emerged as the central method
used to finetune state-of-the-art large language models (LLMs). Despite this
popularity, there has been relatively little public work systematizing its
flaws. In this paper, we (1) survey open problems and fundamental limitations
of RLHF and related methods; (2) overview techniques to understand, improve,
and complement RLHF in practice; and (3) propose auditing and disclosure
standards to improve societal oversight of RLHF systems. Our work emphasizes
the limitations of RLHF and highlights the importance of a multi-faceted
approach to the development of safer AI systems.
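As background for the methods surveyed here: a standard ingredient of the RLHF pipeline is fitting a reward model to human preference comparisons with a Bradley-Terry objective. The PyTorch sketch below shows only that generic step under simplified assumptions (responses as fixed feature vectors, a linear reward model); it is textbook background, not this paper's contribution.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_fn, chosen, rejected):
    """Bradley-Terry loss: push the reward of the human-preferred response
    above the reward of the rejected one."""
    return -F.logsigmoid(reward_fn(chosen) - reward_fn(rejected)).mean()

# Toy usage: a linear reward model over 8-dimensional response features.
model = torch.nn.Linear(8, 1)
reward_fn = lambda x: model(x).squeeze(-1)
chosen, rejected = torch.randn(4, 8), torch.randn(4, 8)
loss = preference_loss(reward_fn, chosen, rejected)
loss.backward()  # gradients flow into the reward model's parameters
```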