Characterizing Manipulation from AI Systems
Manipulation is a common concern in many domains, such as social media,
advertising, and chatbots. As AI systems mediate more of our interactions with
the world, it is important to understand the degree to which AI systems might
manipulate humans without the intent of the system designers. Our work
clarifies challenges in defining and measuring manipulation in the context of
AI systems. First, we build upon prior literature on manipulation from other
fields and characterize the space of possible notions of manipulation, which we
find to depend upon the concepts of incentives, intent, harm, and covertness.
We review proposals on how to operationalize each factor. Second, we propose a
definition of manipulation based on our characterization: a system is
manipulative if it acts as if it were pursuing an incentive to change a human
(or another agent) intentionally and covertly. Third, we discuss the
connections between manipulation and related concepts, such as deception and
coercion. Finally, we contextualize our operationalization of manipulation in
some applications. Our overall assessment is that while some progress has been
made in defining and measuring manipulation from AI systems, many gaps remain.
In the absence of a consensus definition and reliable tools for measurement, we
cannot rule out the possibility that AI systems learn to manipulate humans
without the intent of the system designers. We argue that such manipulation
poses a significant threat to human autonomy, suggesting that precautionary
actions to mitigate it are warranted.
Comment: Presented at EAAMO 2023; the first two authors contributed equally; author order was decided with a coin flip.
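The paper's conjunctive definition lends itself to a simple operational check. Below is a minimal, purely illustrative Python sketch assuming hypothetical per-factor evidence scores in [0, 1]; the three factors come from the paper, but the scoring, threshold, and names are assumptions made here for illustration.

```python
from dataclasses import dataclass

@dataclass
class BehaviorAssessment:
    """Hypothetical per-factor evidence scores in [0, 1] for a system's behavior."""
    incentive: float   # acts as if pursuing an incentive to change the human/agent
    intent: float      # the change appears intentional (instrumental to its objective)
    covertness: float  # the influence is hidden from the target

def is_manipulative(a: BehaviorAssessment, threshold: float = 0.5) -> bool:
    """Flag behavior as manipulative only when all three factors are present,
    mirroring the conjunctive definition (incentive AND intent AND covertness)."""
    return min(a.incentive, a.intent, a.covertness) > threshold

# Weak evidence of intent keeps the behavior below the bar, despite high covertness.
print(is_manipulative(BehaviorAssessment(incentive=0.9, intent=0.3, covertness=0.8)))  # False
```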
Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media
In a pre-registered randomized experiment, we found that, relative to a
reverse-chronological baseline, Twitter's engagement-based ranking algorithm
may amplify emotionally charged, out-group hostile content and contribute to
affective polarization. Furthermore, we critically examine the claim that the
algorithm shows users what they want to see, discovering that users do *not*
prefer the political tweets selected by the algorithm. Finally, we explore the
implications of an alternative approach to ranking content based on users'
stated preferences and find a reduction in angry, partisan, and out-group
hostile content but also a potential reinforcement of echo chambers. The
evidence underscores the necessity for a more nuanced approach to content
ranking that balances engagement, users' stated preferences, and sociopolitical
outcomes.
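To make the three ranking policies compared in the experiment concrete, here is a hedged Python sketch; the `engagement_score` and `stated_preference` fields are hypothetical stand-ins for Twitter's predicted-engagement signals and the survey-elicited preferences studied in the paper.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    text: str
    timestamp: float          # seconds since epoch
    engagement_score: float   # hypothetical predicted engagement (replies, retweets, ...)
    stated_preference: float  # hypothetical "do you want to see this?" survey score

def rank_chronological(tweets: list[Tweet]) -> list[Tweet]:
    """Reverse-chronological baseline: newest first."""
    return sorted(tweets, key=lambda t: t.timestamp, reverse=True)

def rank_by_engagement(tweets: list[Tweet]) -> list[Tweet]:
    """Engagement-based ranking, as in personalized algorithmic feeds."""
    return sorted(tweets, key=lambda t: t.engagement_score, reverse=True)

def rank_by_stated_preference(tweets: list[Tweet]) -> list[Tweet]:
    """The alternative explored in the paper: order by users' stated preferences."""
    return sorted(tweets, key=lambda t: t.stated_preference, reverse=True)
```

The study's central finding is that these orderings diverge: engagement-based ranking surfaces political content users report not preferring, while stated-preference ranking reduces hostile content but may reinforce echo chambers.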
On the Utility of Learning about Humans for Human-AI Coordination
While we would like agents that can coordinate with humans, current
algorithms such as self-play and population-based training create agents that
can coordinate with themselves. Agents that assume their partner to be optimal
or similar to themselves can converge to coordination protocols that fail to
understand and be understood by humans. To demonstrate this, we introduce a
simple environment that requires challenging coordination, based on the popular
game Overcooked, and learn a simple model that mimics human play. We evaluate
the performance of agents trained via self-play and population-based training.
These agents perform very well when paired with themselves, but when paired
with our human model, they are significantly worse than agents designed to play
with the human model. An experiment with a planning algorithm yields the same
conclusion, though only when the human-aware planner is given the exact human
model that it is playing with. A user study with real humans shows this pattern
as well, though less strongly. Qualitatively, we find that the gains come from
having the agent adapt to the human's gameplay. Given this result, we suggest
several approaches for designing agents that learn about humans in order to
better coordinate with them. Code is available at
https://github.com/HumanCompatibleAI/overcooked_ai.
Comment: Published at NeurIPS 2019 (http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination).
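The evaluation protocol, pairing an agent either with a copy of itself or with a learned human model, can be sketched in a few lines. Everything below is a toy stand-in (the environment, the rigid self-play convention, and the noisy "human model"); it is not the Overcooked benchmark from the repository above.

```python
import random

def make_toy_env():
    """Two-player toy environment that rewards matching actions (a crude
    stand-in for the coordination demands of Overcooked)."""
    state = {"t": 0}
    def reset():
        state["t"] = 0
        return state["t"]
    def step(joint_action):
        state["t"] += 1
        reward = 1.0 if joint_action[0] == joint_action[1] else 0.0
        return state["t"], reward, state["t"] >= 50
    return reset, step

reset, step = make_toy_env()
self_play_agent = lambda obs: 0                   # rigid convention learned in self-play
human_model = lambda obs: random.choice([0, 1])   # noisy stand-in for learned human play

def evaluate(agent, partner, episodes=20):
    """Average episode return of `agent` when paired with `partner`."""
    total = 0.0
    for _ in range(episodes):
        obs, done = reset(), False
        while not done:
            obs, r, done = step((agent(obs), partner(obs)))
            total += r
    return total / episodes

print(evaluate(self_play_agent, self_play_agent))  # high: the convention matches itself
print(evaluate(self_play_agent, human_model))      # lower: the convention fails off-distribution
```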
UniMASK: Unified Inference in Sequential Decision Problems
Randomly masking and predicting word tokens has been a successful approach in
pre-training language models for a variety of downstream tasks. In this work,
we observe that the same idea also applies naturally to sequential
decision-making, where many well-studied tasks like behavior cloning, offline
reinforcement learning, inverse dynamics, and waypoint conditioning correspond
to different sequence maskings over a sequence of states, actions, and returns.
We introduce the UniMASK framework, which provides a unified way to specify
models which can be trained on many different sequential decision-making tasks.
We show that a single UniMASK model is often capable of carrying out many tasks
with performance similar to or better than single-task models. Additionally,
after fine-tuning, our UniMASK models consistently outperform comparable
single-task models. Our code is publicly available at
https://github.com/micahcarroll/uniMASK.
Comment: NeurIPS 2022 (Oral). A prior version was published at an ICML Workshop, available at arXiv:2204.1332
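To illustrate how one masking scheme subsumes these tasks, here is a hedged NumPy sketch; the 0/1 layout over (state, action, return-to-go) tokens is an illustrative simplification, not UniMASK's exact tokenization or training setup.

```python
import numpy as np

T = 4  # timesteps; each contributes a (state, action, return-to-go) token triple.
# mask[t, c] = 1 means that token is observed as input; 0 means it must be predicted.
# Columns: 0 = state, 1 = action, 2 = return-to-go.

def behavior_cloning_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[:, 0] = 1      # observe all states...
    m[:-1, 1] = 1    # ...and past actions; predict the current action
    return m

def offline_rl_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[:, 0] = 1      # observe states...
    m[:, 2] = 1      # ...and returns-to-go; predict return-conditioned actions
    return m

def inverse_dynamics_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[:, 0] = 1      # observe the full state sequence; predict the actions in between
    return m

def waypoint_conditioning_mask() -> np.ndarray:
    m = np.zeros((T, 3), dtype=int)
    m[0, 0] = m[-1, 0] = 1  # observe only start and goal states; infer the rest
    return m
```

A single model trained on randomly sampled masks of this form can then be queried with any of these task-specific masks at inference time.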
Issues Related to the Frequency of Exploratory Analyses by Evidence Review Groups in the NICE Single Technology Appraisal Process
BACKGROUND: Evidence Review Groups (ERGs) critically appraise company submissions as part of the National Institute for Health and Care Excellence (NICE) Single Technology Appraisal (STA) process. As part of their critique of the evidence submitted by companies, the ERGs undertake exploratory analyses to explore uncertainties in the company's model.
OBJECTIVE: The aim of this study was to explore predefined factors that might influence or predict the extent of ERG exploratory analyses.
METHODS: We undertook content analysis of over 400 documents, including ERG reports and related documentation for the 100 most recent STAs (2009-2014) for which guidance has been published. Relevant data were extracted from the documents and narrative synthesis was used to summarise the extracted data. All data were extracted and checked by two researchers.
RESULTS: Forty different companies submitted documents as part of the NICE STA process. The most common disease area covered by the STAs was cancer (44%), and most ERG reports (n = 93) contained at least one exploratory analysis. The incidence and frequency of ERG exploratory analyses do not appear to be related to any developments in the appraisal process, the disease area covered by the STA, or the company's base-case incremental cost-effectiveness ratio (ICER). However, there does appear to be a pattern in the mean number of analyses conducted by particular ERGs, but the reasons for this are unclear and potentially complex.
CONCLUSIONS: No clear patterns were identified regarding the presence or frequency of exploratory analyses, apart from the mean number conducted by individual ERGs. More research is needed to understand this relationship.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) is a technique for training
AI systems to align with human goals. RLHF has emerged as the central method
used to finetune state-of-the-art large language models (LLMs). Despite this
popularity, there has been relatively little public work systematizing its
flaws. In this paper, we (1) survey open problems and fundamental limitations
of RLHF and related methods; (2) overview techniques to understand, improve,
and complement RLHF in practice; and (3) propose auditing and disclosure
standards to improve societal oversight of RLHF systems. Our work emphasizes
the limitations of RLHF and highlights the importance of a multi-faceted
approach to the development of safer AI systems.
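As background for the methods surveyed here: a standard ingredient of the RLHF pipeline is fitting a reward model to human preference comparisons with a Bradley-Terry objective. The PyTorch sketch below shows only that generic step under simplified assumptions (responses as fixed feature vectors, a linear reward model); it is textbook background, not this paper's contribution.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_fn, chosen, rejected):
    """Bradley-Terry loss: push the reward of the human-preferred response
    above the reward of the rejected one."""
    return -F.logsigmoid(reward_fn(chosen) - reward_fn(rejected)).mean()

# Toy usage: a linear reward model over 8-dimensional response features.
model = torch.nn.Linear(8, 1)
reward_fn = lambda x: model(x).squeeze(-1)
chosen, rejected = torch.randn(4, 8), torch.randn(4, 8)
loss = preference_loss(reward_fn, chosen, rejected)
loss.backward()  # gradients flow into the reward model's parameters
```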