Search CORE

81 research outputs found

Learning the Preferences of Ignorant, Inconsistent Agents

Author: Evans Owain
Goodman Noah D.
Stuhlmueller Andreas
Publication date: 17/12/2015

An important use of machine learning is to learn what people value. What posts or photos should a user be shown? Which jobs or activities would a person find rewarding? In each case, observations of people's past choices can inform our inferences about their likes and preferences. If we assume that choices are approximately optimal according to some utility function, we can treat preference inference as Bayesian inverse planning. That is, given a prior on utility functions and some observed choices, we invert an optimal decision-making process to infer a posterior distribution on utility functions. However, people often deviate from approximate optimality. They have false beliefs, their planning is sub-optimal, and their choices may be temporally inconsistent due to hyperbolic discounting and other biases. We demonstrate how to incorporate these deviations into algorithms for preference inference by constructing generative models of planning for agents who are subject to false beliefs and time inconsistency. We explore the inferences these models make about preferences, beliefs, and biases. We present a behavioral experiment in which human subjects perform preference inference given the same observations of choices as our model. Results show that human subjects (like our model) explain choices in terms of systematic deviations from optimal behavior and suggest that they take such deviations into account when inferring preferences.
Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
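
The inverse-planning idea in the abstract above can be sketched as a grid posterior over an agent's utilities and its discount rate. This is a minimal sketch under stated assumptions, not the authors' model: it assumes a softmax choice rule with inverse temperature `beta`, hyperbolic discounting `u / (1 + k * delay)`, and a hypothetical two-option setup (a small-sooner reward available at t=2 and a larger-later reward at t=5). Each observed choice is scored from the moment it is made, so the sketch captures time inconsistency but not the full naive versus sophisticated planning or the false beliefs the abstract describes. The names `OPTION_TIMES`, `choice_prob`, and `posterior` are illustrative.

```python
import itertools
import math

# Hypothetical timing: the small-sooner (SS) option is available at t=2 and
# the larger-later (LL) option at t=5. LL's utility is fixed at 1.0, which
# only sets the utility scale.
OPTION_TIMES = {"small_sooner": 2, "larger_later": 5}


def hyperbolic_value(utility, delay, k):
    """Hyperbolically discounted value of a reward `delay` steps away."""
    return utility / (1.0 + k * delay)


def choice_prob(chosen, utilities, now, k, beta):
    """Softmax probability of picking `chosen` when deciding at time `now`."""
    values = {
        name: hyperbolic_value(utilities[name], OPTION_TIMES[name] - now, k)
        for name in OPTION_TIMES
    }
    z = sum(math.exp(beta * v) for v in values.values())
    return math.exp(beta * values[chosen]) / z


def posterior(observations, small_utils, ks, beta=10.0):
    """Grid posterior over (SS utility, discount rate k), uniform prior.

    Each observation is (decision_time, chosen_option). Values are discounted
    from the moment each choice is made, which lets a k > 0 agent reverse
    its preference as the small reward gets closer.
    """
    unnormalized = {}
    for u_small, k in itertools.product(small_utils, ks):
        utilities = {"small_sooner": u_small, "larger_later": 1.0}
        likelihood = 1.0
        for now, chosen in observations:
            likelihood *= choice_prob(chosen, utilities, now, k, beta)
        unnormalized[(u_small, k)] = likelihood
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


if __name__ == "__main__":
    # The agent plans for LL at t=0, then takes SS once it is immediate.
    obs = [(0, "larger_later"), (2, "small_sooner")]
    post = posterior(obs, [0.2, 0.4, 0.6, 0.8], [0.0, 0.5, 1.0, 2.0])
    best = max(post, key=post.get)
    p_no_discount = sum(p for (_, k), p in post.items() if k == 0.0)
    print("MAP (u_small, k):", best)
    print("P(k = 0):", round(p_no_discount, 3))
```

With these two observations, every k = 0 hypothesis has to treat one of the choices as noise, so the posterior moves toward k > 0: the inconsistency is attributed to a bias in planning rather than to the agent's utilities.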

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

Author: Evans Owain
Sastry Girish
Saunders William
Stuhlmueller Andreas
Publication date: 17/07/2017

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.

arXiv.org e-Print Archive

Oxford University Research Archive
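
The intervention scheme in the abstract (a human labels which proposed actions to block, then a supervised learner takes over that role) can be sketched on a toy 1-D walk. Everything concrete here is hypothetical: the cliff at position 9, the randomly exploring policy, and `LookupBlocker`, a lookup table standing in for a trained classifier (the paper works with Atari games and a learned model). The abstract does not say how a blocked agent is informed, so this sketch simply substitutes a safe move.

```python
import random

CLIFF = 9  # hypothetical: reaching position 9 counts as a catastrophe


def human_overseer(pos, action):
    """Stand-in for the human: block any action that would reach the cliff."""
    return pos + action >= CLIFF


class LookupBlocker:
    """Imitates the human's block/allow labels. A lookup table stands in for
    a trained classifier; (state, action) pairs never labeled are blocked."""

    def __init__(self):
        self.labels = {}

    def fit(self, examples):
        for (pos, action), blocked in examples:
            self.labels[(pos, action)] = blocked
        return self

    def __call__(self, pos, action):
        return self.labels.get((pos, action), True)


def run(policy, overseer, steps, start=5, log=None):
    """Walk on positions >= 0 and return the number of catastrophes.

    `overseer(pos, action)` decides whether to block. If `log` is a list, the
    human's label for every proposed action is appended to it (this is the
    human labor the blocker is meant to replace).
    """
    pos, catastrophes = start, 0
    for _ in range(steps):
        action = policy(pos)
        if log is not None:
            log.append(((pos, action), human_overseer(pos, action)))
        if overseer(pos, action):
            pos = max(pos - 1, 0)  # blocked: substitute a safe move
        else:
            pos = max(pos + action, 0)
        if pos >= CLIFF:
            catastrophes += 1
            pos = start  # reset after a catastrophe
    return catastrophes


if __name__ == "__main__":
    rng = random.Random(0)

    def explore(pos):
        return rng.choice((-1, 1))  # untrained agent exploring at random

    labels = []  # phase 1: human in the loop, labeling every action
    print("human phase:", run(explore, human_overseer, steps=300, log=labels))

    blocker = LookupBlocker().fit(labels)  # phase 2: blocker replaces the human
    print("blocker phase:", run(explore, blocker, steps=300))
    print("no oversight:", run(lambda pos: 1, lambda pos, action: False, steps=50))
```

Blocking never-labeled pairs by default is what keeps the blocker phase catastrophe-free in this toy. A real classifier has to generalize to inputs it was not trained on, which is where the abstract reports failures on adversarial examples found by the agent.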

"So, Tell Me What Users Want, What They Really, Really Want!"

Author: Cook R.
Evans Owain
Kahneman Daniel
Kay Judy
Ng Andrew
Parfit Derek
Publication date: 01/01/2018

Equating users' true needs and desires with behavioural measures of 'engagement' is problematic. However, good metrics of 'true preferences' are difficult to define, as cognitive biases make people's preferences change with context and exhibit inconsistencies over time. Yet, HCI research often glosses over the philosophical and theoretical depth of what it means to infer what users really want. In this paper, we present an alternative yet very real discussion of this issue, via a fictive dialogue between senior executives in a tech company aimed at helping people live the life they 'really' want to live. How will the designers settle on a metric for their product to optimise?

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Learning Structured Preferences

Author: Bergen Leon
Evans Owain Rhys
Tenenbaum Joshua B
Publication venue: Cognitive Science Society
Publication date: 08/12/2017

Learning the preferences of other people is crucial for predicting future behavior. Both children and adults make inferences about others’ preferences from sparse data and in situations where the preferences have complex internal structures. We present a computational model of learning structured preferences which integrates Bayesian inference and utility-based models of preference from economics. We experimentally test this model with adult participants, and compare the model to alternative heuristic models.

DSpace@MIT

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Author: Balesni Mikita
Berglund Lukas
Evans Owain
Kaufmann Max
Korbak Tomasz
Stickland Asa Cooper
Tong Meg
Publication date: 22/09/2023

We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e., if "A is B" occurs, "B is A" is more likely to occur). We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of 'Abyssal Melodies'" and showing that they fail to correctly answer "Who composed 'Abyssal Melodies'?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse. Code is available at https://github.com/lukasberglund/reversal_curse.
Comment: 18 pages, 10 figures

arXiv.org e-Print Archive
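
The forward/reverse comparison in the abstract can be framed as a small evaluation harness: write each fact as "A is B", prompt with "A is" and with "B is", and score each direction separately. The model below is a deliberately trivial toy, not an LLM: it memorizes continuations only in the direction the text was written, so it shows the asymmetry by construction and says nothing about why real models show it. `Fact`, `LeftToRightMemorizer`, `accuracy`, and the second fact ("Ilsa Marwood") are hypothetical; only the Uriah Hawthorne example comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    name: str         # the "A" side, e.g. "Uriah Hawthorne"
    description: str  # the "B" side, e.g. "the composer of 'Abyssal Melodies'"

    def sentence(self):
        return f"{self.name} is {self.description}"

    def forward(self):
        """Prompt in the trained direction, with its expected completion."""
        return f"{self.name} is", self.description

    def reverse(self):
        """Prompt in the reversed direction, with its expected completion."""
        return f"{self.description} is", self.name


class LeftToRightMemorizer:
    """Toy stand-in for a model (not an LLM): it memorizes each training
    sentence as 'prefix -> continuation' in the order the text was written."""

    def __init__(self):
        self.table = {}

    def train(self, sentences):
        for s in sentences:
            subject, _, rest = s.partition(" is ")
            self.table[f"{subject} is"] = rest

    def complete(self, prompt):
        return self.table.get(prompt)  # None if never seen in this direction


def accuracy(model, pairs):
    """Fraction of (prompt, target) pairs the model completes exactly."""
    return sum(model.complete(p) == target for p, target in pairs) / len(pairs)


if __name__ == "__main__":
    facts = [
        Fact("Uriah Hawthorne", "the composer of 'Abyssal Melodies'"),
        Fact("Ilsa Marwood", "the architect of 'Saltglass Tower'"),  # hypothetical
    ]
    model = LeftToRightMemorizer()
    model.train(f.sentence() for f in facts)
    print("forward:", accuracy(model, [f.forward() for f in facts]))
    print("reverse:", accuracy(model, [f.reverse() for f in facts]))
```

To evaluate a real model, `complete` would be replaced by a call to that model; the fact construction and the two per-direction accuracy numbers stay the same.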

Complement activation and increased anaphylatoxin receptor expression are associated with cortical grey matter lesions and the compartmentalised inflammatory response of multiple sclerosis

Author: Constantinos Demetriou
Gabriella Santiago
James Neal
Kristen Hawkins
Lewis Watkins
Owain Howell
Rhian Evans
Publication venue: Frontiers Media SA

Background: The extent of cortical pathology is an important determinant of multiple sclerosis (MS) severity. Cortical demyelination and neurodegeneration are related to inflammation of the overlying leptomeninges, to a more inflammatory CSF milieu, and to parenchymal microglial and astroglial activation. These are all components of the compartmentalised inflammatory response. Compartmentalised inflammation is a feature of progressive MS that is not targeted by disease-modifying therapies. Complement is differentially expressed in MS CSF, and complement and complement receptors are associated with demyelination and neurodegeneration.
Methods: To better understand whether complement activation in the leptomeninges is associated with underlying cortical demyelination, inflammation, and microglial activation, we performed a neuropathological study of progressive MS cases (n = 22, 14 females), neuroinflammatory controls (n = 8), and non-neurological disease controls (n = 10). We then quantified the relative extent of demyelination, connective tissue inflammation, complement, and complement-receptor-positive microglia/macrophages.
Results: Complement was elevated at the leptomeninges, in the subpial region, and within and around vessels of the cortical grey matter. The extent of complement C1q immunoreactivity correlated with connective tissue infiltrates, whilst the activation products C4d, Bb, and C3b were associated with grey matter demyelination, and C3a receptor 1+ and C5a receptor 1+ microglia/macrophages closely apposed C3b-labelled cells. The density of C3a receptor 1+ and C5a receptor 1+ cells was increased at the expanding edge of subpial and leukocortical lesions. C5a receptor 1+ cells expressed TNFα and iNOS and contained puncta immunoreactive for proteolipid protein, neurofilament, and synaptophysin, suggesting their involvement in grey matter lesion expansion.
Interpretation: The presence of complement activation products at the brain surfaces, their association with the extent of underlying pathology, and the increase in complement anaphylatoxin receptor-positive microglia/macrophages at expanding cortical grey matter lesions suggest that complement activation could represent a target for modifying compartmentalised inflammation and cortical demyelination.

Cronfa at Swansea University

Help or hinder: Bayesian models of social goal inference

Author: Baker Christopher Lawrence
Evans Owain Rhys
Goodman Noah D.
Macindoe Owen
Tenenbaum Joshua B.
Ullman Tomer David
Publication venue: Neural Information Processing Systems Foundation
Publication date: 01/12/2009

Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative.
Funding: United States. Army Research Office (ARO MURI grant W911NF-08-1-0242); United States. Air Force Office of Scientific Research (MURI grant FA9550-07-1-0075); National Science Foundation (U.S.) (Graduate Research Fellowship); James S. McDonnell Foundation (Collaborative Interdisciplinary Grant on Causal Reasoning)

CiteSeerX

DSpace@MIT
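
The inverse-planning idea in the abstract can be reduced to a one-step sketch: agent B's reward is agent A's value scaled by a goal weight (+1 to help, -1 to hinder, 0 for neutral), B chooses actions by softmax, and Bayes' rule inverts that choice model into a posterior over B's goal. This is an assumed simplification, not the paper's multiagent MDP model: the hill with positions 0 to 5, the one-step lookahead, `beta = 2.0`, and all names are hypothetical.

```python
import math

TOP = 5               # hypothetical hill: positions 0..5, with the top at 5
ACTIONS = (-1, 0, 1)  # B can push A down, do nothing, or push A up
GOAL_WEIGHTS = {"help": 1.0, "hinder": -1.0, "neutral": 0.0}


def value_to_a(pos):
    """A's value for a position: negative distance to the top of the hill."""
    return -abs(TOP - pos)


def action_probs(pos, weight, beta=2.0):
    """Softmax over B's actions; B scores each action by weight * A's value
    of the position A ends up in (one-step lookahead only)."""
    scores = [beta * weight * value_to_a(min(max(pos + a, 0), TOP)) for a in ACTIONS]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {a: e / z for a, e in zip(ACTIONS, exps)}


def infer_goal(observations):
    """Posterior over B's social goal from (A's position, B's action) pairs,
    under a uniform prior over the goals in GOAL_WEIGHTS."""
    unnormalized = {}
    for goal, weight in GOAL_WEIGHTS.items():
        likelihood = 1.0
        for pos, action in observations:
            likelihood *= action_probs(pos, weight)[action]
        unnormalized[goal] = likelihood / len(GOAL_WEIGHTS)
    z = sum(unnormalized.values())
    return {g: p / z for g, p in unnormalized.items()}


if __name__ == "__main__":
    print(infer_goal([(2, 1), (3, 1)]))    # B keeps pushing A uphill
    print(infer_goal([(2, -1), (1, -1)]))  # B keeps pushing A downhill
```

Because a neutral B spreads its probability evenly over the three actions, a few consistent pushes in one direction are enough to move most of the posterior onto "help" or "hinder".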