Future state maximisation as an intrinsic motivation for decision making

Charlesworth, Henry J.

research

Future state maximisation as an intrinsic motivation for decision making

Authors: Henry J. Charlesworth
Publication date
Publisher

Abstract

The concept of an “intrinsic motivation" is used in the psychology literature to distinguish between behaviour which is motivated by the expectation of an immediate, quantifiable reward (“extrinsic motivation") and behaviour which arises because it is inherently useful, interesting or enjoyable. Examples of the latter can include curiosity driven behaviour such as exploration and the accumulation of knowledge, as well as developing skills that might not be immediately useful but that have the potential to be re-used in a variety of different future situations. In this thesis, we examine a candidate for an intrinsic motivation with wide-ranging applicability which we refer to as “future state maximisation". Loosely speaking this is the idea that, taking everything else to be equal, decisions should be made so as to maximally keep one's options open, or to give the maximal amount of control over what one can potentially do in the future. Our goal is to study how this principle can be applied in a quantitative manner, as well as identifying examples of systems where doing so could be useful in either explaining or generating behaviour. We consider a number of examples, however our primary application is to a model of collective motion in which we consider a group of agents equipped with simple visual sensors, moving around in two dimensions. In this model, agents aim to make decisions about how to move so as to maximise the amount of control they have over the potential visual states that they can access in the future. We find that with each agent following this simple, low-level motivational principle a swarm spontaneously emerges in which the agents exhibit rich collective behaviour, remaining cohesive and highly-aligned. Remarkably, the emergent swarm also shares a number of features which are observed in real flocks of starlings, including scale free correlations and marginal opacity. We go on to explore how the model can be developed to allow us to manipulate and control the swarm, as well as looking at heuristics which are able to mimic future state maximisation whilst requiring significantly less computation, and so which could plausibly operate under animal cognition

Similar works

Full text

Open in the Core reader

Download PDF

Available Versions

Warwick Research Archives Portal Repository

oai:wrap.warwick.ac.uk:135026

Last time updated on 22/05/2020