    Least-squares methods for policy iteration

    Approximate reinforcement learning addresses the essential problem of applying reinforcement learning in large and continuous state-action spaces by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core policy evaluation component of policy iteration: least-squares temporal difference, least-squares policy evaluation, and Bellman residual minimization. We introduce these techniques starting from their general mathematical principles and detail them down to fully specified algorithms. We pay attention to online variants of policy iteration, and provide a numerical example highlighting the behavior of representative offline and online methods. For the policy evaluation component as well as for the overall resulting approximate policy iteration, we provide guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grow to infinity. We also provide finite-sample results, which apply when a finite number of samples and iterations are considered. Finally, we outline several extensions and improvements to the techniques and methods reviewed.
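
    As a rough illustration of the least-squares temporal difference idea named above, the following minimal sketch evaluates a fixed policy from a batch of transition samples. It is not taken from the chapter: it assumes a state-value function rather than a Q-function, a hypothetical two-state chain, one-hot features, and a discount factor of 0.5, and the helper names (`solve_linear`, `lstd`, `one_hot`) are introduced here only for the example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def lstd(samples, features, n_features, gamma):
    """Accumulate A = sum phi (phi - gamma phi')^T and b = sum phi r, then solve A theta = b."""
    a = [[0.0] * n_features for _ in range(n_features)]
    b = [0.0] * n_features
    for state, reward, next_state in samples:
        phi = features(state)
        phi_next = features(next_state)
        for i in range(n_features):
            b[i] += phi[i] * reward
            for j in range(n_features):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(a, b)


def one_hot(state):
    """Hypothetical tabular features for a two-state chain."""
    return [1.0 if i == state else 0.0 for i in range(2)]


# Assumed toy chain: state 0 moves to state 1 (reward 0); state 1 loops (reward 1).
samples = [(0, 0.0, 1), (1, 1.0, 1)]
theta = lstd(samples, one_hot, 2, gamma=0.5)
```

    With one-hot features the solution coincides with the exact values of this toy chain, V(1) = 1 / (1 - 0.5) = 2 and V(0) = 0.5 * V(1) = 1, which makes the sketch easy to check by hand. With fewer features than states, the same computation would return an approximation instead.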

    Learning by observation using Qualitative Spatial Relations

    We present an approach to the problem of learning by observation in spatially-situated tasks, whereby an agent learns to imitate the behaviour of an observed expert, with no direct interaction and limited observations. The form of knowledge representation used for these observations is crucial, and we apply Qualitative Spatial-Relational representations to compress continuous, metric state-spaces into symbolic states to maximise the generalisability of learned models and minimise knowledge engineering. Our system self-configures these representations of the world to discover configurations of features most relevant to the task, and thus build good predictive models. We then show how these models can be employed by situated agents to control their behaviour, closing the loop from observation to practical implementation. We evaluate our approach in the simulated RoboCup Soccer domain and the Real-Time Strategy game StarCraft, and successfully demonstrate how a system using our approach closely mimics, through observation, the behaviour of both synthetic (AI-controlled) players and human-controlled players. We further evaluate our work in Reinforcement Learning tasks in these domains, and show that our approach improves the speed at which such models can be learned.

    Topology based global crowd control

    We propose a method to determine the flow of large crowds of agents in a scene such that it is filled to its capacity with a coordinated, dynamically moving crowd. Our approach focuses on cooperative control across the entire crowd, with a view to providing a method which animators can use to easily populate and fill a scene. We solve this global planning problem by first finding the topology of the scene using a Reeb graph, which is computed from a harmonic field of the environment. The maximum flow can then be calculated across this graph, detailing how the agents should move through the space. This information is converted back from the topological level to the geometric level using a route planner and the harmonic field. We provide evidence of the system’s effectiveness in creating dynamic motion through comparison to a recent method. We also demonstrate how this system allows the crowd to be controlled globally with a couple of simple, intuitive controls, and how it can be useful for the purpose of designing buildings and providing control in team sports.
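
    The maximum-flow step mentioned above can be illustrated with a small sketch. It uses the Edmonds-Karp algorithm (breadth-first augmenting paths), which is an assumption made here since the abstract does not say which maximum-flow algorithm is used. The graph, its node names, and the capacities are hypothetical stand-ins for the regions of a Reeb graph and the number of agents each passage can carry.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual capacities, with reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path remains
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical scene: two corridors connect an entrance to an exit.
scene = {
    "entrance": {"corridor_a": 3, "corridor_b": 2},
    "corridor_a": {"exit": 2},
    "corridor_b": {"exit": 3},
    "exit": {},
}
agents_per_step = max_flow(scene, "entrance", "exit")
```

    In this toy scene each corridor is limited by its narrower end, so at most 2 + 2 = 4 agents can pass from entrance to exit per step. Turning such a flow back into geometric paths is the role of the route planner described in the abstract and is not covered by this sketch.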