8 research outputs found
Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man
In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions, and thus the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies, and therefore solutions with a compact description can be found quickly if they exist.
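The rule-selection step described above can be sketched as a cross-entropy search over binary inclusion masks. This is a minimal toy illustration, not the paper's Ms. Pac-Man setup: the rule pool, the hidden "good" subset, and the scoring function are stand-in assumptions, where the real evaluation would be the game score of the induced decision-list policy.

```python
import numpy as np

rng = np.random.default_rng(0)
POOL_SIZE = 20                          # hypothetical number of candidate rules
TARGET = rng.random(POOL_SIZE) < 0.3    # hidden "good" rule subset (toy objective)

def score(mask):
    # Toy evaluation: reward agreement with the hidden target subset.
    # In the paper, this would be the score achieved by the policy built
    # from the selected rules.
    return int(np.sum(mask == TARGET))

def cross_entropy_select(n_iters=50, pop=100, elite_frac=0.1, smooth=0.9):
    p = np.full(POOL_SIZE, 0.5)         # Bernoulli inclusion probabilities
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        samples = rng.random((pop, POOL_SIZE)) < p       # sample rule subsets
        scores = np.array([score(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]   # keep the best subsets
        # Refit the sampling distribution to the elite, with smoothing so
        # probabilities never lock in prematurely at 0 or 1.
        p = smooth * elite.mean(axis=0) + (1 - smooth) * 0.5
    return p > 0.5                      # final rule selection

best = cross_entropy_select()
print(score(best))
```

The bias towards low-complexity policies mentioned in the abstract would enter through the scoring function, e.g. by penalizing the number of selected rules.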
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
This paper seeks to establish a framework for directing a society of simple,
specialized, self-interested agents to solve what traditionally are posed as
monolithic single-agent sequential decision problems. What makes it challenging
to use a decentralized approach to collectively optimize a central objective is
the difficulty in characterizing the equilibrium strategy profile of
non-cooperative games. To overcome this challenge, we design a mechanism for
defining the learning environment of each agent for which we know that the
optimal solution for the global objective coincides with a Nash equilibrium
strategy profile of the agents optimizing their own local objectives. The
society functions as an economy of agents that learn the credit assignment
process itself by buying and selling to each other the right to operate on the
environment state. We derive a class of decentralized reinforcement learning
algorithms that are broadly applicable not only to standard reinforcement
learning but also to selecting options in semi-MDPs and to dynamically
composing computation graphs. Lastly, we demonstrate the potential advantages
of a society's inherent modular structure for more efficient transfer
learning.
Comment: 18 pages, 13 figures, accepted to the International Conference on
Machine Learning (ICML) 2020
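The economy described in the abstract, where agents buy and sell the right to operate on the environment state, can be illustrated with a toy auction loop. This is a loose sketch under stated assumptions, not the paper's mechanism: the chain environment, the two single-action agents, and the bid-update rule are all hypothetical, and the winner simply pays its own bid to the previous winner.

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4  # tiny chain world: move right to reach the goal

class Agent:
    def __init__(self, action):
        self.action = action              # +1 (move right) or -1 (move left)
        self.bid = [0.1] * N_STATES       # learned per-state valuation

def run_episode(agents, lr=0.1):
    state, prev_winner, prev_state = 0, None, None
    for _ in range(20):
        # Auction: the highest bidder buys the right to act in this state.
        winner = max(agents, key=lambda a: a.bid[state])
        price = winner.bid[state]
        if prev_winner is not None:
            # The previous winner is paid the current winning bid, so its
            # valuation of the state it sold chases the resale price;
            # credit assignment emerges from this chain of transactions.
            prev_winner.bid[prev_state] += lr * (price - prev_winner.bid[prev_state])
        prev_winner, prev_state = winner, state
        state = max(0, min(N_STATES - 1, state + winner.action))
        if state == GOAL:
            # Environment reward of 1.0 goes to the final owner.
            prev_winner.bid[prev_state] += lr * (1.0 - prev_winner.bid[prev_state])
            break

agents = [Agent(+1), Agent(-1)]
for _ in range(300):
    run_episode(agents)
print([max(agents, key=lambda a: a.bid[s]).action for s in range(N_STATES)])
```

After training, the useful agent (move right) has learned valuations approaching the terminal reward and wins the auction in every state, even though each agent only ever optimized its own local buy-low, sell-high objective.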
Toward a Model of Mind as a Laissez-Faire Economy of Idiots
Eric B. Baum, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, [email protected]

Abstract. I argue that the mind should be viewed as an economy, and describe an algorithm that autonomously apportions complex tasks to multiple cooperating agents in such a way that the incentive of each agent is exactly to maximize my reward, as owner of the system. A specific model, called "The Hayek Machine", is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve far more complex BW problems than any previous learning algorithm. If given intermediate reward and simple features, it learns to efficiently solve arbitrary BW problems.

1 Introduction
I am interested in understanding how human-like mental capabilities can arise. Any such understanding must model how large computational tasks can be broken down into smaller components, how such components can be coordinated, how the system can gain knowledge, how computations performed can be trac..