8 research outputs found
Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man
In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions, and thus the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies, and therefore solutions with a compact description can be found quickly if they exist.
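The rule-selection step described above can be sketched as a cross-entropy search over binary inclusion masks. This is a minimal toy illustration, not the paper's Ms. Pac-Man setup: the rule pool, the hidden "good" subset, and the scoring function are stand-in assumptions, where the real evaluation would be the game score of the induced decision-list policy.

```python
import numpy as np

rng = np.random.default_rng(0)
POOL_SIZE = 20                          # hypothetical number of candidate rules
TARGET = rng.random(POOL_SIZE) < 0.3    # hidden "good" rule subset (toy objective)

def score(mask):
    # Toy evaluation: reward agreement with the hidden target subset.
    # In the paper, this would be the score achieved by the policy built
    # from the selected rules.
    return int(np.sum(mask == TARGET))

def cross_entropy_select(n_iters=50, pop=100, elite_frac=0.1, smooth=0.9):
    p = np.full(POOL_SIZE, 0.5)         # Bernoulli inclusion probabilities
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        samples = rng.random((pop, POOL_SIZE)) < p       # sample rule subsets
        scores = np.array([score(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]   # keep the best subsets
        # Refit the sampling distribution to the elite, with smoothing so
        # probabilities never lock in prematurely at 0 or 1.
        p = smooth * elite.mean(axis=0) + (1 - smooth) * 0.5
    return p > 0.5                      # final rule selection

best = cross_entropy_select()
print(score(best))
```

The bias towards low-complexity policies mentioned in the abstract would enter through the scoring function, e.g. by penalizing the number of selected rules.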
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
This paper seeks to establish a framework for directing a society of simple,
specialized, self-interested agents to solve what traditionally are posed as
monolithic single-agent sequential decision problems. What makes it challenging
to use a decentralized approach to collectively optimize a central objective is
the difficulty in characterizing the equilibrium strategy profile of
non-cooperative games. To overcome this challenge, we design a mechanism for
defining the learning environment of each agent for which we know that the
optimal solution for the global objective coincides with a Nash equilibrium
strategy profile of the agents optimizing their own local objectives. The
society functions as an economy of agents that learn the credit assignment
process itself by buying and selling to each other the right to operate on the
environment state. We derive a class of decentralized reinforcement learning
algorithms that are broadly applicable not only to standard reinforcement
learning but also to selecting options in semi-MDPs and to dynamically
composing computation graphs. Lastly, we demonstrate the potential advantages
of a society's inherent modular structure for more efficient transfer
learning.
Comment: 18 pages, 13 figures, accepted to the International Conference on
Machine Learning (ICML) 2020
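The economy described in the abstract, where agents buy and sell the right to operate on the environment state, can be illustrated with a toy auction loop. This is a loose sketch under stated assumptions, not the paper's mechanism: the chain environment, the two single-action agents, and the bid-update rule are all hypothetical, and the winner simply pays its own bid to the previous winner.

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4  # tiny chain world: move right to reach the goal

class Agent:
    def __init__(self, action):
        self.action = action              # +1 (move right) or -1 (move left)
        self.bid = [0.1] * N_STATES       # learned per-state valuation

def run_episode(agents, lr=0.1):
    state, prev_winner, prev_state = 0, None, None
    for _ in range(20):
        # Auction: the highest bidder buys the right to act in this state.
        winner = max(agents, key=lambda a: a.bid[state])
        price = winner.bid[state]
        if prev_winner is not None:
            # The previous winner is paid the current winning bid, so its
            # valuation of the state it sold chases the resale price;
            # credit assignment emerges from this chain of transactions.
            prev_winner.bid[prev_state] += lr * (price - prev_winner.bid[prev_state])
        prev_winner, prev_state = winner, state
        state = max(0, min(N_STATES - 1, state + winner.action))
        if state == GOAL:
            # Environment reward of 1.0 goes to the final owner.
            prev_winner.bid[prev_state] += lr * (1.0 - prev_winner.bid[prev_state])
            break

agents = [Agent(+1), Agent(-1)]
for _ in range(300):
    run_episode(agents)
print([max(agents, key=lambda a: a.bid[s]).action for s in range(N_STATES)])
```

After training, the useful agent (move right) has learned valuations approaching the terminal reward and wins the auction in every state, even though each agent only ever optimized its own local buy-low, sell-high objective.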
Toward a Model of Mind as a Laissez-Faire Economy of Idiots
Eric B. Baum, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, [email protected]

Abstract. I argue that the mind should be viewed as an economy, and describe an algorithm that autonomously apportions complex tasks to multiple cooperating agents in such a way that the incentive of each agent is exactly to maximize my reward, as owner of the system. A specific model, called "The Hayek Machine", is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve far more complex BW problems than any previous learning algorithm. If given intermediate reward and simple features, it learns to efficiently solve arbitrary BW problems.

1 Introduction
I am interested in understanding how human-like mental capabilities can arise. Any such understanding must model how large computational tasks can be broken down into smaller components, how such components can be coordinated, how the system can gain knowledge, how computations performed can be trac..