
    Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man

    In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies, so solutions with a compact description can be found quickly if they exist.
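    The rule-selection step described above can be illustrated with a minimal sketch of the cross-entropy method over binary rule-inclusion masks. All specifics here are hypothetical stand-ins: the `evaluate` fitness simply rewards matching a fixed target subset, whereas the paper's fitness would be the agent's game score.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    N_RULES = 20          # size of the rule pool (toy scale)
    POP, ELITE = 100, 10  # samples per iteration, elite count
    ALPHA = 0.7           # smoothing for the probability update

    # Hypothetical stand-in for the game score: rewards rule subsets
    # that agree with a fixed target subset (the first three rules).
    TARGET = np.zeros(N_RULES, dtype=bool)
    TARGET[:3] = True

    def evaluate(mask: np.ndarray) -> float:
        return float(np.sum(mask == TARGET))

    p = np.full(N_RULES, 0.5)  # one Bernoulli parameter per rule
    for _ in range(50):
        pop = rng.random((POP, N_RULES)) < p          # sample rule subsets
        scores = np.array([evaluate(m) for m in pop])
        elite = pop[np.argsort(scores)[-ELITE:]]      # keep the best subsets
        p = ALPHA * elite.mean(axis=0) + (1 - ALPHA) * p

    best = p > 0.5  # the distribution concentrates on a good subset
    print(evaluate(best))
    ```

    The same loop applies unchanged when `evaluate` runs a full game episode; only the fitness call is expensive.
    
    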

    Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

    This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty of characterizing the equilibrium strategy profile of non-cooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent, for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also to selecting options in semi-MDPs and to dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
    Comment: 18 pages, 13 figures, accepted to the International Conference on Machine Learning (ICML) 202
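    The "buying and selling the right to operate on the environment state" idea can be sketched with one auction round. Everything here is a hypothetical toy, not the paper's implementation: agents hold fixed valuations instead of learned ones, and a single step is shown rather than a full episode. The winner pays the second-highest bid (a Vickrey auction), which is what makes truthful bidding a dominant strategy.

    ```python
    import random

    random.seed(0)

    class Agent:
        def __init__(self, action):
            self.action = action            # transformation this agent sells
            self.value = random.random()    # stand-in for a learned valuation

        def bid(self, state):
            return self.value

    def auction_step(agents, state):
        """Run one second-price auction for the right to act on `state`."""
        ranked = sorted(agents, key=lambda a: a.bid(state), reverse=True)
        winner, runner_up = ranked[0], ranked[1]
        price = runner_up.bid(state)        # winner pays the second bid
        return winner, price, winner.action(state)

    # Three agents, each selling a different primitive transformation.
    agents = [Agent(lambda s, k=k: s + k) for k in (1, 2, 3)]
    winner, price, state = auction_step(agents, 0)
    print(state, price)
    ```

    In the full algorithm this step repeats every timestep, so the price an agent pays becomes the credit-assignment signal it learns from.
    
    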

    Toward a Model of Mind as a Laissez-Faire Economy of Idiots

    Eric B. Baum, NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, [email protected]
    Abstract. I argue that the mind should be viewed as an economy, and describe an algorithm that autonomously apportions complex tasks to multiple cooperating agents in such a way that the incentive of each agent is exactly to maximize my reward, as owner of the system. A specific model, called "The Hayek Machine", is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve far more complex BW problems than any previous learning algorithm. If given intermediate reward and simple features, it learns to efficiently solve arbitrary BW problems.
    1 Introduction. I am interested in understanding how human-like mental capabilities can arise. Any such understanding must model how large computational tasks can be broken down into smaller components, how such components can be coordinated, how the system can gain knowledge, how computations performed can be trac..