Market-based reinforcement learning in partially observable worlds

Author: Hutter Marcus
Kwee Ivo
Schmidhuber Jürgen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/06/2016

Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting. This work was supported by SNF grants 21-55409.98 and 2000-61847.0

The Australian National University

Smooth markets: A basic mechanism for organizing gradient-based learners

Author: Anthony Thomas W
Balduzzi David
Czarnecki Wojciech M
Gemp Ian M
Graepel Thore
Hughes Edward
Leibo Joel Z
Piliouras Georgios
Publication date: 01/01/2020

With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero-sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods. Comment: 18 pages, 3 figure

arXiv.org e-Print Archive

UCL Discovery

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Author: Chang Michael
Griffiths Thomas L.
Kaushik Sidhant
Levine Sergey
Weinberg S. Matthew
Publication date: 01/01/2020

This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty in characterizing the equilibrium strategy profile of non-cooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also to selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning. Comment: 18 pages, 13 figures, accepted to the International Conference on Machine Learning (ICML) 202

arXiv.org e-Print Archive

Princeton University Open Access Repository

Universal Algorithmic Intelligence: A mathematical top->down approach

Author: Hutter Marcus
Publication date: 01/01/2007

Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time t and length l bounded agent. The computation time of AIXItl is of the order t x 2^l. The discussion includes formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches. Comment: 70 page

arXiv.org e-Print Archive

The Australian National University

Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

Author: Aberdeen Douglas
Publication date: 01/01/2003

Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms is the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs). In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember. ..

The Australian National University

National Taiwan University Repository