6 research outputs found
Context tree maximizing reinforcement learning
Recent developments in reinforcement learning for non-Markovian problems have produced a surge of history-based methods, among which we are particularly interested in two frameworks: ΦMDP and MC-AIXI-CTW. ΦMDP attempts to reduce the general RL problem, where the environment's states and dynamics are both unknown, to an MDP, thereby connecting generic reinforcement learning with classical reinforcement learning; MC-AIXI-CTW instead incrementally learns a mixture of context trees as its environment model. The first implementation of ΦMDP relies on a stochastic search procedure to find a tree that minimizes a certain cost function. Given limited search time, this is not guaranteed to find the minimizing tree, or even a good one, and as a consequence the approach appears to have difficulty with large domains. MC-AIXI-CTW is attractive in that it computes its internal model incrementally and analytically through interaction with the environment; unfortunately, it is computationally demanding because it requires heavy planning simulations at every single time step. We devise a novel approach called CTMRL that finds the cost-minimizing tree analytically and efficiently. Instead of the context-tree weighting method on which MC-AIXI-CTW is based, we use the closely related context-tree maximizing algorithm, which selects a single tree. This places our approach within the ΦMDP framework and allows the costly planning component of MC-AIXI-CTW to be replaced with simple Q-learning. Our empirical investigation shows that CTMRL finds policies as good as MC-AIXI-CTW's on six domains, including a challenging Pac-Man domain, in an order of magnitude less time.
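As a rough illustration of the ΦMDP recipe this abstract builds on: map each interaction history to a derived state, then run ordinary tabular Q-learning on those states. The sketch below is not CTMRL itself; the context-tree-maximizing step is abstracted into a fixed-depth suffix map, and the environment interface, names, and parameters are all illustrative assumptions.

```python
import random
from collections import defaultdict

def phi(history, depth=3):
    """Map a history of actions/observations to a derived state.
    CTMRL would pick the suffix via its context-tree-maximizing
    cost criterion; a fixed-depth suffix stands in for that here."""
    return tuple(history[-depth:])

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update on the derived state space."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def run(env, actions, steps=10000, epsilon=0.1):
    """env is a hypothetical interface with reset() -> obs and
    step(a) -> (obs, reward)."""
    Q = defaultdict(float)
    history = [env.reset()]
    for _ in range(steps):
        s = phi(history)
        a = (random.choice(actions) if random.random() < epsilon
             else max(actions, key=lambda b: Q[(s, b)]))  # epsilon-greedy
        obs, r = env.step(a)
        history += [a, obs]
        q_update(Q, s, a, r, phi(history), actions)
    return Q
```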
IST Austria Technical Report
We consider partially observable Markov decision processes (POMDPs) with a set of target states, where every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost until the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that, for integer costs, approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost, and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms building on existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we also present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples of interest.
Optimal Cost Almost-sure Reachability in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a set of target states, where every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost until the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that, for integer costs, approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost, and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms building on existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we also present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples of interest. (Full version of the AAAI 2015 paper.)
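Spelled out in notation of my own choosing (not taken from the paper), the objective is: among the policies σ that reach the target set T almost surely, minimize the expected total cost accumulated before the first visit to T:

```latex
\min_{\sigma \,:\, \Pr^{\sigma}(\lozenge T) = 1} \;
\mathbb{E}^{\sigma}\!\left[ \sum_{t=0}^{\tau_T - 1} c(s_t, a_t) \right],
\qquad \tau_T = \min\{\, t \ge 0 : s_t \in T \,\},
```

where c is the cost attached to each transition. The abstract's dichotomy is then: approximating this value is undecidable for general integer costs, but decidable, with double-exponential bounds, when all costs are positive.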
Online discovery and learning of predictive state representations
Predictive state representations (PSRs) model dynamical systems using only observable quantities, such as actions and observations. A PSR summarizes the system state through predictions about the outcomes of future tests. The best existing techniques for discovering and learning PSRs use a Monte Carlo approach to estimate these outcome probabilities explicitly. In this paper, we present a new discovery and learning algorithm for PSRs that uses gradient descent to compute the predictions for the current state. The algorithm exploits the large amount of structure inherent in a valid prediction matrix to constrain its predictions. Furthermore, the algorithm can be used online by an agent to continually improve its prediction quality, something current state-of-the-art discovery and learning algorithms cannot do. We give empirical results showing that our constrained-gradient algorithm discovers core tests from very small amounts of data and, with larger amounts of data, computes accurate predictions of the system dynamics.
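For concreteness, the state a PSR maintains is a prediction vector over its core tests, advanced by a linear update after each action-observation pair. A minimal NumPy sketch, assuming the standard linear-PSR parameterization (update matrices M_ao and one-step columns m_ao, which a learning procedure such as the paper's would estimate; the names are illustrative, not the paper's API):

```python
import numpy as np

def psr_update(p, M_ao, m_ao):
    """Advance the PSR state after executing action a and observing o.

    p    : prediction vector over the core tests given history h
    M_ao : matrix mapping core-test predictions through (a, o)
    m_ao : column whose inner product with p gives Pr(o | h, a)
    """
    denom = p @ m_ao  # predicted probability of seeing o after a
    if denom <= 0:
        raise ValueError("observation had zero predicted probability")
    p_next = (p @ M_ao) / denom
    # A valid prediction vector lies in [0, 1]; the paper exploits such
    # structure as constraints on its gradient steps, here we merely clip.
    return np.clip(p_next, 0.0, 1.0)
```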