122 research outputs found

Feature Markov Decision Processes

Author: Hutter Marcus
Publication date: 24/12/2008

General-purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well developed for small finite-state Markov Decision Processes (MDPs). So far, extracting the right state representation from the bare observations, i.e. reducing the agent setup to the MDP framework, has been an art performed by human designers. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in a companion article.
Comment: 7 pages
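The reduction this abstract describes, mapping a raw history of observations, actions, and rewards to a state and then treating the result as an MDP, can be illustrated with a small sketch. The abstract does not state the criterion itself, so none is implemented here. The feature map `phi_last_obs` (state = most recent observation), the function `estimate_mdp`, and the toy trajectory are all hypothetical names and data chosen for illustration only.

```python
from collections import Counter, defaultdict


def phi_last_obs(history):
    """Hypothetical feature map: the state is the most recent observation."""
    return history[-1][0]


def estimate_mdp(trajectory, phi):
    """Estimate transition probabilities of the MDP induced by phi.

    trajectory is a list of (observation, action, reward) triples, where the
    action is the one taken after seeing the observation.
    Returns a dict mapping (state, action) to {next_state: probability}.
    """
    counts = defaultdict(Counter)
    for t in range(1, len(trajectory)):
        state = phi(trajectory[:t])           # state before the transition
        action = trajectory[t - 1][1]         # action taken in that state
        next_state = phi(trajectory[:t + 1])  # state after the transition
        counts[(state, action)][next_state] += 1
    return {
        key: {s2: n / sum(c.values()) for s2, n in c.items()}
        for key, c in counts.items()
    }


# Toy trajectory (made up): the environment alternates between "x" and "y".
trajectory = [("x", 0, 0.0), ("y", 1, 1.0), ("x", 0, 0.0),
              ("y", 1, 1.0), ("x", 0, 0.0)]
model = estimate_mdp(trajectory, phi_last_obs)
```

A different choice of feature map would induce a different MDP from the same trajectory. Choosing among such maps is the job of the objective criterion the article develops.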

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

Feature Dynamic Bayesian Networks

Author: Hutter Marcus
Publication date: 24/12/2008

Feature Markov Decision Processes (PhiMDPs) are well suited for learning agents in general environments. Nevertheless, unstructured (Phi)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the "best" DBN representation. I discuss all building blocks required for a complete general learning algorithm.
Comment: 7 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

Hilbert Space Embeddings of POMDPs

Authors: Boularias Abdeslam; Fukumizu Kenji; Gretton Arthur; Nishiyama Yu
Publication date: 01/01/2012

A nonparametric approach to policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes' rule to these distribution embeddings. Policies and value functions are defined on the feature space over states, which leads to a feature space expression for the Bellman equation. Value iteration may then be used to estimate the optimal value function and associated policy. Experimental results confirm that the correct policy is learned using the feature space representation.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)
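For readers unfamiliar with the value iteration step mentioned in this abstract, here is a minimal sketch of ordinary tabular value iteration on a small, fully observed MDP. This is not the kernel-based method of the paper: it uses no Hilbert space embeddings and no kernel Bayes' rule, and it only shows the Bellman backup that the paper re-expresses in feature space. The function name, the data layout of `P` and `R`, and the two-state example are assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    Returns the estimated optimal values and a greedy policy.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # Bellman backup: Q(s, a) = R(s, a) + gamma * E[V(s')]
        Q = [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            policy = [q.index(max(q)) for q in Q]
            return V_new, policy
        V = V_new


# Toy MDP (made up): action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]
V, policy = value_iteration(P, R)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there, so the values approach 9 and 10 for a discount of 0.9.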

arXiv.org e-Print Archive

UCL Discovery

MPG.PuRe

Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling

Authors: Belzner Lenz; Friedrich Markus; Kiermeier Marie; Linnhoff-Popien Claudia; Phan Thomy; Schmid Kyrill
Publication date: 10/05/2019

State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit performance when memory resources are restricted. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling bandits. We empirically evaluate POSTS on four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
Comment: Presented at AAAI 201
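The idea of a fixed-size stack of Thompson Sampling bandits can be sketched roughly as follows: one bandit per planning depth, each choosing an action without looking at observations (open-loop), and each updated with the discounted return observed from its depth onward. This is only an illustration under stated assumptions, not the POSTS algorithm as published. The abstract does not specify the bandit model, so the Gaussian posterior with fixed unit variance, the names `GaussianArm` and `plan_open_loop`, and the toy simulator are all hypothetical.

```python
import math
import random


class GaussianArm:
    """Running mean of returns; posterior assumed to be N(mean, 1 / (n + 1))."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def sample(self, rng):
        return rng.gauss(self.mean, 1.0 / math.sqrt(self.n + 1))

    def update(self, ret):
        self.n += 1
        self.mean += (ret - self.mean) / self.n


def plan_open_loop(sample_state, simulate, n_actions, horizon, n_sims,
                   gamma=0.95, seed=0):
    """Return one action per depth; memory is horizon * n_actions arms."""
    rng = random.Random(seed)
    stack = [[GaussianArm() for _ in range(n_actions)] for _ in range(horizon)]
    for _ in range(n_sims):
        state = sample_state(rng)  # hidden state drawn from the current belief
        actions, rewards = [], []
        for arms in stack:
            # Thompson Sampling: act greedily on one posterior sample per arm.
            a = max(range(n_actions), key=lambda i: arms[i].sample(rng))
            state, r = simulate(state, a, rng)
            actions.append(a)
            rewards.append(r)
        ret = 0.0
        for t in reversed(range(horizon)):
            ret = rewards[t] + gamma * ret  # discounted return from depth t
            stack[t][actions[t]].update(ret)
    return [max(range(n_actions), key=lambda i: arms[i].mean) for arms in stack]


# Toy generative model (made up): action 1 moves right and pays 1, action 0 pays 0.
def toy_sample_state(rng):
    return 0


def toy_simulate(state, action, rng):
    return state + action, float(action)


plan = plan_open_loop(toy_sample_state, toy_simulate,
                      n_actions=2, horizon=3, n_sims=200)
```

The stack holds `horizon * n_actions` arms regardless of how many simulations are run, which is the memory-bounded property the abstract contrasts with growing search trees.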

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications