Feature Markov Decision Processes
General purpose intelligent learning agents cycle through (complex, non-MDP)
sequences of observations, actions, and rewards. On the other hand,
reinforcement learning is well-developed for small finite-state Markov Decision
Processes (MDPs). So far, extracting the right state representation from the
bare observations, i.e. reducing the agent setup to the MDP framework, is an
art performed by human designers. Before we can think of mechanizing this
search for suitable MDPs, we need a formal objective criterion. The main
contribution of this article is to develop such a criterion. I also integrate
the various parts into one learning algorithm. Extensions to more realistic
dynamic Bayesian networks are developed in a companion article.
Comment: 7 pages
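To make the idea concrete, here is a minimal Python sketch of how a candidate feature map Phi might be scored: the state sequence induced by Phi is coded sequentially, and a Phi that makes the history look more MDP-like yields a shorter code. The Laplace-style estimator and the exact bookkeeping are assumptions of this sketch, not necessarily the criterion derived in the article.

```python
import math
from collections import defaultdict

def phi_cost(history, phi):
    """Code length of the history under the state abstraction phi (lower is better)."""
    # history: list of (observation, action, reward) tuples
    # phi: maps a history prefix to a (hashable) abstract state
    trans = defaultdict(lambda: defaultdict(int))  # (s, a) -> counts of next state s'
    rew = defaultdict(lambda: defaultdict(int))    # (s, a) -> counts of reward r
    cost, prev = 0.0, None
    for t, (_obs, action, reward) in enumerate(history):
        state = phi(history[: t + 1])              # abstract state for this prefix
        if prev is not None:
            counts = trans[prev]
            total = sum(counts.values())
            # sequential Laplace-style code length for the observed next state
            cost -= math.log((counts.get(state, 0) + 1) / (total + len(counts) + 1))
            counts[state] += 1
        rcounts = rew[(state, action)]
        rtotal = sum(rcounts.values())
        # code length for the observed reward given (state, action)
        cost -= math.log((rcounts.get(reward, 0) + 1) / (rtotal + len(rcounts) + 1))
        rcounts[reward] += 1
        prev = (state, action)
    return cost
```

A feature map would then be selected by minimizing this cost over a candidate class, e.g. `min(candidates, key=lambda phi: phi_cost(history, phi))`.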
Feature Dynamic Bayesian Networks
Feature Markov Decision Processes (PhiMDPs) are well-suited for learning
agents in general environments. Nevertheless, unstructured (Phi)MDPs are
limited to relatively simple environments. Structured MDPs like Dynamic
Bayesian Networks (DBNs) are used for large-scale real-world problems. In this
article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost
criterion that allows the most relevant features to be extracted automatically
from the environment, leading to the "best" DBN representation. I discuss all
building blocks required for a complete general learning algorithm.
Comment: 7 pages
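A hedged sketch of what such a factored cost might look like, assuming binary features and a fixed candidate parent set per feature; the KT estimator and the BIC-like structure penalty below are stand-ins for the criterion actually derived in the article.

```python
import math
from collections import defaultdict

def dbn_cost(features, parents):
    """Factored code length of a binary feature sequence (lower is better)."""
    # features: features[t][i] in {0, 1}, value of feature i at time step t
    # parents: parents[i] = indices of the features that feature i depends on
    #          at the previous time step (the candidate DBN structure)
    cost = 0.0
    for i in range(len(parents)):
        counts = defaultdict(lambda: [0, 0])   # parent config -> [#zeros, #ones]
        for t in range(1, len(features)):
            config = tuple(features[t - 1][j] for j in parents[i])
            c, x = counts[config], features[t][i]
            # KT estimator: sequential code length of the next binary value
            cost -= math.log((c[x] + 0.5) / (c[0] + c[1] + 1.0))
            c[x] += 1
        # BIC-like structure penalty (an assumption of this sketch): one
        # parameter per parent configuration of the factor
        cost += 0.5 * (2 ** len(parents[i])) * math.log(max(len(features), 2))
    return cost
```

The key point is that the cost decomposes over factors, so each feature's parent set can be searched locally instead of scoring the exponentially large joint state space.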
Hilbert Space Embeddings of POMDPs
A nonparametric approach to policy learning for POMDPs is proposed. The
approach represents distributions over the states, observations, and actions as
embeddings in feature spaces, which are reproducing kernel Hilbert spaces.
Distributions over states given the observations are obtained by applying the
kernel Bayes' rule to these distribution embeddings. Policies and value
functions are defined on the feature space over states, which leads to a
feature space expression for the Bellman equation. Value iteration may then be
used to estimate the optimal value function and associated policy. Experimental
results confirm that the correct policy is learned using the feature space
representation.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012)
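The Bellman backup in feature space can be illustrated with a small numpy sketch. For simplicity it assumes fully observed support states and one sampled successor per state-action pair, and it omits the kernel Bayes' rule belief update that the paper uses to handle partial observability; `gauss_kernel`, the regularizer `lam`, and the data layout are assumptions of this sketch.

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the row vectors of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_value_iteration(S, S_next, R, gamma=0.95, lam=1e-3, iters=200):
    # S: (n, d) support states; for each action a, S_next[a]: (n, d) sampled
    # successor states and R[a]: (n,) sampled rewards for the pairs (S[i], a)
    n = S.shape[0]
    reg = gauss_kernel(S, S) + lam * np.eye(n)     # regularized Gram matrix
    cross = {a: gauss_kernel(S_next[a], S) for a in S_next}
    acts = sorted(S_next)
    V = np.zeros(n)
    for _ in range(iters):
        alpha = np.linalg.solve(reg, V)            # kernel-ridge fit of V on the support
        # Bellman optimality backup: evaluate the fitted V at the successors
        Q = np.stack([R[a] + gamma * (cross[a] @ alpha) for a in acts])
        V = Q.max(axis=0)
    return V, [acts[i] for i in Q.argmax(axis=0)]  # values and greedy actions
```

The value function never leaves the span of the kernel sections over the support states, which is what makes the feature-space expression of the Bellman equation tractable.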
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
State-of-the-art approaches to partially observable planning, such as POMCP,
are based on stochastic tree search. While these approaches are computationally
efficient, they may still construct search trees of considerable size, which
can limit performance when memory resources are restricted. In this paper,
we propose Partially Observable Stacked Thompson Sampling (POSTS), a
memory-bounded approach to open-loop planning in large POMDPs, which optimizes
a fixed-size stack of Thompson Sampling bandits. We empirically evaluate POSTS
on four large benchmark problems and compare its performance with different
tree-based approaches. We show that POSTS achieves performance competitive
with tree-based open-loop planning and offers a performance-memory tradeoff,
making it suitable for partially observable planning with highly restricted
computational and memory resources.
Comment: Presented at AAAI 2019
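A hedged sketch of the stacked-bandit idea, assuming a generative simulator `sim(state, action) -> (next_state, reward)` and a Gaussian posterior per arm (the paper's exact bandit model may differ); memory stays O(horizon x |actions|) regardless of the simulation budget.

```python
import math
import random

def posts_plan(sim, root_state, actions, horizon=10, budget=1000, gamma=0.95):
    """Open-loop planning with one Thompson Sampling bandit per depth."""
    # per-depth, per-action statistics: visit count and running mean return
    counts = [[0] * len(actions) for _ in range(horizon)]
    means = [[0.0] * len(actions) for _ in range(horizon)]
    for _ in range(budget):
        # Thompson step: sample one action index per depth from a Gaussian
        # posterior over its mean return (variance shrinks with visits)
        plan = [max(range(len(actions)),
                    key=lambda a: random.gauss(means[d][a],
                                               1.0 / math.sqrt(counts[d][a] + 1)))
                for d in range(horizon)]
        # roll out the open-loop action sequence in the generative model
        state, rewards = root_state, []
        for d in range(horizon):
            state, r = sim(state, actions[plan[d]])
            rewards.append(r)
        # update every depth's bandit with the discounted return from there on
        ret = 0.0
        for d in reversed(range(horizon)):
            ret = rewards[d] + gamma * ret
            a = plan[d]
            counts[d][a] += 1
            means[d][a] += (ret - means[d][a]) / counts[d][a]
    # act greedily at the root with respect to the depth-0 mean returns
    return actions[max(range(len(actions)), key=lambda a: means[0][a])]
```

Unlike a search tree, whose size grows with the number of simulations, the stack's footprint is fixed in advance, which is the performance-memory tradeoff the abstract refers to.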