Efficient Reinforcement Learning in Factored MDPs
We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E^3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning and the graphical structure (but not the parameters) of the DBN. Unlike the original E^3 algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.
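To make the final claim of this abstract concrete, the following is a minimal sketch, not taken from the paper, that counts free parameters in a flat tabular transition model versus a DBN-factored one. It assumes binary state variables, a separate conditional probability table per action, and a hypothetical chain structure in which each variable depends on itself and its left neighbour; the function names and the `parents` layout are illustrative only.

```python
def flat_parameter_count(num_vars, num_actions):
    """Free parameters of a tabular transition model over binary variables."""
    num_states = 2 ** num_vars
    # Each (state, action) pair has a distribution over num_states successors,
    # which has num_states - 1 free parameters.
    return num_states * num_actions * (num_states - 1)


def factored_parameter_count(parents, num_actions):
    """Free parameters of a DBN transition model over binary variables.

    parents[i] is a tuple with the indices of the previous-step variables
    that next-step variable i depends on (an assumed, hypothetical structure).
    """
    # A binary variable needs one free parameter per parent assignment,
    # and we assume one table per action.
    return sum(num_actions * 2 ** len(p) for p in parents)


num_vars = 10
num_actions = 2
# Hypothetical chain: variable 0 depends on itself, variable i on (i - 1, i).
parents = [(i,) if i == 0 else (i - 1, i) for i in range(num_vars)]

flat = flat_parameter_count(num_vars, num_actions)
factored = factored_parameter_count(parents, num_actions)
print(f"flat: {flat}, factored: {factored}")
```

Under these assumptions the flat model needs about two million parameters for 1,024 global states, while the factored model needs 76, which illustrates why a running time polynomial in the DBN parameters can be far smaller than one polynomial in the number of states.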
Feature Dynamic Bayesian Networks
Feature Markov Decision Processes (PhiMDPs) are well-suited for learning
agents in general environments. Nevertheless, unstructured (Phi)MDPs are
limited to relatively simple environments. Structured MDPs like Dynamic
Bayesian Networks (DBNs) are used for large-scale real-world problems. In this
article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost
criterion that allows the most relevant features to be extracted automatically
from the environment, leading to the "best" DBN representation. I discuss all
building blocks required for a complete general learning algorithm.
Comment: 7 page