Planning in Partially Observable Domains with Fuzzy Epistemic States and Probabilistic Dynamics
A new translation from Partially Observable MDPs (POMDPs) into Fully Observable MDPs is described here. Unlike the classical translation, the resulting state space is finite, so MDP solvers can solve this simplified version of the initial partially observable problem: the approach encodes agent beliefs as possibility distributions over states, leading to an MDP whose state space is a finite set of epistemic states. After a short description of the POMDP framework and of notions from Possibility Theory, the translation is described formally, with semantic arguments. The actual computations of this transformation are then detailed, so that the size and structure of the final MDP benefit as much as possible from the factored structure of the initial POMDP. Finally, the size reduction and tractability of the resulting MDP are illustrated on a simple POMDP problem.
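The finiteness claim can be illustrated with a minimal sketch. It assumes a purely qualitative setting (integer possibility degrees, min-based conditioning, a single fixed action), which is a textbook simplification and not necessarily the translation used in the paper. The names `epistemic_states` and `update`, and the toy transition and observation tables, are hypothetical.

```python
from itertools import product

def epistemic_states(n_states, scale):
    """Enumerate all normalized possibility distributions over n_states.

    A distribution is normalized when at least one state has the top degree.
    With k degrees there are k**n - (k-1)**n of them, hence a finite set.
    """
    top = max(scale)
    return [b for b in product(scale, repeat=n_states) if max(b) == top]

def update(belief, trans, obs, o, top):
    """Qualitative (min-based) belief update for one fixed action.

    trans[s][s2] is the possibility degree of moving from s to s2,
    obs[s2][o] is the possibility degree of observing o in s2
    (both tables are illustrative assumptions).
    """
    n = len(belief)
    joint = tuple(
        max(min(belief[s], trans[s][s2], obs[s2][o]) for s in range(n))
        for s2 in range(n)
    )
    peak = max(joint)
    if peak == 0:
        raise ValueError("observation is impossible under this belief")
    # Min-based conditioning: the most plausible successors get the top degree.
    return tuple(top if j == peak else j for j in joint)

scale = (0, 1, 2)                      # three ordered possibility degrees
states = epistemic_states(3, scale)    # finite set of candidate MDP states
trans = [[2, 0], [0, 2]]               # toy two-state dynamics (identity)
obs = [[2, 1], [1, 2]]                 # toy observation table
new_belief = update((2, 0), trans, obs, o=1, top=2)
```

Because the updated belief is again a normalized distribution over the same finite scale, it is always a member of the enumerated set, which is what lets the epistemic states serve as the state space of a finite MDP.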
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past.
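For orientation, a generic evidence lower bound for a sequential latent-variable model can be written as below. This is the standard single-sample bound, not the n-step approximation proposed in the paper; the symbols (latent state $z_t$, observation $o_t$, action $a_t$, model parameters $\theta$, inference parameters $\phi$) and the factorization of $q_\phi$ are assumptions made for illustration.

```latex
\log p_\theta(o_{1:T} \mid a_{0:T-1})
\;\ge\;
\mathbb{E}_{q_\phi(z_{1:T} \mid o_{1:T},\, a_{0:T-1})}
\left[
  \sum_{t=1}^{T}
    \log p_\theta(o_t \mid z_t)
  + \log p_\theta(z_t \mid z_{t-1}, a_{t-1})
  - \log q_\phi(z_t \mid z_{t-1}, a_{t-1}, o_t)
\right],
\qquad
q_\phi(z_{1:T} \mid o_{1:T}, a_{0:T-1})
= \prod_{t=1}^{T} q_\phi(z_t \mid z_{t-1}, a_{t-1}, o_t).
```

Here $z_0$ denotes a fixed initial latent state. Maximizing the right-hand side over $\theta$ and $\phi$ trains the generative model and the inference model together.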
Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning
Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches 50% of the average discounted return in the most favorable cases that we analyzed.
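One simple way hard constraints among state variables could restrict a particle-based belief is rejection sampling, sketched below. This is an illustrative assumption, not the authors' extension of POMCP: the function `sample_particles`, the binary state variables, and the example constraint are all hypothetical.

```python
import random

def sample_particles(n_vars, constraints, n_particles, rng):
    """Draw belief particles over binary state variables by rejection sampling.

    Each constraint is a predicate on a full state; states violating any
    constraint are discarded. Assumes the constraints are satisfiable,
    otherwise the loop would not terminate.
    """
    particles = []
    while len(particles) < n_particles:
        state = tuple(rng.randint(0, 1) for _ in range(n_vars))
        if all(check(state) for check in constraints):
            particles.append(state)
    return particles

def same_value_01(state):
    """Hypothetical hard constraint: variables 0 and 1 share the same value."""
    return state[0] == state[1]

rng = random.Random(0)  # fixed seed for reproducibility
particles = sample_particles(3, [same_value_01], 100, rng)
```

Every particle kept this way is consistent with the prior knowledge, so simulations started from the belief never spend effort on states the constraints rule out.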