On overfitting and asymptotic bias in batch reinforcement learning with partial observability
This paper provides an analysis of the tradeoff between asymptotic bias
(suboptimality with unlimited data) and overfitting (additional suboptimality
due to limited data) in the context of reinforcement learning with partial
observability. Our theoretical analysis formally shows that a smaller state
representation decreases the risk of overfitting, while potentially increasing
the asymptotic bias. This analysis relies on expressing the
quality of a state representation by bounding L1 error terms of the associated
belief states. Theoretical results are empirically illustrated when the state
representation is a truncated history of observations, both on synthetic POMDPs
and on a large-scale POMDP in the context of smartgrids, with real-world data.
Finally, similarly to known results in the fully observable setting, we also
briefly discuss and empirically illustrate how using function approximators and
adapting the discount factor may enhance the tradeoff between asymptotic bias
and overfitting in the partially observable context.
Comment: Accepted at the Journal of Artificial Intelligence Research (JAIR) - 31 pages
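The abstract's core tradeoff can be illustrated with a back-of-the-envelope count. A minimal sketch, assuming (as the abstract describes) that the state representation is a truncated history: with `num_obs` discrete observations and `num_actions` actions, the number of distinct truncated histories, and hence the size of a tabular policy class over them, grows exponentially with the truncation length `h`. A smaller `h` therefore shrinks the hypothesis class (less overfitting risk) at the cost of discarding information (more asymptotic bias). The function name and the toy sizes are illustrative assumptions, not from the paper.

```python
# Sketch: size of the hypothesis class induced by truncated-history
# state representations. Illustrative assumption: discrete observations
# and actions, representation = last h (action, observation) pairs.

def num_truncated_histories(num_obs: int, num_actions: int, h: int) -> int:
    """Count distinct histories of the last h (action, observation) pairs."""
    return (num_obs * num_actions) ** h

# Exponential growth in h: shorter truncations mean far fewer parameters
# to fit from limited data, which is the overfitting side of the tradeoff.
for h in range(1, 5):
    print(h, num_truncated_histories(num_obs=4, num_actions=2, h=h))
```

With 4 observations and 2 actions, the class grows from 8 representations at `h=1` to 4096 at `h=4`, which is the intuition the paper's L1 belief-error bounds make precise.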
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
Many real-world reinforcement learning problems have a hierarchical nature,
and often exhibit some degree of partial observability. While hierarchy and
partial observability are usually tackled separately (for instance by combining
recurrent neural networks and options), we show that addressing both problems
simultaneously is simpler and more efficient in many cases. More specifically,
we make the initiation set of options conditional on the previously-executed
option, and show that options with such Option-Observation Initiation Sets
(OOIs) are at least as expressive as Finite State Controllers (FSCs), a
state-of-the-art approach for learning in POMDPs. OOIs are easy to design based
on an intuitive description of the task, lead to explainable policies and keep
the top-level and option policies memoryless. Our experiments show that OOIs
allow agents to learn optimal policies in challenging POMDPs, while being much
more sample-efficient than a recurrent neural network over options.
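The key mechanism above, making an option's initiation set conditional on the previously-executed option, can be sketched in a few lines. This is a toy illustration under assumed names and a made-up two-option task, not the paper's implementation: each option lists the options after which it may start, and the top-level policy stays memoryless because the only extra input it needs is the identity of the previous option.

```python
# Sketch of Option-Observation Initiation Sets (OOIs): each option's
# initiation set is conditioned on the previously-executed option, so
# top-level and option policies can remain memoryless. The option names
# and the tiny task are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Option:
    name: str
    # Initiation set over previous options: names of options after which
    # this option may start; None means "may start as the first option".
    after: set = field(default_factory=set)

    def can_initiate(self, prev_option_name):
        return prev_option_name in self.after

options = [
    Option("go-to-door", after={None, "open-door"}),
    Option("open-door", after={"go-to-door"}),
]

def available(prev_option_name):
    """Options the memoryless top-level policy may select next."""
    return [o.name for o in options if o.can_initiate(prev_option_name)]

print(available(None))          # only "go-to-door" may start an episode
print(available("go-to-door"))  # afterwards, "open-door" becomes available
```

The design point is that the previous option's identity acts as a one-step memory encoded in the initiation sets themselves, which is what lets OOIs emulate finite state controllers without recurrent policies.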
Learning Causal State Representations of Partially Observable Environments
Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially-observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents using multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments demonstrate systematic improvement of the DQN and tabular models using approximate causal state representations with respect to recurrent-DQN baselines trained with raw inputs.
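The idea of causal states can be sketched concretely: histories that induce (nearly) the same predictive distribution over the next observation are merged into one abstract state. In the sketch below, a hand-made table stands in for the paper's trained RNN predictor; the function names, the two-symbol observation alphabet, and the predictor itself are all illustrative assumptions.

```python
# Sketch of approximating causal states: partition histories so that
# histories with (nearly) identical next-observation distributions share
# one abstract state. A toy lookup stands in for a trained RNN predictor.
import itertools

def predictive_dist(history):
    # Toy predictor: only the final observation matters here, so all
    # histories sharing a last symbol collapse into one causal state.
    last = history[-1]
    return {"a": 0.9, "b": 0.1} if last == "a" else {"a": 0.2, "b": 0.8}

def close(p, q, tol=1e-6):
    return all(abs(p[k] - q[k]) < tol for k in p)

def causal_state_partition(histories):
    states = []  # list of (representative distribution, member histories)
    for h in histories:
        d = predictive_dist(h)
        for rep, members in states:
            if close(rep, d):
                members.append(h)
                break
        else:
            states.append((d, [h]))
    return [members for _, members in states]

# All length-3 histories over observations {a, b}: 8 histories in total.
histories = ["".join(t) for t in itertools.product("ab", repeat=3)]
partition = causal_state_partition(histories)
print(len(partition))  # 2 abstract states for this toy predictor
```

Eight raw histories compress to two abstract states, which is the kind of task-agnostic compression the abstract describes, here driven by a toy predictor instead of an RNN trained on observation prediction.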