4 research outputs found

    Deep Variational Reinforcement Learning for POMDPs

    Full text link
    Many real-world sequential decision-making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is a great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.
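
    A minimal sketch of the core idea in the abstract, assuming standard PyTorch modules: an ELBO accumulated over an n-step window and optimised jointly with a policy loss, so both objectives shape the latent state. The module names, dimensions, and environment stand-ins below are illustrative assumptions, not the DVRL reference implementation (which maintains a particle-filter belief over latent states).

    # Illustrative n-step ELBO trained jointly with a policy loss (assumed interfaces).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    obs_dim, act_dim, latent_dim = 8, 4, 16
    encoder = nn.Linear(obs_dim + latent_dim, 2 * latent_dim)  # q(z_t | o_t, z_{t-1})
    transition = nn.Linear(latent_dim, 2 * latent_dim)         # prior p(z_t | z_{t-1})
    decoder = nn.Linear(latent_dim, obs_dim)                   # p(o_t | z_t)
    policy = nn.Linear(latent_dim, act_dim)                    # pi(a_t | z_t)

    def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
        # KL(N(mu_q, var_q) || N(mu_p, var_p)), summed over latent dimensions.
        return 0.5 * (logvar_p - logvar_q
                      + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                      - 1.0).sum(-1)

    def n_step_elbo(observations, z_prev):
        # observations: (n, obs_dim) window; returns the ELBO summed over the window
        # and the final latent state, which the policy conditions on.
        elbo, z = 0.0, z_prev
        for o in observations:
            mu_q, logvar_q = encoder(torch.cat([o, z])).chunk(2)
            mu_p, logvar_p = transition(z).chunk(2)
            z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()  # reparameterised sample
            log_lik = -F.mse_loss(decoder(z), o, reduction="sum")       # Gaussian likelihood up to a constant
            elbo = elbo + log_lik - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
        return elbo, z

    obs_window = torch.randn(5, obs_dim)   # stand-in for n observations from the environment
    ret = torch.tensor(1.0)                # stand-in for an empirical return
    elbo, z = n_step_elbo(obs_window, torch.zeros(latent_dim))
    log_prob = F.log_softmax(policy(z), dim=-1)[0]   # log pi(a=0 | z), placeholder action
    loss = -elbo - ret * log_prob                    # joint model-and-policy objective
    loss.backward()

    Because the policy is fed the latent state produced by the inference network, the gradient of the policy term flows back into the model parameters, which is what ties the learned representation to the control task.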

    Fast exploration and learning of latent graphs with aliased observations

    Full text link
    We consider the problem of recovering a latent graph where the observations at each node are aliased and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent cannot know in which node it is located. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations, while remaining competitive with existing baselines in the unaliased regime.
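
    A toy illustration of why aliasing makes the problem hard, assuming a small hand-built graph (this is the naive observation-level baseline, not the paper's algorithm): two latent nodes emit the same observation, so an explorer that only counts observation-to-observation transitions cannot recover the hidden topology.

    # Toy latent graph with aliased observations and a naive random-walk explorer.
    import random
    from collections import defaultdict

    # Latent graph: node -> list of (next_node, probability); nodes 1 and 3 are aliased.
    transitions = {
        0: [(1, 0.5), (2, 0.5)],
        1: [(3, 1.0)],
        2: [(0, 1.0)],
        3: [(0, 1.0)],
    }
    observation_of = {0: "A", 1: "B", 2: "C", 3: "B"}   # two distinct nodes both emit "B"

    def step(node):
        nxt, probs = zip(*transitions[node])
        return random.choices(nxt, weights=probs, k=1)[0]

    # Naive exploration: random walk, counting transitions between observations only.
    counts = defaultdict(lambda: defaultdict(int))
    node = 0
    for _ in range(10_000):
        nxt = step(node)
        counts[observation_of[node]][observation_of[nxt]] += 1
        node = nxt

    for obs, row in sorted(counts.items()):
        total = sum(row.values())
        print(obs, {o: round(c / total, 2) for o, c in sorted(row.items())})

    The statistics for the two "B" nodes are merged, so the latent structure behind observation "B" is unrecoverable this way; the paper's contribution is an exploration strategy that disambiguates aliased nodes while keeping the number of steps small.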

    Dynamic-depth context tree weighting

    No full text
    Reinforcement learning (RL) in partially observable settings is challenging because the agent’s immediate observations are not Markov. Recently proposed methods can learn variable-order Markov models of the underlying process but have steep memory requirements and are sensitive to aliasing between observation histories due to sensor noise. This paper proposes utile context tree weighting (UCTW), a model-learning method that addresses these limitations. UCTW dynamically expands a suffix tree while ensuring that the total size of the model, but not its depth, remains bounded. We show that UCTW approximately matches the performance of state-of-the-art alternatives at stochastic time-series prediction while using at least an order of magnitude less memory. We also apply UCTW to model-based RL, showing that, on tasks that require memory of past observations, UCTW can learn without prior knowledge of a good state representation, or even the length of history upon which such a representation should depend.
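
    A simplified sketch of the underlying data-structure idea, under assumptions of our own: a context (suffix) tree that grows deeper contexts on demand but enforces a budget on the total number of nodes rather than on depth. It predicts from the deepest matching context with a KT-style estimator and omits CTW's Bayesian mixture over depths, so it is not the UCTW algorithm itself; MAX_NODES, update, and predict are illustrative names.

    # Context tree with a bound on total size, not depth (simplified sketch).
    from collections import defaultdict

    MAX_NODES = 64   # assumed budget on total model size

    class ContextNode:
        def __init__(self):
            self.counts = defaultdict(int)   # next-symbol counts at this context
            self.children = {}               # symbol -> longer-context child node

    root = ContextNode()
    num_nodes = 1

    def update(history, next_symbol, expand=True):
        # Walk the tree along the reversed history, updating counts and growing
        # deeper contexts only while the node budget allows.
        global num_nodes
        node = root
        node.counts[next_symbol] += 1
        for sym in reversed(history):
            if sym not in node.children:
                if not expand or num_nodes >= MAX_NODES:
                    break                     # stop growing: size, not depth, is bounded
                node.children[sym] = ContextNode()
                num_nodes += 1
            node = node.children[sym]
            node.counts[next_symbol] += 1

    def predict(history, symbol):
        # Probability of `symbol` from the deepest matching context (KT estimator, binary alphabet).
        node = root
        for sym in reversed(history):
            if sym not in node.children:
                break
            node = node.children[sym]
        total = sum(node.counts.values())
        return (node.counts[symbol] + 0.5) / (total + 1.0)

    # Example: learn a simple alternating sequence 0,1,0,1,...
    seq = [0, 1] * 200
    for t in range(2, len(seq)):
        update(seq[max(0, t - 8):t], seq[t])
    print(round(predict([0, 1, 0, 1, 0, 1], 0), 3))   # close to 1.0 after training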
