
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling

Authors: Belzner, Lenz; Friedrich, Markus; Kiermeier, Marie; Linnhoff-Popien, Claudia; Phan, Thomy; Schmid, Kyrill
Publication date: 10/05/2019

State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which can limit performance when memory resources are restricted. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory-bounded approach to open-loop planning in large POMDPs, which optimizes a fixed-size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves performance competitive with tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.

Comment: Presented at AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
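The abstract only outlines POSTS, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation: one Thompson Sampling bandit per planning depth, a start state sampled from a particle belief for each rollout, and each bandit updated with the discounted return from its depth onward. The Gaussian posterior with a fixed `noise_std`, the names `GaussianTSBandit` and `plan_open_loop`, and the `simulate(state, action, rng)` generative-model signature are hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical generative-model signature: (state, action, rng) -> (next_state, reward)
Simulator = Callable[[object, int, random.Random], Tuple[object, float]]


class GaussianTSBandit:
    """Thompson Sampling over discrete actions (illustrative assumption:
    each arm's mean return has posterior N(sample_mean, noise_std**2 / (n + 1)))."""

    def __init__(self, n_actions: int, noise_std: float = 1.0) -> None:
        self.counts: List[int] = [0] * n_actions
        self.means: List[float] = [0.0] * n_actions
        self.noise_std = noise_std

    def select(self, rng: random.Random) -> int:
        # Draw one sample per arm from its posterior and play the argmax.
        samples = [
            rng.gauss(mean, self.noise_std / math.sqrt(n + 1))
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action: int, ret: float) -> None:
        # Incremental update of the empirical mean return for this arm.
        self.counts[action] += 1
        self.means[action] += (ret - self.means[action]) / self.counts[action]


def plan_open_loop(
    belief: Sequence[object],
    simulate: Simulator,
    n_actions: int,
    horizon: int,
    n_rollouts: int,
    gamma: float = 0.95,
    seed: int = 0,
) -> int:
    """Return the first action of an open-loop plan from a fixed stack of bandits."""
    rng = random.Random(seed)
    # One bandit per depth: memory is horizon * n_actions, independent of n_rollouts.
    stack = [GaussianTSBandit(n_actions) for _ in range(horizon)]
    for _ in range(n_rollouts):
        state = rng.choice(belief)  # sample a start state from the particle belief
        actions: List[int] = []
        rewards: List[float] = []
        for bandit in stack:
            action = bandit.select(rng)
            state, reward = simulate(state, action, rng)
            actions.append(action)
            rewards.append(reward)
        # Update each depth with the discounted return from that depth onward.
        ret = 0.0
        for t in reversed(range(horizon)):
            ret = rewards[t] + gamma * ret
            stack[t].update(actions[t], ret)
    root = stack[0]
    return max(range(n_actions), key=root.means.__getitem__)


if __name__ == "__main__":
    # Toy problem (hypothetical): action 1 always pays 1, action 0 pays 0.
    def toy_sim(state, action, rng):
        return state, float(action == 1)

    print(plan_open_loop([0], toy_sim, n_actions=2, horizon=3, n_rollouts=500))  # prints 1
```

In this sketch the stored statistics never exceed `horizon * n_actions` counts and means, however many rollouts are run, which mirrors the fixed-size property the abstract contrasts with growing search trees.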

Near-Optimal BRL using Optimistic Local Transitions

Authors: Araya, Mauricio; Buffet, Olivier; Thomas, Vincent
Publication date: 18/06/2012

Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., avoiding the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity and show that, under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Experimental results then highlight the key differences between this method and previous work.

Comment: ICML201

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1
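The abstract does not spell out how BOLT is optimistic about the transition function. One common way to express this kind of optimism, assumed here rather than taken from the paper, is to keep Dirichlet pseudo-counts `alpha[s][a][s2]` for the unknown transitions and, during planning, add `eta` imagined transitions from `(s, a)` toward whichever successor currently looks best. The Python sketch below runs value iteration on that optimistic model; the function name, the known-reward assumption, and the parameters `eta`, `gamma`, and `n_iters` are hypothetical, and the paper's planning horizon and exact parameters may differ.

```python
from typing import List, Sequence


def optimistic_value_iteration(
    alpha: Sequence[Sequence[Sequence[float]]],
    rewards: Sequence[Sequence[float]],
    eta: float,
    gamma: float = 0.95,
    n_iters: int = 500,
) -> List[float]:
    """Discounted value iteration on an optimistic posterior-mean transition model.

    alpha[s][a][s2]: Dirichlet pseudo-counts for the transition (s, a) -> s2.
    rewards[s][a]:   immediate reward, assumed known for this sketch.
    eta:             number of imagined transitions added toward the best successor.
    """
    n_states = len(alpha)
    v = [0.0] * n_states
    for _ in range(n_iters):
        # The most favorable successor is the one with the highest current value.
        best_next = max(v)
        new_v = []
        for s in range(n_states):
            q_values = []
            for a, counts in enumerate(alpha[s]):
                total = sum(counts) + eta
                # Posterior-mean expectation after adding eta imagined
                # transitions from (s, a) to the best successor.
                expected = (
                    sum(c * v[s2] for s2, c in enumerate(counts)) + eta * best_next
                ) / total
                q_values.append(rewards[s][a] + gamma * expected)
            new_v.append(max(q_values))
        v = new_v
    return v


if __name__ == "__main__":
    # Toy model (hypothetical): two states, one action, uniform pseudo-counts.
    alpha = [[[1.0, 1.0]], [[1.0, 1.0]]]
    rewards = [[0.0], [1.0]]
    print(optimistic_value_iteration(alpha, rewards, eta=0.0))  # about [9.5, 10.5]
    print(optimistic_value_iteration(alpha, rewards, eta=2.0))  # about [14.25, 15.25]
```

Setting `eta=0` recovers planning on the plain posterior-mean model, so in this toy case the gap between the two value vectors is the optimism bonus introduced by the imagined transitions.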

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Authors: Ortner, Ronald; Ryabko, Daniil
Publication date: 01/01/2012

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with upper confidence bounds to implement optimism in the face of uncertainty. Besides the existence of an optimal policy that satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server
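The two ingredients named in the abstract, state aggregation and upper confidence bounds, can be illustrated with a short Python sketch. It assumes states in [0, 1] split into `n_bins` equal intervals and adds a UCB-style bonus to per-interval reward estimates; the class name `AggregatedUCB`, the bonus constant `c`, and the greedy one-step action choice are hypothetical simplifications. It omits the planning step for undiscounted reward and any confidence sets over transition probabilities that a full algorithm of this kind would need.

```python
import math
from typing import List


class AggregatedUCB:
    """Hypothetical sketch: interval aggregation of [0, 1] plus UCB reward bonuses."""

    def __init__(self, n_bins: int, n_actions: int, c: float = 1.0) -> None:
        self.n_bins = n_bins
        self.n_actions = n_actions
        self.c = c
        self.t = 0  # total number of observations so far
        self.counts: List[List[int]] = [[0] * n_actions for _ in range(n_bins)]
        self.sums: List[List[float]] = [[0.0] * n_actions for _ in range(n_bins)]

    def bin_of(self, x: float) -> int:
        # Map a continuous state in [0, 1] to one of n_bins equal intervals;
        # x == 1.0 is folded into the last interval.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def update(self, x: float, action: int, reward: float) -> None:
        b = self.bin_of(x)
        self.counts[b][action] += 1
        self.sums[b][action] += reward
        self.t += 1

    def upper_bound(self, x: float, action: int) -> float:
        b = self.bin_of(x)
        n = self.counts[b][action]
        if n == 0:
            return math.inf  # untried (interval, action) pairs are maximally optimistic
        mean = self.sums[b][action] / n
        return mean + self.c * math.sqrt(math.log(max(self.t, 2)) / n)

    def act(self, x: float) -> int:
        # Optimism in the face of uncertainty: take the largest upper bound.
        return max(range(self.n_actions), key=lambda a: self.upper_bound(x, a))


if __name__ == "__main__":
    agent = AggregatedUCB(n_bins=4, n_actions=2)
    agent.update(0.1, 0, 1.0)
    print(agent.act(0.1))  # prints 1: action 1 is still untried in this interval
    agent.update(0.12, 1, 0.0)
    print(agent.act(0.1))  # prints 0: equal counts, higher empirical mean
```

Because 0.1 and 0.12 fall into the same interval, their observations are pooled; this pooling is what lets a finite set of counters stand in for a continuous state space, at the price of an approximation error that continuity assumptions such as the Hölder condition help control.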