    ρ-POMDPs have Lipschitz-Continuous ϵ-Optimal Value Functions

    Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a "fully observable" problem---a belief MDP---and exploiting the piecewise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). This approach has been extended to solving ρ-POMDPs---i.e., POMDPs with information-oriented criteria---when the reward ρ is convex in ∆. General ρ-POMDPs can also be turned into "fully observable" problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward functions, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows deriving two algorithms from HSVI, which are empirically evaluated on various benchmark problems.
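    To illustrate the kind of Lipschitz value-function bounds the abstract refers to, here is a minimal sketch (not the paper's implementation): given sampled belief points with value estimates and an assumed Lipschitz constant `lam`, an upper bound is the minimum over downward cones and a lower bound the maximum over upward cones in L1 distance. The function names, the point set `B`, the values `v`, and the constant `lam` are all illustrative assumptions.

```python
import numpy as np

def lipschitz_upper_bound(b, B, v, lam):
    """Upper bound at belief b: min over cones v_i + lam * ||b - b_i||_1."""
    dists = np.abs(B - b).sum(axis=1)   # L1 distances to sampled beliefs
    return np.min(v + lam * dists)

def lipschitz_lower_bound(b, B, v, lam):
    """Lower bound at belief b: max over cones v_i - lam * ||b - b_i||_1."""
    dists = np.abs(B - b).sum(axis=1)
    return np.max(v - lam * dists)

if __name__ == "__main__":
    # Toy 3-state belief simplex with a few sampled beliefs and value estimates.
    B = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1/3, 1/3, 1/3]])
    v = np.array([0.0, 2.0, 1.5])       # value estimates at those beliefs
    lam = 3.0                           # assumed Lipschitz constant of the value function
    b = np.array([0.5, 0.25, 0.25])
    print(lipschitz_upper_bound(b, B, v, lam))
    print(lipschitz_lower_bound(b, B, v, lam))
```

    Adding new (belief, value) pairs can only tighten such bounds at every belief, which is the sense in which cone-based approximators of this kind are uniformly improvable.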