We consider the problem of online planning in a Markov Decision Process when
given only access to a generative model, restricted to open-loop policies
(i.e., sequences of actions), and under a budget constraint. In this setting, the
Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical
guarantees but is overly conservative in practice, as we show in numerical
experiments. We propose a modified version of the algorithm with tighter
upper-confidence bounds, KL-OLOP, that leads to better practical performance
while retaining the sample complexity bound. Finally, we propose an efficient
implementation that significantly improves the time complexity of both
algorithms.
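
To illustrate the flavor of the tighter bounds, here is a minimal sketch of a Kullback-Leibler upper-confidence bound for empirical means of Bernoulli rewards, computed by bisection in the spirit of KL-UCB indices. The function names and the choice of exploration threshold are illustrative assumptions, not the exact quantities used by KL-OLOP.

```python
import math

def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence kl(p || q) between Bernoulli(p) and Bernoulli(q)."""
    # Clamp away from {0, 1} to avoid log(0).
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(mean: float, count: int, threshold: float,
                   tol: float = 1e-6) -> float:
    """Largest q >= mean with count * kl(mean, q) <= threshold.

    Since kl(mean, .) is increasing on [mean, 1], the feasible set is an
    interval and bisection finds its right endpoint.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if count * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # constraint satisfied: the endpoint lies to the right
        else:
            hi = mid  # constraint violated: the endpoint lies to the left
    return lo

# Hypothetical usage: upper bound for 7 successes in 10 samples, with an
# illustrative exploration threshold of log(t) at time t = 100.
if __name__ == "__main__":
    print(kl_upper_bound(mean=0.7, count=10, threshold=math.log(100)))
```

By Pinsker's inequality, kl(p, q) >= 2 (p - q)^2, so an interval of this form is never wider than the corresponding Hoeffding interval at the same threshold, which is one way such KL-based indices can be tighter in practice.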