
Active exploration by searching for experiments that falsify the computed control policy

Authors: Ernst, Damien; Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2011

Peer reviewed.

We propose a strategy for experiment selection in the context of reinforcement learning, based on the idea that the most interesting experiments to carry out at a given stage are those most likely to falsify the current hypothesis about the optimal control policy. We cast this idea in a setting where a policy learning algorithm and a model identification method are given a priori. An experiment is selected if, according to the learnt environment model, it is predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with a discrete action space. The results show that the proposed approach is promising.
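The selection rule in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The specific components are assumptions chosen only for illustration. Model identification is taken to be a hypothetical 1-nearest-neighbour regressor (per action) that predicts the state displacement. Policy learning is taken to be exhaustive search over open-loop action sequences from a fixed initial state, which is valid for a deterministic system. The reward is assumed known. The transitions are made-up toy numbers. The names `identify_model`, `learn_policy` and `select_experiments` are hypothetical. For each candidate experiment (x, u), the sketch predicts its outcome with the current model, adds that predicted transition to the data, relearns the model and policy, and keeps the candidate only if the policy changes.

```python
from itertools import product

ACTIONS = (-1.0, 1.0)  # discrete action space (assumed)
HORIZON = 2            # optimisation horizon (assumed)
X0 = 0.0               # fixed initial state (assumed)


def reward(y):
    """Reward for landing in state y; assumed known here (maximise position)."""
    return y


def identify_model(samples):
    """Hypothetical model identification: for action u, find the closest
    sample (x_s, u, y_s) and reuse its displacement y_s - x_s."""
    def model(x, u):
        same_action = [s for s in samples if s[1] == u]
        xs, _, ys = min(same_action, key=lambda s: abs(s[0] - x))
        return x + (ys - xs)
    return model


def learn_policy(model):
    """Hypothetical policy learning: exhaustive search over open-loop action
    sequences simulated with the model (deterministic system, fixed X0)."""
    def value(seq):
        x, total = X0, 0.0
        for u in seq:
            x = model(x, u)
            total += reward(x)
        return total
    return max(product(ACTIONS, repeat=HORIZON), key=value)


def select_experiments(samples, candidates):
    """Keep the candidates (x, u) whose outcome, as predicted by the learnt
    model, would revise the learnt policy once added to the data."""
    model = identify_model(samples)
    current = learn_policy(model)
    selected = []
    for x, u in candidates:
        predicted = (x, u, model(x, u))  # predicted transition (x, u, y)
        revised = learn_policy(identify_model(samples + [predicted]))
        if revised != current:
            selected.append((x, u))
    return selected


if __name__ == "__main__":
    # Toy transitions (x, u, y); illustrative numbers only.
    samples = [(0.0, 1.0, 1.0), (3.0, 1.0, 2.0), (0.0, -1.0, -0.5)]
    candidates = [(0.5, 1.0), (1.9, 1.0), (1.9, -1.0), (2.5, 1.0)]
    print(learn_policy(identify_model(samples)))    # (1.0, 1.0)
    print(select_experiments(samples, candidates))  # [(1.9, 1.0)]
```

In this toy run only (1.9, 1.0) is selected. The model predicts that this experiment would reproduce the "pushed back" displacement seen at x = 3. Adding that prediction changes the model's output at x = 1, so the second action of the policy switches from +1 to -1. The other candidates leave the policy unchanged, so they are not considered informative.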


Open Repository and Bibliography - Liège