Interactive Q-Learning with Ordinal Rewards and Unreliable Tutor

Abstract

Conventional reinforcement learning (RL) requires the specification of a numeric reward function, which is often a difficult task. In this paper, we extend the Q-learning approach to the handling of ordinal rewards. The method we propose is interactive in the sense that the agent may query a tutor to compare sequences of ordinal rewards. More specifically, it can be seen as an extension of a recently proposed interactive value iteration (IVI) algorithm for Markov Decision Processes to the setting of reinforcement learning; in contrast to the original IVI algorithm, our method is tolerant toward unreliable and inconsistent tutor feedback.
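As a rough illustration of the idea, the sketch below shows a Q-learning variant in which Q-values are maintained as distributions over an ordinal reward scale rather than scalars, and greedy action selection queries a (possibly unreliable) simulated tutor, using a simple majority vote over repeated queries to absorb noisy answers. All names, parameters, and design choices here (the reward scale K, the voting scheme, the distribution-mixing update) are illustrative assumptions, not the paper's actual algorithm.

    import random
    from collections import defaultdict

    K = 3        # number of ordinal reward levels, 0 = worst (assumed)
    ALPHA = 0.1  # learning rate (assumed)
    GAMMA = 0.95 # discount factor (assumed)

    def zero_dist():
        return [0.0] * K

    # Q[(state, action)] is a distribution over ordinal reward levels,
    # not a scalar value.
    Q = defaultdict(zero_dist)

    def tutor_prefers(dist_a, dist_b, noise=0.1):
        # Simulated unreliable tutor: compares the two distributions by
        # expected ordinal level, flipping its answer with probability `noise`.
        def score(d):
            total = sum(d) or 1.0
            return sum(level * mass / total for level, mass in enumerate(d))
        truth = score(dist_a) >= score(dist_b)
        return truth if random.random() > noise else not truth

    def robust_query(dist_a, dist_b, n_queries=5):
        # Tolerate inconsistent feedback with a majority vote over queries.
        votes = sum(tutor_prefers(dist_a, dist_b) for _ in range(n_queries))
        return votes > n_queries // 2

    def best_action(state, actions):
        # Greedy action selection via pairwise tutor comparisons.
        best = actions[0]
        for a in actions[1:]:
            if robust_query(Q[(state, a)], Q[(state, best)]):
                best = a
        return best

    def update(state, action, ordinal_reward, next_state, actions):
        # Q-learning-style update on distributions: mix a one-hot vector
        # for the observed ordinal reward with the distribution of the
        # greedy successor action, then move Q toward that target.
        target = zero_dist()
        target[ordinal_reward] += 1.0 - GAMMA
        a_star = best_action(next_state, actions)
        for level, mass in enumerate(Q[(next_state, a_star)]):
            target[level] += GAMMA * mass
        q = Q[(state, action)]
        Q[(state, action)] = [(1 - ALPHA) * q[i] + ALPHA * target[i]
                              for i in range(K)]

The majority vote is only one plausible way to cope with an unreliable tutor; repeated querying with a statistical stopping rule would serve the same purpose, and the paper's actual mechanism may differ.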
