Interactive Q-Learning with Ordinal Rewards and Unreliable Tutor

Abstract

Conventional reinforcement learning (RL) requires the specification of a numeric reward function, which is often a difficult task. In this paper, we extend the Q-learning approach to the handling of ordinal rewards. The method we propose is interactive in the sense that the agent may query a tutor to compare sequences of ordinal rewards. More specifically, it can be seen as an extension of a recently proposed interactive value iteration (IVI) algorithm for Markov Decision Processes to the setting of reinforcement learning; in contrast to the original IVI algorithm, our method is tolerant toward unreliable and inconsistent tutor feedback.
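As a rough illustration of the idea, the sketch below shows a Q-learning variant in which Q-values are maintained as distributions over an ordinal reward scale rather than scalars, and greedy action selection queries a (possibly unreliable) simulated tutor, using a simple majority vote over repeated queries to absorb noisy answers. All names, parameters, and design choices here (the reward scale K, the voting scheme, the distribution-mixing update) are illustrative assumptions, not the paper's actual algorithm.

    import random
    from collections import defaultdict

    K = 3        # number of ordinal reward levels, 0 = worst (assumed)
    ALPHA = 0.1  # learning rate (assumed)
    GAMMA = 0.95 # discount factor (assumed)

    def zero_dist():
        return [0.0] * K

    # Q[(state, action)] is a distribution over ordinal reward levels,
    # not a scalar value.
    Q = defaultdict(zero_dist)

    def tutor_prefers(dist_a, dist_b, noise=0.1):
        # Simulated unreliable tutor: compares the two distributions by
        # expected ordinal level, flipping its answer with probability `noise`.
        def score(d):
            total = sum(d) or 1.0
            return sum(level * mass / total for level, mass in enumerate(d))
        truth = score(dist_a) >= score(dist_b)
        return truth if random.random() > noise else not truth

    def robust_query(dist_a, dist_b, n_queries=5):
        # Tolerate inconsistent feedback with a majority vote over queries.
        votes = sum(tutor_prefers(dist_a, dist_b) for _ in range(n_queries))
        return votes > n_queries // 2

    def best_action(state, actions):
        # Greedy action selection via pairwise tutor comparisons.
        best = actions[0]
        for a in actions[1:]:
            if robust_query(Q[(state, a)], Q[(state, best)]):
                best = a
        return best

    def update(state, action, ordinal_reward, next_state, actions):
        # Q-learning-style update on distributions: mix a one-hot vector
        # for the observed ordinal reward with the distribution of the
        # greedy successor action, then move Q toward that target.
        target = zero_dist()
        target[ordinal_reward] += 1.0 - GAMMA
        a_star = best_action(next_state, actions)
        for level, mass in enumerate(Q[(next_state, a_star)]):
            target[level] += GAMMA * mass
        q = Q[(state, action)]
        Q[(state, action)] = [(1 - ALPHA) * q[i] + ALPHA * target[i]
                              for i in range(K)]

The majority vote is only one plausible way to cope with an unreliable tutor; repeated querying with a statistical stopping rule would serve the same purpose, and the paper's actual mechanism may differ.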
