Reinforcement Learning through Supervision for Autonomous Agents

Abstract

Abstract Reinforcement Learning (RL) is a class of model-free learning control methods that can solve Markov Decision Process (MDP) problems. However, one difficulty for the application of RL control is its slow convergence, especially in MDPs with continuous state space. In this paper, a modified structure of RL is proposed to accelerate reinforcement learning control. This approach combines supervision technique with the standard Qlearning algorithm of reinforcement learning. The a priori information is provided to the RL learning agent by a direct integration of a human operator commands (a.k.a. human advices) or by an optimal LQ-controller, indicating preferred actions in some particular situations. It is shown that the convergence speed of the supervised RL agent is greatly improved compared to the conventional Q-Learning algorithm. Simulation work and results on the cart-pole balancing problem and learning navigation tasks in unknown grid world with obstacles are given to illustrate the efficiency of the proposed method

    Similar works