4 research outputs found

    Reinforcement learning with supervision beyond environmental rewards

    Get PDF
    Reinforcement Learning (RL) is an elegant approach to tackle sequential decision-making problems. In the standard setting, the task designer curates a reward function and the RL agent's objective is to take actions in the environment such that the long-term cumulative reward is maximized. Deep RL algorithms---that combine RL principles with deep neural networks---have been successfully used to learn behaviors in complex environments but are generally quite sensitive to the nature of the reward function. For a given RL problem, the environmental rewards could be sparse, delayed, misspecified, or unavailable (i.e., impossible to define mathematically for the required behavior). These scenarios exacerbate the challenge of training a stable deep-RL agent in a sample-efficient manner. In this thesis, we study methods that go beyond a direct reliance on the environmental rewards by generating additional information signals that the RL agent could incorporate for learning the desired skills. We start by investigating the performance bottlenecks in delayed reward environments and propose to address these by learning surrogate rewards. We include two methods to compute the surrogate rewards using the agent-environment interaction data. Then, we consider the imitation-learning (IL) setting where we don't have access to any rewards, but instead, are provided with a dataset of expert demonstrations that the RL agent must learn to reliably reproduce. We propose IL algorithms for partially observable environments and situations with discrepancies between the transition dynamics of the expert and the imitator. Next, we consider the benefits of learning an ensemble of RL agents with explicit diversity pressure. We show that diversity encourages exploration and facilitates the discovery of sparse environmental rewards. Finally, we analyze the concept of sharing knowledge between RL agents operating in different but related environments and show that the information transfer can accelerate learning
    corecore