Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments

Abstract

Reinforcement Learning (RL) is based on the Markov Decision Process (MDP) framework, but not all problems of interest can be modeled as MDPs, because some of them exhibit non-Markovian temporal dependencies. One of the solutions proposed in the literature to handle them is Hierarchical Reinforcement Learning (HRL). HRL takes inspiration from hierarchical planning in the artificial intelligence literature and is an emerging sub-discipline of RL, in which RL methods are augmented with prior knowledge about the high-level structure of behavior in order to decompose the underlying problem into simpler sub-problems. The high-level goal of this thesis is to investigate the advantages that an HRL approach may have over a plain RL approach. To this end, we study problems of interest that are rarely tackled by means of RL, namely Sentiment Analysis, Rogue, and Car Controller, showing how the ability of RL algorithms to solve them in a partially observable environment is affected by the use (or not) of generic hierarchical architectures based on RL algorithms of the Actor-Critic family. Remarkably, we claim that our work on Sentiment Analysis in particular is very innovative for RL, resulting in state-of-the-art performance; as far as we know, RL approaches have only rarely been applied to the domains of computational linguistics and sentiment analysis. Furthermore, our work on the famous video game Rogue is probably the first example of a Deep RL architecture able to explore Rogue's dungeons and fight its monsters, achieving a success rate of more than 75% on the first game level. Finally, our work on Car Controller allowed us to make some interesting observations on the nature of some components of the policy gradient equation.
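
For context, a minimal sketch of the policy gradient in its common Actor-Critic form is given below, assuming the textbook formulation with an advantage term A^{\pi_\theta} estimated by the critic; the exact variant whose components are analyzed in the thesis may differ:

\[
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\left[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\; A^{\pi_\theta}(s_t, a_t) \,\right]
\]

Here the actor is the parameterized policy \(\pi_\theta\) and the critic supplies the advantage estimate \(A^{\pi_\theta}(s_t, a_t)\) that weights the log-likelihood gradient of each action.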
