3,067 research outputs found

    Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning

    We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
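    The idea lends itself to a simple trajectory-level test. The sketch below is our own illustration, not the paper's code: the inverse-square-root novelty measure, the window size k, and the threshold are all assumptions. It scores each state in a trajectory by the ratio of the mean novelty of the k states that follow it to the mean novelty of the k states that precede it; a sharp rise suggests the agent has just crossed into a less-visited region, marking that state as a candidate subgoal.

```python
from collections import defaultdict

def novelty(visits, state):
    # Novelty decays with visitation: rarely seen states are more novel.
    return 1.0 / (visits[state] ** 0.5)

def relative_novelty_subgoals(trajectory, visits, k=5, threshold=2.0):
    """Flag states whose successors are much more novel than their
    predecessors -- a heuristic signature of a transition into a new
    region of the state space."""
    candidates = []
    for t in range(k, len(trajectory) - k):
        before = sum(novelty(visits, x) for x in trajectory[t - k:t]) / k
        after = sum(novelty(visits, x) for x in trajectory[t + 1:t + 1 + k]) / k
        if after / before >= threshold:
            candidates.append(trajectory[t])
    return candidates

# Toy usage: visitation counts from past experience, then one new
# trajectory that crosses from a familiar region (0-3) into an
# unfamiliar one (4-9).
visits = defaultdict(lambda: 1)
for s in [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]:
    visits[s] += 1
trajectory = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(relative_novelty_subgoals(trajectory, visits, k=3, threshold=1.5))
# -> [3, 4]: the states at the boundary of the unvisited region
```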

    Subgoal Identifications in Reinforcement Learning: A Survey


    Learning the Structure of Continuous Markov Decision Processes

    There is growing interest in artificial, intelligent agents that can operate autonomously for extended periods of time in complex environments and fulfill a variety of different tasks. Such agents will face problems during their lifetime that may not be foreseeable at the time of their deployment. The capacity for lifelong learning of new behaviors is therefore an essential prerequisite for agents of this kind, as it enables them to deal with unforeseen situations. However, learning every complex behavior anew from scratch would be cumbersome. It is more plausible to consider behavior to be modular and to let the agent acquire a set of reusable building blocks for behavior, so-called skills. Once acquired, these skills can facilitate fast learning and adaptation of behavior to new situations. This work focuses on computational approaches to skill acquisition, namely which skills should be acquired and how to acquire them. The former is commonly denoted as skill discovery and the latter as skill learning. The main contribution of this thesis is a novel incremental skill acquisition approach suited for lifelong learning. In this approach, the agent incrementally learns a graph-based representation of a domain and exploits certain properties of this graph, such as its bottlenecks, for skill discovery. The thesis proposes a novel approach for learning a graph-based representation of continuous domains by formalizing the problem as a probabilistic generative model. Furthermore, a new incremental agglomerative clustering approach for identifying bottlenecks of such graphs is presented. Building on this, the thesis proposes a novel intrinsic motivation system that enables an agent to intelligently allocate time between skill discovery and skill learning in developmental settings, where the agent is not constrained by external tasks. The results of this thesis show that the resulting skill acquisition approach is suited for continuous domains and can deal with domain stochasticity and with different exploratory behavior of the agent. The acquired skills are reusable and versatile and can be used in multi-task and lifelong learning settings in high-dimensional problems.
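    The thesis's own pipeline (a probabilistic generative model of the graph plus incremental agglomerative clustering) is more involved than can be shown here. As a hedged stand-in for the bottleneck-detection step, the sketch below builds a state graph from observed transitions and ranks nodes by betweenness centrality, a standard proxy for graph bottlenecks; the graph construction, the use of networkx, and the cutoff are our assumptions, not the thesis's method.

```python
import networkx as nx

def bottleneck_states(transitions, top_k=3):
    """Build an undirected state graph from (s, s') transition pairs and
    rank nodes by betweenness centrality: states lying on many shortest
    paths between regions are bottleneck (subgoal) candidates."""
    g = nx.Graph()
    g.add_edges_from(transitions)
    centrality = nx.betweenness_centrality(g)
    return sorted(centrality, key=centrality.get, reverse=True)[:top_k]

# Toy usage: two rooms (states 0-3 and 5-8) joined by doorway state 4.
transitions = [(0, 1), (1, 2), (2, 3), (3, 0),   # room A
               (5, 6), (6, 7), (7, 8), (8, 5),   # room B
               (2, 4), (4, 6)]                   # doorway connections
print(bottleneck_states(transitions, top_k=1))   # -> [4]
```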

    Hierarchical reinforcement learning: learning sub-goals and state-abstraction

    Human beings have the incredible capability of creating and using abstractions. With these abstractions we are able to solve extremely complex tasks that require a lot of foresight and planning. Research in Hierarchical Reinforcement Learning has demonstrated the utility of abstractions, but it has also introduced a new problem: how can an agent autonomously discover and create useful abstractions while learning? In this dissertation we present a new method that allows an agent to discover and create temporal abstractions autonomously, based on the options framework. Our method rests on the observation that, to reach the goal, the agent must pass through certain states. Over time these states begin to differentiate themselves from the others; they are detected as useful subgoals and used by the agent to create new temporal abstractions whose objective is to reach them. To detect useful subgoals, our method computes intersections between several paths leading to the goal: the regions of the state space that every successful path must cross correspond to our definition of subgoals. Our research focused on domains widely used to study the utility of temporal abstractions: the room-to-room navigation problem and the taxi problem. We determined that, in the problems tested, an agent can learn more rapidly in more complex problems by automatically discovering subgoals and creating abstractions, without needing a programmer to provide additional information and handcraft the abstractions.
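    A minimal sketch of the path-intersection idea follows; it is our illustration, and the function name, the start/goal exclusion, and the min_fraction threshold are hypothetical. States that appear in every (or nearly every) successful trajectory are exactly the regions the agent must pass through, and thus subgoal candidates.

```python
from collections import Counter

def intersect_paths(successful_paths, min_fraction=1.0):
    """States shared by at least min_fraction of the successful paths
    are regions the agent must pass through: subgoal candidates.
    Start and goal states are excluded, since they are trivially shared."""
    counts = Counter()
    for path in successful_paths:
        counts.update(set(path))          # count each state once per path
    needed = min_fraction * len(successful_paths)
    start, goal = successful_paths[0][0], successful_paths[0][-1]
    return {s for s, c in counts.items()
            if c >= needed and s not in (start, goal)}

# Toy usage: three routes through a gridworld, all via doorway state 12.
paths = [[0, 1, 5, 12, 20, 24],
         [0, 4, 8, 12, 16, 24],
         [0, 2, 6, 12, 21, 24]]
print(intersect_paths(paths))             # -> {12}
```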