236 research outputs found

    Learning Representations in Model-Free Hierarchical Reinforcement Learning

    Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with learning the corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal that marks subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences (trajectories) of the agent. When combined with an intrinsic motivation learning mechanism, this method learns both subgoals and skills based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment and is suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse, delayed feedback: a variant of the rooms environment and the first screen of the ATARI 2600 game Montezuma's Revenge.
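
    The subgoal-discovery step described above can be pictured as an online clustering routine running over a small buffer of recent experience, paired with an intrinsic reward for reaching a discovered cluster center. The sketch below is a minimal illustration of that idea, assuming continuous state vectors and using online k-means as the incremental unsupervised learner; the class name, hyperparameters, and attainment radius are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: model-free subgoal discovery via incremental (online) k-means
# over a small memory of recent states, plus an intrinsic reward for reaching
# a discovered subgoal. Names and hyperparameters are illustrative only.
from collections import deque
import numpy as np

class SubgoalDiscoverer:
    def __init__(self, n_subgoals=4, memory_size=1000, lr=0.05, radius=1.0):
        self.memory = deque(maxlen=memory_size)   # recent experiences (visited states)
        self.centroids = None                     # candidate subgoal locations
        self.n_subgoals = n_subgoals
        self.lr = lr                              # online k-means step size
        self.radius = radius                      # "subgoal attained" threshold

    def observe(self, state):
        """Store a visited state and update the candidate subgoals incrementally."""
        state = np.asarray(state, dtype=float)
        self.memory.append(state)
        if self.centroids is None:
            if len(self.memory) >= self.n_subgoals:
                idx = np.random.choice(len(self.memory), self.n_subgoals, replace=False)
                self.centroids = np.array([self.memory[i] for i in idx])
            return
        # online k-means: move the nearest centroid toward the new state
        d = np.linalg.norm(self.centroids - state, axis=1)
        k = int(np.argmin(d))
        self.centroids[k] += self.lr * (state - self.centroids[k])

    def intrinsic_reward(self, state):
        """Intrinsic motivation: reward the agent for attaining any discovered subgoal."""
        if self.centroids is None:
            return 0.0
        d = np.linalg.norm(self.centroids - np.asarray(state, dtype=float), axis=1)
        return 1.0 if d.min() < self.radius else 0.0
```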

    Hierarchical Reinforcement Learning in Behavior and the Brain

    Dissertation presented to obtain the Ph.D. degree in Biology, Neuroscience. Reinforcement learning (RL) has provided key insights into the neurobiology of learning and decision making. The pivotal finding is that the phasic activity of dopaminergic cells in the ventral tegmental area during learning conforms to a reward prediction error (RPE), as specified in the temporal-difference learning algorithm (TD). This has provided insights into conditioning, the distinction between habitual and goal-directed behavior, working memory, cognitive control and error monitoring. It has also advanced the understanding of cognitive deficits in Parkinson's disease, depression, ADHD and of personality traits such as impulsivity. (...)
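
    For context, a minimal sketch of the temporal-difference learning rule referenced here, in which the update is driven by the reward prediction error (RPE) whose time course is compared with phasic dopaminergic activity. The toy chain environment and parameters are illustrative assumptions, not taken from the dissertation.

```python
# Minimal sketch: TD(0) value learning. The reward prediction error (delta) is
# the quantity whose time course resembles phasic dopaminergic activity.
import numpy as np

def td0(episodes, n_states, alpha=0.1, gamma=0.95):
    """episodes: iterable of trajectories, each a list of (state, reward, next_state)."""
    V = np.zeros(n_states)
    for trajectory in episodes:
        for s, r, s_next in trajectory:
            delta = r + gamma * V[s_next] - V[s]   # reward prediction error (RPE)
            V[s] += alpha * delta                  # value update scaled by the RPE
    return V

# Usage: a 3-state chain where only the final transition is rewarded.
episode = [(0, 0.0, 1), (1, 1.0, 2)]
values = td0([episode] * 200, n_states=3)   # V[1] approaches 1, V[0] approaches gamma
```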

    Spatial subgoal learning in the mouse: behavioral and computational mechanisms

    Here we aim to better understand how animals navigate structured environments. The prevailing wisdom is that they can select between two distinct approaches: querying a mental map of the environment or repeating previously successful trajectories to a goal. However, this dichotomy has been built around data from rodents trained to solve mazes, and it is unclear how it applies to more naturalistic scenarios such as self-motivated navigation in open environments with obstacles. In this project, we leveraged instinctive escape behavior in mice to investigate how rodents use a period of exploration to learn about goals and obstacles in an unfamiliar environment. In our most basic assay, mice explore an environment with a shelter and an obstacle for 5-20 minutes and then we present threat stimuli to trigger escapes to shelter. After 5-10 minutes of exploration, mice took inefficient paths to the shelter, often nearly running into the obstacle and then relying on visual and tactile cues to avoid it. Within twenty minutes, however, they spontaneously developed an efficient subgoal strategy, escaping directly to the obstacle edge before heading to the shelter. Mice escaped in this manner even if the obstacle was removed, suggesting that they had memorized a mental map of subgoals. Unlike typical models of map-based planning, however, we found that investigating the obstacle was not important for updating the map. Instead, learning resembled trajectory repetition: mice had to execute 'practice runs' toward an obstacle edge in order to memorize subgoals. To test this hypothesis directly, we developed a closed-loop neural manipulation, interrupting spontaneous practice runs by stimulating premotor cortex. This manipulation successfully prevented subgoal learning, whereas several control manipulations did not. We modelled these results using a panel of reinforcement learning approaches and found that mouse behavior is best matched by systems that explore in a non-uniform manner and possess a high-level spatial representation of regions in the arena. We conclude that mice use practice runs to learn useful subgoals and integrate them into a hierarchical cognitive map of their surroundings. These results broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
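
    A rough sketch of the kind of model described as matching the behavior: a tabular Q-learning agent over a coarse, region-level state space whose exploration is biased toward obstacle-edge regions ('practice runs'). The arena layout, region partition, and bias parameters below are illustrative assumptions, not the study's actual models.

```python
# Rough sketch: Q-learning over a region-level state space with non-uniform
# exploration biased toward obstacle-edge regions. All details (arena, regions,
# bias strength) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_REGIONS = 6            # e.g. threat zone, open areas, two obstacle-edge regions, shelter
EDGE_REGIONS = {2, 3}    # regions containing the obstacle edges (subgoal candidates)
SHELTER = 5
ACTIONS = list(range(N_REGIONS))   # action = "move toward region a"

Q = np.zeros((N_REGIONS, N_REGIONS))

def step(region, action):
    """Toy dynamics: the move succeeds most of the time; reaching shelter is rewarded."""
    nxt = action if rng.random() < 0.8 else region
    reward = 1.0 if nxt == SHELTER else 0.0
    return nxt, reward

def choose_action(region, eps=0.3, edge_bias=0.6):
    if rng.random() < eps:
        # non-uniform exploration: prefer movements directed at obstacle edges
        if rng.random() < edge_bias:
            return rng.choice(sorted(EDGE_REGIONS))
        return rng.integers(N_REGIONS)
    return int(np.argmax(Q[region]))

for _ in range(500):
    region = 0                      # start in the threat zone
    for _ in range(20):
        a = choose_action(region)
        nxt, r = step(region, a)
        Q[region, a] += 0.1 * (r + 0.9 * Q[nxt].max() - Q[region, a])
        region = nxt
        if r > 0:
            break
```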

    Mice identify subgoal locations through an action-driven mapping process

    Mammals form mental maps of their environments by exploring their surroundings. Here, we investigate which elements of exploration are important for this process. We studied mouse escape behavior, in which mice are known to memorize subgoal locations (obstacle edges) to execute efficient escape routes to shelter. To test the role of exploratory actions, we developed closed-loop neural-stimulation protocols for interrupting various actions while mice explored. We found that blocking running movements directed at obstacle edges prevented subgoal learning; however, blocking several control movements had no effect. Reinforcement learning simulations and analysis of spatial data show that artificial agents can match these results if they have a region-level spatial representation and explore with object-directed movements. We conclude that mice employ an action-driven process for integrating subgoals into a hierarchical cognitive map. These findings broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
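
    In simulation, the closed-loop manipulation can be emulated by vetoing exploratory movements that are directed at an obstacle edge while leaving control movements intact. The geometric test below is a minimal sketch of such an interruption rule; the angle threshold and interface are assumptions, not the study's stimulation protocol.

```python
# Minimal sketch of a closed-loop "interruption" rule analogous to the
# neural-stimulation protocol: flag (and veto) an exploratory movement when it
# is directed at an obstacle edge. Geometry and threshold are assumptions.
import numpy as np

def is_edge_directed(position, velocity, edge_location, angle_thresh_deg=30.0):
    """Return True if the current movement heads toward the obstacle edge."""
    to_edge = np.asarray(edge_location, float) - np.asarray(position, float)
    v = np.asarray(velocity, float)
    if np.linalg.norm(v) == 0 or np.linalg.norm(to_edge) == 0:
        return False
    cos_angle = v @ to_edge / (np.linalg.norm(v) * np.linalg.norm(to_edge))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) < angle_thresh_deg

# During simulated exploration, an edge-directed move would be interrupted
# (e.g. replaced by a stop), whereas control movements are left unchanged.
```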

    Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

    Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces. (Comment: 86 pages, 15 figures)
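
    As a minimal illustration of the compression idea, the sketch below coarsens an MDP by aggregating states into clusters and averaging the policy-induced transitions and rewards. This is a generic state-aggregation step under a fixed policy, with uniform within-cluster weights assumed for simplicity; it is not the paper's specific multiscale homogenization procedure.

```python
# Minimal sketch: coarsen an MDP by aggregating states into clusters and
# averaging the policy-induced transition and reward structure. A generic
# state-aggregation step, not the paper's multiscale procedure.
import numpy as np

def coarsen(P, R, partition):
    """
    P: (S, S) transition matrix under a fixed policy
    R: (S,)   expected one-step reward under that policy
    partition: list of lists of fine-state indices, one list per coarse state
    Returns (P_coarse, R_coarse) using uniform weighting within each cluster.
    """
    k = len(partition)
    P_coarse = np.zeros((k, k))
    R_coarse = np.zeros(k)
    for i, cluster_i in enumerate(partition):
        w = 1.0 / len(cluster_i)               # uniform weights within a cluster
        R_coarse[i] = w * sum(R[s] for s in cluster_i)
        for j, cluster_j in enumerate(partition):
            P_coarse[i, j] = w * sum(P[s, t] for s in cluster_i for t in cluster_j)
    return P_coarse, R_coarse

# Usage: a 4-state chain compressed to 2 coarse states.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
R = np.array([0.0, 0.0, 0.0, 1.0])
P2, R2 = coarsen(P, R, partition=[[0, 1], [2, 3]])
```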