236 research outputs found
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively
small set of states that are likely to be useful as subgoals, in concert with
the learning of corresponding skill policies to achieve those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment,
suitable for large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the ATARI 2600 Montezuma's Revenge game.
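The core idea of the abstract above, model-free subgoal discovery by incremental unsupervised learning over a small memory of recent experiences, can be sketched with online k-means: cluster centroids of recently visited states serve as candidate subgoals, and an intrinsic reward fires when the agent reaches one. This is a minimal illustrative sketch, not the authors' implementation; the class, parameter values, and distance threshold are all assumptions.

```python
import numpy as np

class SubgoalDiscovery:
    """Incremental (online) k-means over recent agent states.

    Hypothetical sketch: each centroid is a candidate subgoal, nudged
    toward new states as experience streams in.
    """

    def __init__(self, n_subgoals=4, lr=0.1, seed=0):
        self.n_subgoals = n_subgoals
        self.lr = lr                      # step size for centroid updates
        self.centroids = None
        self.rng = np.random.default_rng(seed)

    def update(self, state):
        s = np.asarray(state, dtype=float)
        if self.centroids is None:
            # Initialize centroids with small jitter around the first state.
            self.centroids = s + 0.01 * self.rng.standard_normal(
                (self.n_subgoals, s.size))
        # Move the nearest centroid a small step toward the new state.
        k = np.argmin(np.linalg.norm(self.centroids - s, axis=1))
        self.centroids[k] += self.lr * (s - self.centroids[k])
        return self.centroids

def intrinsic_reward(state, subgoal, tol=0.5):
    """Internal reward marking subgoal attainment: +1 within `tol` of it."""
    return 1.0 if np.linalg.norm(np.asarray(state, float) - subgoal) < tol else 0.0

# Stream a short trajectory through the discoverer.
sd = SubgoalDiscovery(n_subgoals=2)
for s in [(0, 0), (0, 1), (5, 5), (5, 6), (0, 0.5), (5, 5.5)]:
    centroids = sd.update(s)
```

A skill policy for each subgoal would then be trained against `intrinsic_reward` rather than the sparse environment reward, which is the division of labor the abstract describes.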
Hierarchical Reinforcement Learning in Behavior and the Brain
Dissertation presented to obtain the Ph.D. degree in Biology, Neuroscience.
Reinforcement learning (RL) has provided key insights into the neurobiology
of learning and decision making. The pivotal finding is that the
phasic activity of dopaminergic cells in the ventral tegmental area during
learning conforms to a reward prediction error (RPE), as specified in the
temporal-difference learning algorithm (TD). This has provided insights into
conditioning, the distinction between habitual and goal-directed behavior,
working memory, cognitive control, and error monitoring. It has also advanced
the understanding of cognitive deficits in Parkinson's disease, depression,
and ADHD, and of personality traits such as impulsivity. (...)
Spatial subgoal learning in the mouse: behavioral and computational mechanisms
Here we aim to better understand how animals navigate structured environments. The prevailing wisdom is that they can select between two distinct approaches: querying a mental map of the environment or repeating previously successful trajectories to a goal. However, this dichotomy has been built around data from rodents trained to solve mazes, and it is unclear how it applies to more naturalistic scenarios such as self-motivated navigation in open environments with obstacles. In this project, we leveraged instinctive escape behavior in mice to investigate how rodents use a period of exploration to learn about goals and obstacles in an unfamiliar environment. In our most basic assay, mice explore an environment with a shelter and an obstacle for 5-20 minutes, and then we present threat stimuli to trigger escapes to shelter. After 5-10 minutes of exploration, mice took inefficient paths to the shelter, often nearly running into the obstacle and then relying on visual and tactile cues to avoid it. Within twenty minutes, however, they spontaneously developed an efficient subgoal strategy, escaping directly to the obstacle edge before heading to the shelter. Mice escaped in this manner even if the obstacle was removed, suggesting that they had memorized a mental map of subgoals. Unlike typical models of map-based planning, however, we found that investigating the obstacle was not important for updating the map. Instead, learning resembled trajectory repetition: mice had to execute 'practice runs' toward an obstacle edge in order to memorize subgoals. To test this hypothesis directly, we developed a closed-loop neural manipulation, interrupting spontaneous practice runs by stimulating premotor cortex. This manipulation successfully prevented subgoal learning, whereas several control manipulations did not.
We modelled these results using a panel of reinforcement learning approaches and found that mouse behavior is best matched by systems that explore in a non-uniform manner and possess a high-level spatial representation of regions in the arena. We conclude that mice use practice runs to learn useful subgoals and integrate them into a hierarchical cognitive map of their surroundings. These results broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
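The subgoal strategy described above, escaping via the obstacle edge rather than heading straight at the shelter, can be illustrated with a toy gridworld. This is a hypothetical sketch, not the authors' model: a wall with open edges separates the mouse from the shelter, and a breadth-first search shows that routing through the memorized edge subgoal reproduces the efficient escape path.

```python
from collections import deque

def bfs_path_length(grid, start, goal):
    """Shortest-path length on a 4-connected grid (1 = wall); None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

# 7x7 arena: a horizontal wall blocks row 3, columns 1-5; both edges stay open.
grid = [[0] * 7 for _ in range(7)]
for c in range(1, 6):
    grid[3][c] = 1

mouse, shelter, edge = (0, 3), (6, 3), (3, 0)

# Subgoal strategy: run to the obstacle edge first, then on to the shelter.
via_edge = bfs_path_length(grid, mouse, edge) + bfs_path_length(grid, edge, shelter)
direct = bfs_path_length(grid, mouse, shelter)
```

In this layout the chained edge-then-shelter route is exactly as long as the globally optimal path, which is why memorizing the edge as a subgoal, rather than replanning from a full map, is sufficient for efficient escape.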
Mice identify subgoal locations through an action-driven mapping process
Mammals form mental maps of their environments by exploring their surroundings. Here, we investigate which elements of exploration are important for this process. We studied mouse escape behavior, in which mice are known to memorize subgoal locations (obstacle edges) to execute efficient escape routes to shelter. To test the role of exploratory actions, we developed closed-loop neural-stimulation protocols for interrupting various actions while mice explored. We found that blocking running movements directed at obstacle edges prevented subgoal learning; however, blocking several control movements had no effect. Reinforcement learning simulations and analysis of spatial data show that artificial agents can match these results if they have a region-level spatial representation and explore with object-directed movements. We conclude that mice employ an action-driven process for integrating subgoals into a hierarchical cognitive map. These findings broaden our understanding of the cognitive toolkit that mammals use to acquire spatial knowledge.
Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning
Many problems in sequential decision making and stochastic control often have
natural multiscale structure: sub-tasks are assembled together to accomplish
complex goals. Systematically inferring and leveraging hierarchical structure,
particularly beyond a single level of abstraction, has remained a longstanding
challenge. We describe a fast multiscale procedure for repeatedly compressing,
or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of
sub-problems at different scales is automatically determined. Coarsened MDPs
are themselves independent, deterministic MDPs, and may be solved using
existing algorithms. The multiscale representation delivered by this procedure
decouples sub-tasks from each other and can lead to substantial improvements in
convergence rates both locally within sub-problems and globally across
sub-problems, yielding significant computational savings. A second fundamental
aspect of this work is that these multiscale decompositions yield new transfer
opportunities across different problems, where solutions of sub-tasks at
different levels of the hierarchy may be amenable to transfer to new problems.
Localized transfer of policies and potential operators at arbitrary scales is
emphasized. Finally, we demonstrate compression and transfer in a collection of
illustrative domains, including examples involving discrete and continuous
state spaces.
Comment: 86 pages, 15 figures
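The compression step in the abstract above, turning a fine-scale MDP into a coarse one over aggregated states, can be sketched with generic state aggregation. This is an illustrative sketch under simplifying assumptions (uniform lumping of a transition matrix), not the paper's homogenization operator, which additionally produces deterministic coarse MDPs and handles actions and rewards.

```python
import numpy as np

def coarsen(P, partition):
    """Aggregate a transition matrix P over a partition of the state space.

    Hypothetical sketch: the coarse probability of moving from block i to
    block j is the uniform average, over states in block i, of the total
    probability of landing anywhere in block j.
    """
    k = len(partition)
    Pc = np.zeros((k, k))
    for i, Si in enumerate(partition):
        for j, Sj in enumerate(partition):
            # Sum the fine-scale sub-block, then average over block i's states.
            Pc[i, j] = P[np.ix_(Si, Sj)].sum() / len(Si)
    return Pc

# 4-state chain random walk, compressed to a 2-state coarse MDP.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
Pc = coarsen(P, [[0, 1], [2, 3]])
```

The coarse matrix is again a valid stochastic matrix, so the sub-problem it defines can be handed to any existing MDP solver, which is the property the abstract relies on when it solves coarsened MDPs with standard algorithms.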