Learning Representations in Model-Free Hierarchical Reinforcement Learning

Author: Noelle David C.
Rafati Jacob
Publication date: 12/04/2019

Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with the learning of corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal marking subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences (trajectories) of the agent. When combined with an intrinsic motivation learning mechanism, this method learns both subgoals and skills, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment and is suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse, delayed feedback: a variant of the rooms environment and the first screen of the Atari 2600 game Montezuma's Revenge.
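The abstract above describes subgoal discovery by incremental unsupervised learning over a small memory of recent experiences, plus an internal reward that marks subgoal attainment. The sketch below is one possible reading of that idea, not the authors' implementation: it assumes states are numeric vectors, uses plain k-means as the unsupervised learner, and every name in it (`SubgoalFinder`, `refine`, `intrinsic_reward`, `tol`) is hypothetical.

```python
from collections import deque


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SubgoalFinder:
    """Illustrative only: k-means over a bounded memory of recent states.

    Centroids persist between calls to refine(), so each call updates the
    previous estimate instead of starting from scratch.
    """

    def __init__(self, n_subgoals, memory_size):
        self.n_subgoals = n_subgoals
        # deque(maxlen=...) silently drops the oldest state when full.
        self.memory = deque(maxlen=memory_size)
        self.centroids = []  # candidate subgoals (assumption: cluster centres)

    def observe(self, state):
        """Store one visited state; no environment model is built."""
        self.memory.append(tuple(float(x) for x in state))

    def refine(self, iterations=5):
        """Run a few k-means passes over the states currently in memory."""
        points = list(self.memory)
        # Seed any missing centroids from distinct states in memory.
        for p in points:
            if len(self.centroids) >= self.n_subgoals:
                break
            if p not in self.centroids:
                self.centroids.append(p)
        for _ in range(iterations):
            groups = [[] for _ in self.centroids]
            for p in points:
                nearest = min(
                    range(len(self.centroids)),
                    key=lambda i: sq_dist(self.centroids[i], p),
                )
                groups[nearest].append(p)
            for k, group in enumerate(groups):
                if group:  # leave a centroid in place if no state maps to it
                    self.centroids[k] = tuple(
                        sum(dim) / len(group) for dim in zip(*group)
                    )
        return self.centroids


def intrinsic_reward(state, subgoal, tol=0.5):
    """Internal reward: 1.0 when the state is within tol of the subgoal."""
    return 1.0 if sq_dist(state, subgoal) <= tol ** 2 else 0.0
```

In this reading, an agent would call `observe` on each visited state, call `refine` periodically, and train one skill policy per returned centroid using `intrinsic_reward` in place of the sparse environment reward.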

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Finding minimal action sequences with a simple evaluation of actions

Author: Ashvin Shah
Kevin N. Gurney
Publication venue: Frontiers Media SA
Publication date: 01/01/2014

Animals are able to discover the minimal number of actions that achieves an outcome (the minimal action sequence). In most accounts of this, actions are associated with a measure of behavior that is higher for actions that lead to the outcome with a shorter action sequence, and learning mechanisms find the actions associated with the highest measure. In this sense, previous accounts focus on more than the simple binary signal of “was the outcome achieved?”; they focus on “how well was the outcome achieved?” However, such mechanisms may not govern all types of behavioral development. In particular, in the process of action discovery (Redgrave and Gurney, 2006), actions are reinforced if they simply lead to a salient outcome, because biological reinforcement signals occur too quickly to evaluate the consequences of an action beyond an indication of the outcome’s occurrence. Thus, action discovery mechanisms focus on the simple evaluation of “was the outcome achieved?” and not “how well was the outcome achieved?” Notwithstanding this impoverishment of information, can the process of action discovery find the minimal action sequence? We address this question by implementing computational mechanisms, referred to in this paper as no-cost learning rules, in which each action that leads to the outcome is associated with the same measure of behavior. No-cost rules focus on “was the outcome achieved?” and are consistent with action discovery. No-cost rules discover the minimal action sequence in simulated tasks and execute it for a substantial amount of time. Extensive training, however, results in extraneous actions, suggesting that a separate process (which has been proposed in action discovery) must attenuate learning if no-cost rules participate in behavioral development. We describe how no-cost rules develop behavior and what happens when attenuation is disrupted, and we relate the new mechanisms to the wider computational and biological context.

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

White Rose Research Online

Diverse Offline Imitation via Fenchel Duality

Author: Cheng Jin
Kolev Pavel
Martius Georg
Vlastelica Marin
Publication date: 21/07/2023

There has been significant recent progress in the area of unsupervised skill discovery, with various works proposing mutual-information-based objectives as a source of intrinsic motivation. Prior works predominantly focused on designing algorithms that require online access to the environment. In contrast, we develop an offline skill discovery algorithm. Our problem formulation considers the maximization of a mutual information objective constrained by a KL-divergence. More precisely, the constraints ensure that the state occupancy of each skill remains close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage. Our main contribution is to connect Fenchel duality, reinforcement learning and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.
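The constrained formulation described in this abstract can be written schematically as below. This is a sketch under stated assumptions, not the paper's exact objective: the symbols $S$ (state), $Z$ (skill index), $\pi_z$ (policy of skill $z$), $d_{\pi_z}$ and $d_E$ (state occupancies of skill $z$ and of the expert), and the tolerance $\epsilon$ are all introduced here for illustration, and the direction of the KL term is assumed.

```latex
\begin{aligned}
\max_{\{\pi_z\}} \quad & I(S; Z) \\
\text{subject to} \quad & D_{\mathrm{KL}}\!\left(d_{\pi_z} \,\middle\|\, d_E\right) \le \epsilon
\quad \text{for every skill } z .
\end{aligned}
```

Read this way, the mutual information term rewards skills that visit distinguishable states, while the constraint keeps each skill's state occupancy close to the expert's.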

arXiv.org e-Print Archive

Reinforcement learning in dendritic structures

Author: Mathieu Schiess
ME Larkum
P Poirazi
Robert Urbanczik
Walter Senn
Publication venue: BioMed Central
Publication date: 01/01/2010

The discovery of binary dendritic events such as local NMDA spikes in dendritic subbranches led to the suggestion that dendritic trees could be computationally equivalent to a 2-layer network of point neurons, with a single output unit represented by the soma, and input units represented by the dendritic branches. Although this interpretation endows a neuron with high computational power, it is not clear functionally why nature would have preferred the dendritic solution, with a single but complex neuron, over the network solution, with many but simple units. We show that the dendritic solution has a distinct advantage over the network solution when considering different learning tasks. Its key property is that the dendritic branches receive immediate feedback from the somatic output spike, while in the corresponding network architecture the feedback would require additional backpropagating connections to the input units. Assuming a reinforcement learning scenario, we formally derive a learning rule for the synaptic contacts on the individual dendritic trees which depends on the presynaptic activity, the local NMDA spikes, the somatic action potential, and a delayed reinforcement signal. We test the model in two scenarios: the learning of binary classifications and of precise spike timings. We show that the immediate feedback represented by the backpropagating action potential supplies the individual dendritic branches with enough information to adapt their synapses efficiently and to speed up the learning process.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bern Open Repository and Information System (BORIS)