Reinstated episodic context guides sampling-based decisions for reward.
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event. This effect is mediated by fMRI measures of context retrieval on each trial, suggesting a mechanism whereby cues trigger retrieval of context, which then triggers retrieval of other decisions from that context. This result establishes a new avenue by which experience can guide choice and, as such, has broad implications for the study of decisions.
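The sampling mechanism described above can be illustrated with a toy simulation (a minimal sketch; the function, option names, and sampling scheme are illustrative and not the paper's actual model): each decision draws a handful of remembered outcomes per option and picks the option with the higher sampled mean.

```python
import random

def episodic_sample_choice(history, n_samples=3, rng=random):
    """Choose between options by sampling a few remembered outcomes.

    history: dict mapping option -> list of past rewards for that option.
    Each decision draws n_samples memories per option and chooses the
    option with the higher sampled mean (ties broken at random).
    """
    means = {}
    for option, rewards in history.items():
        samples = [rng.choice(rewards) for _ in range(n_samples)]
        means[option] = sum(samples) / len(samples)
    best = max(means.values())
    return rng.choice([o for o, m in means.items() if m == best])

# Because only a few memories are sampled, the same history can
# produce different choices from trial to trial.
history = {"A": [1, 0, 1, 1], "B": [0, 0, 1, 0]}
choice = episodic_sample_choice(history)
```

Trial-to-trial variability falls out of the small sample size: with more samples per option, behavior converges toward the model-free value estimate.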
Improving Deep Reinforcement Learning Using Graph Convolution and Visual Domain Transfer
Recent developments in Deep Reinforcement Learning (DRL) have shown tremendous progress in robotics control, Atari games, and board games such as Go. However, model-free DRL still has limited use cases due to its poor sampling efficiency and generalization across a variety of tasks. In this thesis, two particular drawbacks of DRL are investigated: 1) the poor generalization abilities of model-free DRL, specifically how to generalize an agent's policy to unseen environments and to task performance on different data representations (e.g. image-based or graph-based); and 2) the reality gap issue in DRL, that is, how to effectively transfer a policy learned in a simulator to the real world. This thesis makes several novel contributions to the field of DRL, outlined sequentially in the following. Among these contributions is the generalized value iteration network (GVIN) algorithm, an end-to-end neural network planning module extending the work on Value Iteration Networks (VIN). GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. Additionally, this thesis proposes three novel, differentiable kernels as graph convolution operators and shows that the embedding-based kernel achieves the best performance. Furthermore, an improvement upon traditional n-step Q-learning that stabilizes training for VIN and GVIN is demonstrated. The equivalence between GVIN and graph neural networks is also outlined, and it is shown that GVIN can be further extended to address both control and inference problems. The final graph-domain subject studied in this thesis is graph embeddings. Specifically, this work studies GEM-F, a general graph embedding framework that unifies most previous graph embedding algorithms.
Based on the contributions made during the analysis of GEM-F, a novel algorithm called WarpMap is proposed, which outperforms DeepWalk and node2vec in unsupervised learning settings. The aforementioned reality gap in DRL prevents a significant portion of research from reaching real-world settings. The latter part of this work studies and analyzes domain transfer techniques in an effort to bridge this gap. Typically, domain transfer in RL consists of representation transfer and policy transfer; the focus here is on representation transfer for vision-based applications, more specifically, aligning feature representations from a source domain to a target domain in an unsupervised fashion. In this approach, a linear mapping function is used to fuse modules that are trained in different domains, and two improved adversarial learning methods are proposed to enhance the training quality of the mapping function. Finally, the thesis demonstrates the effectiveness of domain alignment across different weather conditions in the CARLA autonomous driving simulator.
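The value-iteration recurrence that GVIN emulates can be sketched in its classical tabular form on an irregular graph (a simplified stand-in: the thesis's learned graph-convolution kernels are replaced here by a plain max over graph neighbors, and the graph and rewards are made up for illustration):

```python
def value_iteration(neighbors, reward, gamma=0.9, iters=200):
    """Classical value iteration on an irregular graph.

    neighbors: dict node -> list of adjacent nodes (the transition structure).
    reward: dict node -> immediate reward at that node.
    Repeatedly applies V(n) = R(n) + gamma * max_{m in N(n)} V(m),
    which is the recurrence a VIN/GVIN-style module unrolls as layers.
    """
    V = {n: 0.0 for n in neighbors}
    for _ in range(iters):
        V = {n: reward[n] + gamma * max((V[m] for m in neighbors[n]), default=0.0)
             for n in neighbors}
    return V

# Tiny chain graph s -- a -- g with reward only at the goal g:
graph = {"s": ["a"], "a": ["s", "g"], "g": ["a"]}
reward = {"s": 0.0, "a": 0.0, "g": 1.0}
V = value_iteration(graph, reward)
```

Where this sketch hard-codes the max over neighbors, GVIN learns a differentiable graph-convolution operator in its place, which is what lets it generalize across irregular spatial graphs.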
Brain network mechanisms in learning behavior
The study of learning has been a central focus of psychology and neuroscience since their inception. Cognitive neuroscience’s traditional approach to understanding learning has been to decompose it into discrete cognitive processes with separable and localized underlying neural systems. While this focus on modular cognitive functions for individual brain areas has led to considerable progress, there is increasing evidence that much of learning behavior relies on overlapping cognitive and neural systems, which may be harder to disentangle than previously envisioned. This is not surprising, as the processes underlying learning must involve widespread integration of information from sensory, affective, and motor sources. The standard tools of cognitive neuroscience limit our ability to describe processes that rely on widespread coordination of brain activity. To understand learning, it will be necessary to characterize dynamic co-activation at the circuit level.
In this dissertation, I present three studies that seek to describe the roles of distributed brain networks in learning. I begin by giving an overview of our current understanding of multiple forms of learning, describing the neural and computational mechanisms thought to underlie incremental feedback-based learning and flexible episodic memory. I will focus in particular on the difficulties in separating these processes at the cognitive level and in localizing them to individual regions at the neural level. I will then describe recent findings that have begun to characterize the brain’s large-scale network structure, emphasizing the potential roles that distributed networks could play in understanding learning and cognition more generally. I will end the introduction by reviewing current attempts to characterize the dynamics of large-scale brain networks, which will be essential for providing a mechanistic link to learning behavior.
Chapter 2 is a study demonstrating that intrinsic connectivity between the hippocampus and the ventromedial prefrontal cortex, as well as between these regions and distributed brain networks, is related to individual differences in the transfer of learning on a sensory preconditioning task. The hippocampus and ventromedial prefrontal cortex have both been shown to be involved in this type of learning, and this study represents an early attempt to link connectivity between individual regions and broader networks to learning processes.
Chapter 3 is a study that takes advantage of recent developments in mathematical modeling of temporal networks to demonstrate a relationship between large-scale network dynamics and reinforcement learning within individuals. This study shows that the flexibility of network connectivity in the striatum is related to learning performance over time, as well as to individual differences in parameters estimated from computational models of reinforcement learning. Notably, connectivity between the striatum and visual as well as orbitofrontal regions increased over the course of the task, which is consistent with an integrative role for the region in learning value-based associations. Network flexibility in a distinct set of regions is associated with episodic memory for object images presented during the learning task.
Chapter 4 examines the role of dopamine, a neurotransmitter strongly linked to value updating in reinforcement learning, in the dynamic network changes occurring during learning. Patients with Parkinson’s disease, who experience a loss of dopaminergic neurons in the substantia nigra, performed a reversal-learning task while undergoing functional magnetic resonance imaging. Patients were scanned on and off of a dopamine precursor medication (levodopa) in a within-subject design in order to examine the impact of dopamine on brain network dynamics during learning. The reversal provided an experimental manipulation of dynamic connectivity, and patients on medication showed greater modulation of striatal-cortical connectivity. Similar results were found in a number of regions receiving midbrain projections, including the prefrontal cortex and medial temporal lobe. This study indicates that dopamine inputs from the midbrain modulate large-scale network dynamics during learning, providing a direct link between reinforcement learning theories of value updating and network neuroscience accounts of dynamic connectivity.
Together, these results indicate that large-scale networks play a critical role in multiple forms of learning behavior. Each highlights the potential importance of understanding dynamic routing and integration of information across large-scale circuits for our conception of learning and other cognitive processes. Understanding the when, where, and how of this information flow in the brain may provide an alternative or complement to traditional theories of distinct learning systems. These studies also illustrate challenges in integrating this perspective with established theories in cognitive neuroscience. Chapter 5 will situate the studies in a broader discussion of how brain activity relates to cognition in general, while pointing out current roadblocks and potential ways forward for a cognitive network neuroscience of learning.
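The value-updating rule at the heart of the reinforcement-learning models referenced above is the temporal-difference update, whose prediction-error term is the signal classically linked to dopamine. A minimal sketch (variable names and parameter values are illustrative, not the dissertation's fitted models):

```python
def td_update(V, state, reward, next_state, alpha=0.1, gamma=0.95):
    """One temporal-difference value update.

    delta is the reward prediction error: the difference between the
    observed outcome (reward plus discounted value of the next state)
    and the current estimate. This is the quantity classically
    associated with phasic dopamine signaling.
    """
    delta = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * delta  # nudge the estimate by learning rate alpha
    return delta
```

In model-fitting studies like those described in Chapters 3 and 4, parameters such as the learning rate alpha are estimated per individual from choice behavior and then related to neural measures.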
Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDPs) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models remains limited. In this paper, we establish a provably efficient RL algorithm for MDPs whose state transitions are given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves a $\tilde{O}(d\sqrt{H^3 T})$ regret bound, where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation with provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms existing methods, achieving both provable efficiency and superior practical performance.
Comment: Accepted in AAAI 2023 (Main Technical Track)
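The transition model studied here can be sketched as a softmax over features of (state, action, next state), weighted by the unknown transition core (a generic multinomial-logistic sketch; the function name, feature construction, and toy values are illustrative, not the paper's algorithm):

```python
import math

def mnl_transition_probs(theta, features):
    """Multinomial logistic (softmax) transition model.

    theta:    the 'transition core' parameter vector (unknown in the
              paper; the algorithm estimates it from data).
    features: list of feature vectors phi(s, a, s'), one per candidate
              next state s'.
    Returns P(s' | s, a) proportional to exp(theta . phi(s, a, s')).
    """
    logits = [sum(t * f for t, f in zip(theta, phi)) for phi in features]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    Z = sum(exps)
    return [e / Z for e in exps]

# Two candidate next states; the second has a larger feature overlap
# with theta, so it receives higher transition probability.
probs = mnl_transition_probs([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]])
```

An upper-confidence-bound algorithm in this setting would act optimistically with respect to the uncertainty in the estimated theta, which is how the exploration-exploitation trade-off is balanced.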