A Unified Theory of Dual-Process Control
Dual-process theories play a central role in both psychology and
neuroscience, figuring prominently in fields ranging from executive control to
reward-based learning to judgment and decision making. In each of these
domains, two mechanisms appear to operate concurrently, one relatively high in
computational complexity, the other relatively simple. Why is neural
information processing organized in this way? We propose an answer to this
question based on the notion of compression. The key insight is that
dual-process structure can enhance adaptive behavior by allowing an agent to
minimize the description length of its own behavior. We apply a single model
based on this observation to findings from research on executive control,
reward-based learning, and judgment and decision making, showing that seemingly
diverse dual-process phenomena can be understood as domain-specific
consequences of a single underlying set of computational principles.
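To make the compression idea concrete, one hedged reading (with illustrative notation; not necessarily the paper's exact formulation) is a regularized control objective in which a simple default policy prices the description length of the agent's actions:

    J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_t r_t \right] - \lambda \, \mathbb{E}_{\pi}\!\left[ -\log \pi_0(a_t \mid s_t) \right]

Here \pi is the full (deliberative) policy, \pi_0 is a simpler default policy, and \lambda trades reward against the codelength of actions under the default. Behavior the simple process already predicts is cheap to encode; the complex process is worth invoking only where the extra codelength buys enough reward.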
Minimum Description Length Control
We propose a novel framework for multitask reinforcement learning based on
the minimum description length (MDL) principle. In this approach, which we term
MDL-control (MDL-C), the agent learns the common structure among the tasks with
which it is faced and then distills it into a simpler representation which
facilitates faster convergence and generalization to new tasks. In doing so,
MDL-C naturally balances adaptation to each task with epistemic uncertainty
about the task distribution. We motivate MDL-C via formal connections between
the MDL principle and Bayesian inference, derive theoretical performance
guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete
and high-dimensional continuous control tasks.
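Concretely, the balance between per-task adaptation and distilled shared structure can be read as a KL-regularized multitask objective. A minimal sketch in a softmax-policy setting; the names mdlc_objective, task_logits, default_logits, and beta are illustrative, not the paper's API:

    import numpy as np

    def softmax(x):
        z = x - x.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def mdlc_objective(task_return, task_logits, default_logits, beta=0.1):
        # Expected return minus a description-length penalty, measured as
        # KL(task policy || default policy). Form and names are illustrative.
        pi_task = softmax(task_logits)
        pi_default = softmax(default_logits)
        kl = float((pi_task * (np.log(pi_task) - np.log(pi_default))).sum())
        return task_return - beta * kl

In the full framework the default policy would itself be learned (the abstract motivates this via connections to Bayesian inference), so the penalty approximates the codelength of task-specific behavior given the shared structure.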
Subgoal- and goal-related reward prediction errors in medial prefrontal cortex
A longstanding view of the organization of human and animal behavior holds that behavior is hierarchically organized; in other words, directed toward achieving superordinate goals through the achievement of subordinate goals, or subgoals. However, most research in neuroscience has focused on tasks without hierarchical structure. In past work, we have shown that negative reward prediction error (RPE) signals in medial prefrontal cortex (mPFC) can be linked not only to superordinate goals but also to subgoals. This suggests that mPFC tracks impediments in the progression toward subgoals. Using fMRI of human participants engaged in a hierarchical navigation task, here we found that mPFC also processes positive prediction errors at the level of subgoals, indicating that this brain region is sensitive to advances in subgoal completion. However, when subgoal RPEs were elicited alongside goal-related RPEs, mPFC responses reflected only the goal-related RPEs. These findings suggest that information from different levels of the hierarchy is processed selectively, depending on the task context.
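For context on the quantity measured here, a subgoal-level RPE can be written exactly like the standard temporal-difference error, but driven by a pseudo-reward for subgoal attainment, as in hierarchical RL. A minimal sketch; the specific numbers and the pseudo-reward convention are illustrative:

    def td_error(reward, v_s, v_s_next, gamma=0.99):
        # Reward prediction error: delta = r + gamma * V(s') - V(s)
        return reward + gamma * v_s_next - v_s

    # Goal-level RPE is driven by the environment reward; the subgoal-level
    # RPE substitutes a pseudo-reward granted when a subgoal is attained.
    goal_rpe = td_error(reward=1.0, v_s=0.6, v_s_next=0.0)
    subgoal_rpe = td_error(reward=0.5, v_s=0.3, v_s_next=0.4)  # 0.5 = pseudo-reward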
Adaptive patch foraging in deep reinforcement learning agents
Patch foraging is one of the most heavily studied behavioral optimization
challenges in biology. However, despite its importance to biological
intelligence, this behavioral optimization problem is understudied in
artificial intelligence research. Patch foraging is especially amenable to
study given that it has a known optimal solution, which may be difficult to
discover given current techniques in deep reinforcement learning. Here, we
investigate deep reinforcement learning agents in an ecological patch foraging
task. For the first time, we show that machine learning agents can learn to
patch forage adaptively in patterns similar to biological foragers, and
approach optimal patch foraging behavior when accounting for temporal
discounting. Finally, we show emergent internal dynamics in these agents that
resemble single-cell recordings from foraging non-human primates, which
complements experimental and theoretical work on the neural mechanisms of
biological foraging. This work suggests that agents interacting in complex
environments with ecologically valid pressures arrive at common solutions,
pointing to the emergence of foundational computations behind adaptive,
intelligent behavior in both biological and artificial agents.
Published in Transactions on Machine Learning Research (TMLR). See: https://openreview.net/pdf?id=a0T3nOP9s
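The "known optimal solution" referenced above is Charnov's marginal value theorem: leave a patch when its instantaneous intake rate drops to the environment's long-run average rate. A minimal numerical sketch, assuming exponentially depleting patch gain g(t) = A(1 - exp(-d*t)) and a fixed travel time between patches; all parameter values are illustrative:

    import numpy as np

    def mvt_leave_time(A=10.0, d=0.5, travel=2.0):
        """Find the residence time t* maximizing average intake
        g(t) / (t + travel); at the optimum g'(t*) equals that rate."""
        t = np.linspace(0.01, 20.0, 10000)
        gain = A * (1.0 - np.exp(-d * t))
        rate = gain / (t + travel)
        return t[np.argmax(rate)]

    print(mvt_leave_time())  # longer travel times imply longer optimal stays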
Predictive representations can link model-based reinforcement learning to model-free mechanisms
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
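A minimal sketch of the core construct in a tabular setting (names are illustrative): the successor representation M(s, s') estimates the discounted expected future occupancy of s', is learnable with TD(0), and yields values as a linear readout V = M R:

    import numpy as np

    def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.95):
        """One TD(0) update of a tabular successor representation.
        Target for row s: one-hot(s) + gamma * M[s_next]."""
        onehot = np.eye(M.shape[0])[s]
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
        return M

    # Values follow as V = M @ R, so a change in reward (revaluation)
    # updates V immediately without relearning the predictive map M.

This is why the SR supports a subset of model-based behavior at model-free cost: reward revaluation comes for free, while transition revaluation still requires relearning M.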