A Unified Theory of Dual-Process Control
Dual-process theories play a central role in both psychology and
neuroscience, figuring prominently in fields ranging from executive control to
reward-based learning to judgment and decision making. In each of these
domains, two mechanisms appear to operate concurrently, one relatively high in
computational complexity, the other relatively simple. Why is neural
information processing organized in this way? We propose an answer to this
question based on the notion of compression. The key insight is that
dual-process structure can enhance adaptive behavior by allowing an agent to
minimize the description length of its own behavior. We apply a single model
based on this observation to findings from research on executive control,
reward-based learning, and judgment and decision making, showing that seemingly
diverse dual-process phenomena can be understood as domain-specific
consequences of a single underlying set of computational principles.
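To make the compression idea concrete, one hedged reading (with illustrative notation; not necessarily the paper's exact formulation) is a regularized control objective in which a simple default policy prices the description length of the agent's actions:

    J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_t r_t \right] - \lambda \, \mathbb{E}_{\pi}\!\left[ -\log \pi_0(a_t \mid s_t) \right]

Here \pi is the full (deliberative) policy, \pi_0 is a simpler default policy, and \lambda trades reward against the codelength of actions under the default. Behavior the simple process already predicts is cheap to encode; the complex process is worth invoking only where the extra codelength buys enough reward.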
Minimum Description Length Control
We propose a novel framework for multitask reinforcement learning based on
the minimum description length (MDL) principle. In this approach, which we term
MDL-control (MDL-C), the agent learns the common structure among the tasks with
which it is faced and then distills it into a simpler representation which
facilitates faster convergence and generalization to new tasks. In doing so,
MDL-C naturally balances adaptation to each task with epistemic uncertainty
about the task distribution. We motivate MDL-C via formal connections between
the MDL principle and Bayesian inference, derive theoretical performance
guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete
and high-dimensional continuous control tasks.
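Concretely, the balance between per-task adaptation and distilled shared structure can be read as a KL-regularized multitask objective. A minimal sketch in a softmax-policy setting; the names mdlc_objective, task_logits, default_logits, and beta are illustrative, not the paper's API:

    import numpy as np

    def softmax(x):
        z = x - x.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def mdlc_objective(task_return, task_logits, default_logits, beta=0.1):
        # Expected return minus a description-length penalty, measured as
        # KL(task policy || default policy). Form and names are illustrative.
        pi_task = softmax(task_logits)
        pi_default = softmax(default_logits)
        kl = float((pi_task * (np.log(pi_task) - np.log(pi_default))).sum())
        return task_return - beta * kl

In the full framework the default policy would itself be learned (the abstract motivates this via connections to Bayesian inference), so the penalty approximates the codelength of task-specific behavior given the shared structure.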
Subgoal- and goal-related reward prediction errors in medial prefrontal cortex
A longstanding view of the organization of human and animal behavior holds that behavior is hierarchically organized; in other words, directed toward achieving superordinate goals through the achievement of subordinate goals, or subgoals. However, most research in neuroscience has focused on tasks without hierarchical structure. In past work, we have shown that negative reward prediction error (RPE) signals in medial prefrontal cortex (mPFC) can be linked not only to superordinate goals but also to subgoals. This suggests that mPFC tracks impediments in the progression toward subgoals. Using fMRI of human participants engaged in a hierarchical navigation task, here we found that mPFC also processes positive prediction errors at the level of subgoals, indicating that this brain region is sensitive to advances in subgoal completion. However, when subgoal RPEs were elicited alongside goal-related RPEs, mPFC responses reflected only the goal-related RPEs. These findings suggest that information from different levels of the hierarchy is processed selectively, depending on the task context.
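For context on the quantity measured here, a subgoal-level RPE can be written exactly like the standard temporal-difference error, but driven by a pseudo-reward for subgoal attainment, as in hierarchical RL. A minimal sketch; the specific numbers and the pseudo-reward convention are illustrative:

    def td_error(reward, v_s, v_s_next, gamma=0.99):
        # Reward prediction error: delta = r + gamma * V(s') - V(s)
        return reward + gamma * v_s_next - v_s

    # Goal-level RPE is driven by the environment reward; the subgoal-level
    # RPE substitutes a pseudo-reward granted when a subgoal is attained.
    goal_rpe = td_error(reward=1.0, v_s=0.6, v_s_next=0.0)
    subgoal_rpe = td_error(reward=0.5, v_s=0.3, v_s_next=0.4)  # 0.5 = pseudo-reward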
Adaptive patch foraging in deep reinforcement learning agents
Patch foraging is one of the most heavily studied behavioral optimization
challenges in biology. However, despite its importance to biological
intelligence, this behavioral optimization problem is understudied in
artificial intelligence research. Patch foraging is especially amenable to
study given that it has a known optimal solution, which may be difficult to
discover given current techniques in deep reinforcement learning. Here, we
investigate deep reinforcement learning agents in an ecological patch foraging
task. For the first time, we show that machine learning agents can learn to
patch forage adaptively in patterns similar to biological foragers, and
approach optimal patch foraging behavior when accounting for temporal
discounting. Finally, we show emergent internal dynamics in these agents that
resemble single-cell recordings from foraging non-human primates, which
complements experimental and theoretical work on the neural mechanisms of
biological foraging. This work suggests that agents interacting in complex
environments with ecologically valid pressures arrive at common solutions,
pointing to the emergence of foundational computations behind adaptive,
intelligent behavior in both biological and artificial agents.
Published in Transactions on Machine Learning Research (TMLR). See: https://openreview.net/pdf?id=a0T3nOP9s
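The "known optimal solution" referenced above is Charnov's marginal value theorem: leave a patch when its instantaneous intake rate drops to the environment's long-run average rate. A minimal numerical sketch, assuming exponentially depleting patch gain g(t) = A(1 - exp(-d*t)) and a fixed travel time between patches; all parameter values are illustrative:

    import numpy as np

    def mvt_leave_time(A=10.0, d=0.5, travel=2.0):
        """Find the residence time t* maximizing average intake
        g(t) / (t + travel); at the optimum g'(t*) equals that rate."""
        t = np.linspace(0.01, 20.0, 10000)
        gain = A * (1.0 - np.exp(-d * t))
        rate = gain / (t + travel)
        return t[np.argmax(rate)]

    print(mvt_leave_time())  # longer travel times imply longer optimal stays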
Predictive representations can link model-based reinforcement learning to model-free mechanisms
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when combined with TD learning of value predictions, can produce a subset of the behaviors associated with model-based learning, while requiring less decision-time computation than dynamic programming. Using simulations, we delineate the precise behavioral capabilities enabled by evaluating actions using this approach, and compare them to those demonstrated by biological organisms. We then introduce two new algorithms that build upon the successor representation while progressively mitigating its limitations. Because this framework can account for the full range of observed putatively model-based behaviors while still utilizing a core TD framework, we suggest that it represents a neurally plausible family of mechanisms for model-based evaluation.
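A minimal sketch of the core construct in a tabular setting (names are illustrative): the successor representation M(s, s') estimates the discounted expected future occupancy of s', is learnable with TD(0), and yields values as a linear readout V = M R:

    import numpy as np

    def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.95):
        """One TD(0) update of a tabular successor representation.
        Target for row s: one-hot(s) + gamma * M[s_next]."""
        onehot = np.eye(M.shape[0])[s]
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
        return M

    # Values follow as V = M @ R, so a change in reward (revaluation)
    # updates V immediately without relearning the predictive map M.

This is why the SR supports a subset of model-based behavior at model-free cost: reward revaluation comes for free, while transition revaluation still requires relearning M.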