
    Canonical Cortical Circuits and the Duality of Bayesian Inference and Optimal Control

    The duality of sensory inference and motor control has been known since the 1960s and has recently been recognized as the commonality in computations required for the posterior distributions in Bayesian inference and the value functions in optimal control. Meanwhile, an intriguing question about the brain is why the entire neocortex shares a canonical six-layer architecture while its posterior and anterior halves are engaged in sensory processing and motor control, respectively. Here we consider the hypothesis that the sensory and motor cortical circuits implement the dual computations for Bayesian inference and optimal control, or perceptual and value-based decision making, respectively. We first review the classic duality of inference and control in linear quadratic systems and then review the correspondence between dynamic Bayesian inference and optimal control. Based on the architecture of the canonical cortical circuit, we explore how different cortical neurons may represent variables and implement computations. (13 pages, 3 figures)
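
    To make the classic duality referred to above concrete, here is a minimal side-by-side sketch of the discrete-time Riccati recursions for Kalman filtering and the linear quadratic regulator. The notation (A, B, C, Q, R, Q_c, R_c, P_t, S_t) is standard textbook notation and is not taken from the paper itself.

```latex
% Kalman filter (estimation): forward Riccati recursion for the error covariance P_t,
% for the system x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t,  w_t ~ N(0, Q),  v_t ~ N(0, R)
P_{t+1} = A P_t A^{\top} + Q
          - A P_t C^{\top} \left( C P_t C^{\top} + R \right)^{-1} C P_t A^{\top}

% Linear quadratic regulator (control): backward Riccati recursion for the value matrix S_t,
% for x_{t+1} = A x_t + B u_t  with stage cost  x_t^{\top} Q_c x_t + u_t^{\top} R_c u_t
S_{t-1} = A^{\top} S_t A + Q_c
          - A^{\top} S_t B \left( B^{\top} S_t B + R_c \right)^{-1} B^{\top} S_t A
```

    The two recursions coincide under the substitution A ↔ A^T, C ↔ B^T, Q ↔ Q_c, R ↔ R_c, with estimation running forward in time and control running backward; this is the sense in which the posterior covariance and the value function are dual.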

    Chunking Patterns Reflect Effector-dependent Representation of Motor Sequence

    Sequential organization is central to much of human intelligent behavior, ranging from everyday skills such as lacing shoes to using a computer. It is well known that such sequential skills involve chaining a number of primitive actions together. A robust representation of skills can be formed by chunking together several elements of a sequence. We demonstrate, using a 2x6 finger-movement task, that during the acquisition of visuomotor skills the chunking patterns remain unaltered when an effector-dependent representation of the sequence is used. In the 2x6 task, subjects learned a sequence of 12 visual cues displayed as six sets of two elements each and performed finger movements on a keypad. Two experiments, Normal-Motor and Normal-Visual, were conducted on nine subjects, and two observations were collected from each subject. Each experiment consisted of a Normal and a Rotated condition. In the Rotated (Motor and Visual) conditions, subjects were required to rotate the visual cues by 180 degrees and press the corresponding keys. The display sequence was also rotated for the Motor condition, requiring an identical set of effector movements to be performed as in the Normal condition. Chunking patterns were identified using the response times (RTs) for individual sets of the sequence; a pause between set RTs demarcates the start of a new chunk. We demonstrate that use of an effector-dependent representation is supported by the observation of identical chunking patterns between the Normal and Motor conditions, and by the lack of similarity in chunking patterns between the Normal and Visual conditions.
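
    As a rough illustration of the chunk-identification rule sketched above (a long pause between set response times marks the start of a new chunk), the following snippet splits a list of per-set RTs into chunks. The thresholding rule, function names, and example values are illustrative assumptions, not the authors' exact analysis.

```python
import numpy as np

def find_chunk_boundaries(set_rts, z_thresh=1.0):
    """Split a sequence of per-set response times into chunks.

    A set whose RT exceeds the mean by z_thresh standard deviations is treated
    as a pause, i.e. the start of a new chunk. This threshold is an illustrative
    assumption, not the exact criterion used in the paper.
    """
    rts = np.asarray(set_rts, dtype=float)
    threshold = rts.mean() + z_thresh * rts.std()
    boundaries = [0]  # the first set always starts a chunk
    for i in range(1, len(rts)):
        if rts[i] > threshold:
            boundaries.append(i)
    return boundaries

def chunks_from_boundaries(n_sets, boundaries):
    """Return chunks as lists of set indices given the boundary positions."""
    edges = boundaries + [n_sets]
    return [list(range(edges[k], edges[k + 1])) for k in range(len(boundaries))]

# Example: six sets of a 2x6 sequence, with pauses before sets 2 and 4
rts = [0.45, 0.30, 0.62, 0.31, 0.66, 0.29]
b = find_chunk_boundaries(rts)
print(chunks_from_boundaries(len(rts), b))  # [[0, 1], [2, 3], [4, 5]]
```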

    Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks

    Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures and on understanding the underlying neural mechanisms for performance gain. In this paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals than when starting from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics.
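
    To make the idea of slow and fast internal dynamics concrete, here is a minimal sketch of what a two-level, multiple-timescale stochastic RNN update could look like. The layer sizes, time constants, and additive Gaussian state noise are assumptions for illustration; they do not reproduce the architecture or training scheme in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mtsrnn_step(h_fast, h_slow, x, params, tau_fast=2.0, tau_slow=20.0, noise_std=0.1):
    """One leaky-integrator update of a two-level, multiple-timescale stochastic RNN.

    h_fast / h_slow : hidden states of the fast and slow layers
    x               : current observation
    Time constants and the Gaussian state noise are illustrative assumptions.
    """
    W_ff, W_fs, W_fx, W_ss, W_sf = params
    pre_fast = W_ff @ np.tanh(h_fast) + W_fs @ np.tanh(h_slow) + W_fx @ x
    pre_slow = W_ss @ np.tanh(h_slow) + W_sf @ np.tanh(h_fast)
    # Leaky integration: larger tau -> slower dynamics; noise makes the state stochastic.
    h_fast = (1 - 1 / tau_fast) * h_fast + (1 / tau_fast) * pre_fast \
             + noise_std * rng.standard_normal(h_fast.shape)
    h_slow = (1 - 1 / tau_slow) * h_slow + (1 / tau_slow) * pre_slow \
             + noise_std * rng.standard_normal(h_slow.shape)
    return h_fast, h_slow

# Toy rollout with random weights
nf, ns, nx = 8, 4, 3
params = (rng.normal(scale=0.3, size=(nf, nf)),
          rng.normal(scale=0.3, size=(nf, ns)),
          rng.normal(scale=0.3, size=(nf, nx)),
          rng.normal(scale=0.3, size=(ns, ns)),
          rng.normal(scale=0.3, size=(ns, nf)))
h_f, h_s = np.zeros(nf), np.zeros(ns)
for t in range(5):
    h_f, h_s = mtsrnn_step(h_f, h_s, rng.standard_normal(nx), params)
print(h_f.shape, h_s.shape)  # (8,) (4,)
```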

    Toward evolutionary and developmental intelligence

    Given the phenomenal advances in artificial intelligence in specific domains like visual object recognition and game playing by deep learning, expectations are rising for building artificial general intelligence (AGI) that can flexibly find solutions in unknown task domains. One approach to AGI is to set up a variety of tasks and design AI agents that perform well in many of them, including those the agent faces for the first time. One caveat for such an approach is that the best performing agent may be just a collection of domain-specific AI agents switched for a given domain. Here we propose an alternative approach of focusing on the process of acquisition of intelligence through active interactions in an environment. We call this approach evolutionary and developmental intelligence (EDI). We first review the current status of artificial intelligence, brain-inspired computing and developmental robotics and define the conceptual framework of EDI. We then explore how we can integrate advances in neuroscience, machine learning, and robotics to construct EDI systems and how building such systems can help us understand animal and human intelligence.

    Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

    Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the kNN×KDE algorithm: a data imputation method combining nearest neighbor estimation (kNN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different data missing scenarios and various data missing rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code in open-source for the community: https://github.com/DeltaFloflo/knnxkde (30 pages, 8 figures; accepted in TMLR with reproducibility certification)
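
    A minimal sketch of the core idea, combining a nearest-neighbor search over the observed columns with a Gaussian kernel density over the neighbors' values in the missing column, is given below. The function name, bandwidth, and the assumption that donor rows are fully observed on the other columns are illustrative simplifications; see the linked repository for the authors' implementation.

```python
import numpy as np

def knn_kde_impute(X, row, col, k=5, bandwidth=0.1, n_samples=100, rng=None):
    """Impute X[row, col] with a Gaussian KDE over the values of the k nearest
    neighbors (distance computed on the other, observed columns).

    Returns posterior samples for the missing entry, so the caller can take the
    mean, the mode, or keep the whole distribution. The neighbor count,
    bandwidth, and Euclidean distance on fully observed donor rows are
    simplifying assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    obs_cols = [c for c in range(X.shape[1]) if c != col]
    # Candidate donors: rows where the target column is observed.
    donors = np.flatnonzero(~np.isnan(X[:, col]))
    donors = donors[donors != row]
    dists = np.linalg.norm(X[np.ix_(donors, obs_cols)] - X[row, obs_cols], axis=1)
    nearest = donors[np.argsort(dists)[:k]]
    centers = X[nearest, col]
    # Sample from the Gaussian-kernel mixture centered on the neighbors' values.
    picks = rng.choice(centers, size=n_samples)
    return picks + bandwidth * rng.standard_normal(n_samples)

# Toy bimodal example: the missing entry in the last row is imputed from its neighbors.
X = np.array([[0.0, 0.1], [0.1, -0.1], [5.0, 4.9], [5.1, 5.2], [0.05, np.nan]])
samples = knn_kde_impute(X, row=4, col=1, k=2)
print(samples.mean())  # close to 0, the mode supported by the nearest donors
```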

    Constrained Reinforcement Learning from Intrinsic and Extrinsic Rewards


    Imitation learning based on entropy-regularized forward and inverse reinforcement learning

    This paper proposes Entropy-Regularized Imitation Learning (ERIL), which combines forward and inverse reinforcement learning under the framework of the entropy-regularized Markov decision process. ERIL minimizes the reverse Kullback-Leibler (KL) divergence between two probability distributions induced by a learner and an expert. Inverse reinforcement learning (RL) in ERIL evaluates the log-ratio between the two distributions using the density ratio trick, which is widely used in generative adversarial networks. More specifically, the log-ratio is estimated by building two binary discriminators. The first discriminator is a state-only function, and it tries to distinguish states generated by the forward RL step from the expert's states. The second discriminator is a function of the current state, the action, and the transitioned state, and it distinguishes the generated experiences from the ones provided by the expert. Since the second discriminator has the same hyperparameters as the forward RL step, it can be used to control the discriminator's ability. The forward RL minimizes the reverse KL divergence estimated by the inverse RL. We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy under entropy regularization. Consequently, a new policy is derived from an algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our experimental results on MuJoCo-simulated environments show that ERIL is more sample-efficient than previous methods. We further apply the method to human behaviors in performing a pole-balancing task and show that the estimated reward functions reveal how each subject achieves the goal. (33 pages, 10 figures)
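
    The density ratio trick mentioned above can be summarized in a few lines: for a sigmoid discriminator D trained to separate expert from learner samples, log(D/(1-D)) is simply the discriminator's logit and estimates the log-ratio of the two densities. The sketch below shows only this building block; the names and the way the two logits are combined are illustrative assumptions and do not reproduce ERIL's exact reward.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_density_ratio(logit):
    """Density-ratio trick: for an optimal sigmoid discriminator D = sigmoid(logit)
    separating expert from learner samples, log(D / (1 - D)) equals the logit and
    estimates log p_expert / p_learner."""
    return logit

def imitation_reward(state_logit, transition_logit):
    """Illustrative per-step reward built from a state-only discriminator and an
    (s, a, s') discriminator, in the spirit of the two discriminators described
    above. The weighting and entropy terms of ERIL are not reproduced here."""
    return log_density_ratio(state_logit) + log_density_ratio(transition_logit)

# Sanity check that log(D / (1 - D)) recovers the logit
logit = 0.8
D = sigmoid(logit)
print(np.log(D / (1.0 - D)))        # ~0.8, i.e. the logit itself
print(imitation_reward(0.8, -0.3))  # 0.5
```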

    Variational Recurrent Models for Solving Partially Observable Control Tasks

    In partially observable (PO) environments, deep reinforcement learning (RL) agents often suffer from unsatisfactory performance, since two problems need to be tackled together: how to extract information from the raw observations to solve the task, and how to improve the policy. In this study, we propose an RL algorithm for solving PO tasks. Our method comprises two parts: a variational recurrent model (VRM) for modeling the environment, and an RL controller that has access to both the environment and the VRM. The proposed algorithm was tested in two types of PO robotic control tasks: those in which either coordinates or velocities are not observable, and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned more optimal policies than alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner. (Published as a conference paper at the Eighth International Conference on Learning Representations, ICLR 2020)
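
    The two-part structure described above (a VRM that summarizes the observation-action history into a latent belief state, and a controller that sees both the raw observation and that latent state) can be sketched as follows. The placeholder linear/tanh updates, class names, and dimensions are assumptions for illustration; the actual VRM is a learned variational recurrent network and the controller is a trained RL agent.

```python
import numpy as np

class VariationalRecurrentModel:
    """Schematic stand-in for the VRM: it summarizes the observation-action history
    into a latent belief state. The linear/tanh update is a placeholder for the
    learned recurrent variational network."""
    def __init__(self, obs_dim, act_dim, latent_dim, rng):
        self.W_h = rng.normal(scale=0.3, size=(latent_dim, latent_dim))
        self.W_o = rng.normal(scale=0.3, size=(latent_dim, obs_dim))
        self.W_a = rng.normal(scale=0.3, size=(latent_dim, act_dim))
        self.h = np.zeros(latent_dim)

    def update(self, obs, act):
        self.h = np.tanh(self.W_h @ self.h + self.W_o @ obs + self.W_a @ act)
        return self.h

class Controller:
    """Policy that, as in the abstract, has access to both the raw observation and
    the VRM's latent state. A random linear policy stands in for the trained RL
    controller."""
    def __init__(self, obs_dim, latent_dim, act_dim, rng):
        self.W = rng.normal(scale=0.3, size=(act_dim, obs_dim + latent_dim))

    def act(self, obs, latent):
        return np.tanh(self.W @ np.concatenate([obs, latent]))

rng = np.random.default_rng(0)
obs_dim, act_dim, latent_dim = 4, 2, 8
vrm = VariationalRecurrentModel(obs_dim, act_dim, latent_dim, rng)
pi = Controller(obs_dim, latent_dim, act_dim, rng)
obs, act = np.zeros(obs_dim), np.zeros(act_dim)
for t in range(3):
    latent = vrm.update(obs, act)       # belief state from history
    act = pi.act(obs, latent)           # policy sees observation + belief
    obs = rng.standard_normal(obs_dim)  # stand-in for the environment step
print(act.shape)  # (2,)
```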