Canonical Cortical Circuits and the Duality of Bayesian Inference and Optimal Control
The duality of sensory inference and motor control has been known since the
1960s and has recently been recognized as the commonality in computations
required for the posterior distributions in Bayesian inference and the value
functions in optimal control. Meanwhile, an intriguing question about the brain
is why the entire neocortex shares a canonical six-layer architecture while its
posterior and anterior halves are engaged in sensory processing and motor
control, respectively. Here we consider the hypothesis that the sensory and
motor cortical circuits implement the dual computations for Bayesian inference
and optimal control, or perceptual and value-based decision making,
respectively. We first review the classic duality of inference and control in
linear quadratic systems and then review the correspondence between dynamic
Bayesian inference and optimal control. Based on the architecture of the
canonical cortical circuit, we explore how different cortical neurons may
represent variables and implement computations. (13 pages, 3 figures.)
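A minimal sketch of the linear-quadratic duality reviewed above, in standard Kalman-filter/LQR textbook form (not equations reproduced from the paper): both problems solve the same Riccati recursion, run in opposite directions of time,

    P_{t+1} = A P_t A^\top + \Sigma_w - A P_t C^\top (C P_t C^\top + \Sigma_v)^{-1} C P_t A^\top    (filtering, forward in time)
    S_t = A^\top S_{t+1} A + Q - A^\top S_{t+1} B (B^\top S_{t+1} B + R)^{-1} B^\top S_{t+1} A    (control, backward in time)

so one problem maps onto the other under the substitutions A <-> A^\top, C <-> B^\top, and (\Sigma_w, \Sigma_v) <-> (Q, R), where P_t is the state-estimate covariance and S_t the cost-to-go matrix.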
Chunking Patterns Reflect Effector-dependent Representation of Motor Sequence
Sequential organization is central to much of human intelligent behavior, ranging from everyday skills such as lacing shoes to using a computer. It is well known that such sequential skills involve chaining a number of primitive actions together. A robust representation of skills can be formed by chunking together several elements of a sequence. Using a 2x6 finger-movement task, we demonstrate that during the acquisition of visuomotor skills the chunking patterns remained unaltered when an effector-dependent representation of the sequence was utilized. In the 2x6 task, subjects learned a sequence of 12 visual cues displayed as six sets of two elements each and performed finger movements on a keypad. Two experiments, Normal-Motor and Normal-Visual, were conducted on nine subjects, and two observations were collected from each subject. Each experiment consisted of a Normal and a Rotated condition. In the Rotated (Motor and Visual) conditions, subjects were required to rotate the visual cues by 180 degrees and press the corresponding keys. The display sequence was also rotated for the Motor condition, requiring an identical set of effector movements to be performed as in the Normal condition. Chunking patterns were identified using the response times (RTs) for individual sets of the sequence; a pause between set RTs demarcates an ensuing chunk. We demonstrate that usage of an effector-dependent representation is supported by the observation of identical chunking patterns between the Normal and Motor conditions and the lack of similarity in chunking patterns between the Normal and Visual conditions.
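A minimal Python sketch of how chunk boundaries can be read off from set RTs as described above (the median-based pause threshold and the 1.5 factor are illustrative assumptions, not the authors' exact criterion):

    import numpy as np

    def chunk_starts(set_rts, pause_factor=1.5):
        """Return indices of sets that begin a chunk: set 0 always does,
        and so does any later set whose RT is a pause, i.e. exceeds
        pause_factor times the median set RT."""
        set_rts = np.asarray(set_rts, dtype=float)
        threshold = pause_factor * np.median(set_rts)
        return [0] + [i for i in range(1, len(set_rts)) if set_rts[i] > threshold]

    # Example: six set RTs (ms) from one 2x6 trial; chunks start at sets 0 and 3.
    print(chunk_starts([820, 410, 395, 760, 380, 400]))  # -> [0, 3]

Comparing the lists of chunk-start indices across conditions then gives the pattern similarity discussed above.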
Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks
Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown
distinct advantages, e.g., solving memory-dependent tasks and meta-learning.
However, little effort has been spent on improving RNN architectures and on
understanding the underlying neural mechanisms for performance gain. In this
paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical
results show that the network can autonomously learn to abstract sub-goals and
can self-develop an action hierarchy using internal dynamics in a challenging
continuous control task. Furthermore, we show that the network's
self-developed compositionality enables faster re-learning when adapting to a
new task that is a re-composition of previously learned sub-goals than when
learning from scratch. We also found that improved performance is achieved
when neural activities are subject to stochastic rather than deterministic
dynamics.
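A minimal sketch of a multiple-timescale stochastic RNN update of the kind described (the two-timescale leaky-integrator form, the noise injection point, and all constants are generic assumptions, not the paper's architecture):

    import numpy as np

    rng = np.random.default_rng(0)

    def mtsrnn_step(h_fast, h_slow, x, params, sigma=0.1):
        """One update of a two-timescale leaky-integrator RNN; stochasticity
        enters as Gaussian noise on the pre-activations."""
        Wf, Ws, Uf, Vfs, Vsf = params
        tau_fast, tau_slow = 2.0, 10.0  # assumed time constants
        zf = Wf @ h_fast + Vfs @ h_slow + Uf @ x + sigma * rng.standard_normal(len(h_fast))
        zs = Ws @ h_slow + Vsf @ h_fast + sigma * rng.standard_normal(len(h_slow))
        h_fast = (1 - 1/tau_fast) * h_fast + (1/tau_fast) * np.tanh(zf)
        h_slow = (1 - 1/tau_slow) * h_slow + (1/tau_slow) * np.tanh(zs)
        return h_fast, h_slow

The slow module's state changes little from step to step, so it can hold sub-goal-like context while the fast module tracks moment-to-moment control.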
Toward evolutionary and developmental intelligence
Given the phenomenal advances in artificial intelligence in specific domains like visual object recognition and game playing by deep learning, expectations are rising for building artificial general intelligence (AGI) that can flexibly find solutions in unknown task domains. One approach to AGI is to set up a variety of tasks and design AI agents that perform well in many of them, including those the agent faces for the first time. One caveat of such an approach is that the best-performing agent may simply be a collection of domain-specific AI agents, switched in for a given domain. Here we propose an alternative approach that focuses on the process of acquiring intelligence through active interactions in an environment. We call this approach evolutionary and developmental intelligence (EDI). We first review the current status of artificial intelligence, brain-inspired computing, and developmental robotics, and define the conceptual framework of EDI. We then explore how we can integrate advances in neuroscience, machine learning, and robotics to construct EDI systems, and how building such systems can help us understand animal and human intelligence.
Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach
Numerical data imputation algorithms replace missing values by estimates to
leverage incomplete data sets. Current imputation methods seek to minimize the
error between the unobserved ground truth and the imputed values. But this
strategy can create artifacts leading to poor imputation in the presence of
multimodal or complex distributions. To tackle this problem, we introduce the
kNNxKDE algorithm: a data imputation method combining nearest-neighbor
estimation (kNN) and density estimation with Gaussian kernels (KDE). We
compare our method with previous data imputation methods using artificial and
real-world data with different data missing scenarios and various data missing
rates, and show that our method can cope with complex original data structure,
yields lower data imputation errors, and provides probabilistic estimates with
higher likelihood than current methods. We release the code in open source for
the community: https://github.com/DeltaFloflo/knnxkde (30 pages, 8 figures; accepted in TMLR with Reproducibility certification.)
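A minimal Python sketch of the kNNxKDE idea (the NaN-safe distance, bandwidth, and sampling step are illustrative assumptions; see the linked repository for the authors' implementation):

    import numpy as np

    rng = np.random.default_rng(0)

    def knn_kde_impute(X, row, col, k=10, bandwidth=0.5):
        """Impute X[row, col]: (1) find the k nearest rows using the
        coordinates observed in `row`, then (2) place a Gaussian kernel on
        each neighbor's value in `col`, giving a possibly multimodal
        imputation density; return one draw from it."""
        obs = ~np.isnan(X[row])
        obs[col] = False                        # never match on the target column
        cand = np.flatnonzero(~np.isnan(X[:, col]))
        cand = cand[cand != row]                # rows where the target value is present
        diff = X[np.ix_(cand, np.flatnonzero(obs))] - X[row, obs]
        dists = np.sqrt(np.nansum(diff**2, axis=1))  # NaN-safe Euclidean distance
        centers = X[cand[np.argsort(dists)[:k]], col]
        return rng.choice(centers) + bandwidth * rng.standard_normal()

Because the output is a sample from a Gaussian mixture centered on the neighbors' values, repeated draws preserve multimodality instead of collapsing to a single mean.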
Imitation learning based on entropy-regularized forward and inverse reinforcement learning
This paper proposes Entropy-Regularized Imitation Learning (ERIL), which is a
combination of forward and inverse reinforcement learning under the framework
of the entropy-regularized Markov decision process. ERIL minimizes the reverse
Kullback-Leibler (KL) divergence between two probability distributions induced
by a learner and an expert. Inverse reinforcement learning (RL) in ERIL
evaluates the log-ratio between two distributions using the density ratio
trick, which is widely used in generative adversarial networks. More
specifically, the log-ratio is estimated by building two binary discriminators.
The first discriminator is a state-only function, and it tries to distinguish
the state generated by the forward RL step from the expert's state. The second
discriminator is a function of current state, action, and transitioned state,
and it distinguishes the generated experiences from the ones provided by the
expert. Since the second discriminator has the same hyperparameters as the
forward RL step, it can be used to control the discriminator's ability. The
forward RL minimizes the reverse KL estimated by the inverse RL. We show that
minimizing the reverse KL divergence is equivalent to finding an optimal policy
under entropy regularization. Consequently, a new policy is derived from an
algorithm that resembles Dynamic Policy Programming and Soft Actor-Critic. Our
experimental results on MuJoCo-simulated environments show that ERIL is more
sample-efficient than previous baseline methods. We further apply the method to
human behaviors in performing a pole-balancing task and show that the estimated
reward functions show how every subject achieves the goal. (33 pages, 10 figures.)
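A minimal sketch of the density-ratio trick used by ERIL's state-only discriminator (the PyTorch wiring, network shape, and 4-dimensional state are assumptions for illustration, not the authors' code):

    import torch
    import torch.nn as nn

    disc = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
    opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    def discriminator_step(expert_s, learner_s):
        """Classify expert states (label 1) vs learner states (label 0).
        At the optimum the logit disc(s) approximates
        log p_expert(s) - log p_learner(s)."""
        logits = torch.cat([disc(expert_s), disc(learner_s)])
        labels = torch.cat([torch.ones(len(expert_s), 1),
                            torch.zeros(len(learner_s), 1)])
        loss = bce(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

The forward RL step can then plug disc(s) in as the estimated log-ratio term when minimizing the reverse KL divergence.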
Variational Recurrent Models for Solving Partially Observable Control Tasks
In partially observable (PO) environments, deep reinforcement learning (RL)
agents often suffer from unsatisfactory performance, since two problems need to
be tackled together: how to extract information from the raw observations to
solve the task, and how to improve the policy. In this study, we propose an RL
algorithm for solving PO tasks. Our method comprises two parts: a variational
recurrent model (VRM) for modeling the environment, and an RL controller that
has access to both the environment and the VRM. The proposed algorithm was
tested in two types of PO robotic control tasks, those in which either
coordinates or velocities were not observable and those that require long-term
memorization. Our experiments show that the proposed algorithm achieved better
data efficiency and/or learned better policies than alternative approaches in
tasks in which unobserved states cannot be inferred from raw observations in a
simple manner. (Published as a conference paper at the Eighth International
Conference on Learning Representations, ICLR 2020.)
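A minimal sketch of one variational recurrent model step (a GRU cell plus a Gaussian latent with the reparameterization trick; the dimensions and wiring are generic assumptions rather than the paper's exact architecture):

    import torch
    import torch.nn as nn

    class VRMStep(nn.Module):
        """One step: the RNN state h summarizes history; a Gaussian latent z
        is sampled by reparameterization and used to reconstruct x."""
        def __init__(self, x_dim=8, h_dim=64, z_dim=16):
            super().__init__()
            self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)
            self.enc = nn.Linear(h_dim + x_dim, 2 * z_dim)  # posterior parameters
            self.dec = nn.Linear(h_dim + z_dim, x_dim)      # observation reconstruction

        def forward(self, x, h, z_prev):
            h = self.rnn(torch.cat([x, z_prev], -1), h)
            mu, logvar = self.enc(torch.cat([h, x], -1)).chunk(2, -1)
            z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterize
            x_hat = self.dec(torch.cat([h, z], -1))
            return x_hat, h, z, (mu, logvar)

An RL controller that sees both the raw observation x and the model state (h, z), as the abstract describes, can then be trained with a standard actor-critic algorithm.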