Adaptation to criticality through organizational invariance in embodied agents
Many biological and cognitive systems do not operate deep within one regime of
activity or another. Instead, they are poised at critical points located at
phase transitions in their parameter space.
phase transitions in their parameter space. The pervasiveness of criticality
suggests that there may be general principles inducing this behaviour, yet
there is no well-founded theory for understanding how criticality is generated
across such a wide span of levels and contexts. To explore how criticality
might emerge from general adaptive mechanisms, we propose a simple learning
rule that maintains an internal organizational structure characteristic of a
specific family of systems at criticality. We implement the mechanism in artificial embodied
agents controlled by a neural network maintaining a correlation structure
randomly sampled from an Ising model at critical temperature. Agents are
evaluated in two classical reinforcement learning scenarios: the Mountain Car
and the Acrobot double pendulum. In both cases the neural controller appears to
reach a point of criticality, which coincides with a transition point between
two regimes of the agent's behaviour. These results suggest that adaptation to
criticality could be used as a general adaptive mechanism in some
circumstances, providing an alternative explanation for the pervasive presence
of criticality in biological and cognitive systems.
Comment: arXiv admin note: substantial text overlap with arXiv:1704.0525
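To make the adaptation mechanism above concrete, here is a minimal sketch (not the paper's code) under stated assumptions: pairwise correlations of a small Ising-like controller are estimated by Metropolis sampling, and a Boltzmann-machine-style rule nudges the couplings toward a reference correlation structure sampled at an assumed critical temperature. The network size, temperature, and reference couplings are illustrative placeholders.

```python
# Minimal sketch: adapt couplings of a small Ising-like controller so its
# pairwise correlations match a reference structure sampled at an assumed
# critical temperature. Not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

def metropolis_correlations(J, h, beta, n_units, n_steps=5000):
    """Estimate mean pairwise correlations <s_i s_j> by Metropolis sampling."""
    s = rng.choice([-1, 1], size=n_units)
    corr = np.zeros((n_units, n_units))
    for _ in range(n_steps):
        i = rng.integers(n_units)
        dE = 2 * s[i] * (h[i] + J[i] @ s)          # energy change of a flip
        if dE < 0 or rng.random() < np.exp(-beta * dE):
            s[i] = -s[i]
        corr += np.outer(s, s)
    return corr / n_steps

n = 8
beta_c = 1.0                       # stand-in for the critical inverse temperature
J_ref = rng.normal(0, 1 / np.sqrt(n), (n, n))
J_ref = (J_ref + J_ref.T) / 2      # symmetric reference couplings
np.fill_diagonal(J_ref, 0)
h = np.zeros(n)
c_ref = metropolis_correlations(J_ref, h, beta_c, n)   # reference structure

J = np.zeros((n, n))               # controller couplings, learned from scratch
eta = 0.1
for epoch in range(50):            # Boltzmann-style correlation matching
    c_model = metropolis_correlations(J, h, beta_c, n)
    J += eta * (c_ref - c_model)   # move model correlations toward reference
    np.fill_diagonal(J, 0)
```

The update `J += eta * (c_ref - c_model)` is the standard correlation-matching rule; whether it actually poises the controller at a critical point is the empirical question the abstract addresses.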
Structured chaos shapes spike-response noise entropy in balanced neural networks
Large networks of sparsely coupled excitatory and inhibitory cells occur
throughout the brain. A striking feature of these networks is that they are
chaotic. How does this chaos manifest in the neural code? Specifically, how
variable are the spike patterns that such a network produces in response to an
input signal? To answer this, we derive a bound for the entropy of multi-cell
spike pattern distributions in large recurrent networks of spiking neurons
responding to fluctuating inputs. The analysis is based on results from random
dynamical systems theory and is complemented by detailed numerical simulations.
We find that the spike pattern entropy is an order of magnitude lower than what
would be extrapolated from single cells. This holds despite the fact that
network coupling becomes vanishingly sparse as network size grows -- a
phenomenon that depends on "extensive chaos," as previously discovered for
balanced networks without stimulus drive. Moreover, we show how spike pattern
entropy is controlled by temporal features of the inputs. Our findings provide
insight into how neural networks may encode stimuli in the presence of
inherently chaotic dynamics.
Comment: 9 pages, 5 figures
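A toy, self-contained illustration (not the paper's derivation) of why the single-cell extrapolation overshoots: when a shared fluctuating input correlates the cells, the plug-in entropy of joint spike words falls well below the sum of single-cell entropies. Cell count, drive strength, and threshold are made up.

```python
# Illustration: summed single-cell entropies overestimate the entropy of
# joint spike patterns once responses are correlated by common input.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)
n_cells, n_trials = 5, 100_000
shared = rng.normal(size=n_trials)                     # common fluctuating input
private = rng.normal(size=(n_cells, n_trials))         # per-cell noise
spikes = ((0.8 * shared + private) > 1.0).astype(int)  # correlated 0/1 spikes

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Naive extrapolation: sum of single-cell entropies.
h_single = sum(
    entropy_bits(np.bincount(spikes[i], minlength=2) / n_trials)
    for i in range(n_cells)
)

# Plug-in entropy of the joint spike words.
words = Counter(map(tuple, spikes.T))
p_joint = np.array(list(words.values())) / n_trials
h_joint = entropy_bits(p_joint)

print(f"sum of single-cell entropies: {h_single:.3f} bits")
print(f"joint word entropy:           {h_joint:.3f} bits")  # strictly smaller
```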
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Sequential data often possesses a hierarchical structure with complex
dependencies between subsequences, such as those found between the utterances in a
dialogue. In an effort to model this kind of generative process, we propose a
neural network-based generative architecture, with latent stochastic variables
that span a variable number of time steps. We apply the proposed model to the
task of dialogue response generation and compare it with recent neural network
architectures. We evaluate the model performance through automatic evaluation
metrics and by carrying out a human evaluation. The experiments demonstrate
that our model improves upon recently proposed models and that the latent
variables facilitate the generation of long outputs and maintain the context.
Comment: 15 pages, 5 tables, 4 figures
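A minimal PyTorch sketch of the kind of architecture the abstract describes: a word-level utterance encoder, an utterance-level context encoder, and a latent Gaussian variable that spans the entire response. Layer sizes and names are assumptions, not the authors' implementation.

```python
# Hierarchical latent-variable encoder-decoder sketch (VHRED-style);
# shapes and names are illustrative only.
import torch
import torch.nn as nn

class HierarchicalLatentSeq2Seq(nn.Module):
    def __init__(self, vocab=10000, emb=128, hid=256, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.utt_enc = nn.GRU(emb, hid, batch_first=True)   # word level
        self.ctx_enc = nn.GRU(hid, hid, batch_first=True)   # utterance level
        self.to_mu = nn.Linear(hid, z_dim)
        self.to_logvar = nn.Linear(hid, z_dim)
        self.dec = nn.GRU(emb + hid + z_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, dialogue, response):
        # dialogue: (batch, n_utts, utt_len) ids; response: (batch, resp_len)
        b, n_utts, utt_len = dialogue.shape
        words = self.embed(dialogue.reshape(b * n_utts, utt_len))
        _, u = self.utt_enc(words)                           # (1, b*n, hid)
        _, c = self.ctx_enc(u.squeeze(0).reshape(b, n_utts, -1))
        ctx = c.squeeze(0)                                   # (b, hid)
        # Latent over context only here; the full model also conditions the
        # approximate posterior on the response.
        mu, logvar = self.to_mu(ctx), self.to_logvar(ctx)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam.
        cond = torch.cat([ctx, z], dim=-1)                   # spans all steps
        cond = cond.unsqueeze(1).expand(-1, response.size(1), -1)
        dec_in = torch.cat([self.embed(response), cond], dim=-1)
        h, _ = self.dec(dec_in)
        logits = self.out(h)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl                                    # train on NLL + KL
```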
BowTie - A deep learning feedforward neural network for sentiment analysis
How to model and encode the semantics of human-written text, and which type of
neural network should process it, are not settled issues in sentiment
analysis. Accuracy and transferability are critical issues in machine learning
in general. These properties are closely related to the loss estimates for the
trained model. I present a computationally efficient and accurate feedforward
neural network for sentiment prediction capable of maintaining low losses. When
coupled with an effective semantics model of the text, it yields highly
accurate models. Experimental results on representative
benchmark datasets and comparisons to other methods show the advantages of the
new approach.
Comment: 12 pages, 7 figures, 4 tables
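The abstract does not spell out BowTie's architecture, so the following is only a generic sketch of the "semantics model plus feedforward network" pairing it alludes to: mean-pooled word embeddings feeding a small dropout-regularized classifier. All sizes are assumptions.

```python
# Generic feedforward sentiment classifier sketch (not BowTie's actual
# architecture, which the abstract does not specify).
import torch
import torch.nn as nn

class FeedforwardSentiment(nn.Module):
    def __init__(self, vocab=20000, emb=100, hid=64, n_classes=2):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, emb, mode="mean")  # pooled text
        self.net = nn.Sequential(
            nn.Linear(emb, hid), nn.ReLU(),
            nn.Dropout(0.5),                  # regularization to keep loss low
            nn.Linear(hid, n_classes),
        )

    def forward(self, token_ids, offsets):
        return self.net(self.embed(token_ids, offsets))
```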
Cultural Neuroeconomics of Intertemporal Choice
According to theories of cultural neuroscience, Westerners and Easterners may have distinct styles of cognition (e.g., different allocation of attention). Previous research has shown that Westerners and Easterners tend to utilize analytical and holistic cognitive styles, respectively. On the other hand, little is known regarding cultural differences in neuroeconomic behavior. For instance, economic decisions may be affected by cultural differences in the neurocomputational processing underlying attention; however, this area of neuroeconomics has been largely understudied. In the present paper, we attempt to bridge this gap by considering the links between the theory of cultural neuroscience and the neuroeconomic theory of the role of attention in intertemporal choice. We predict that (i) Westerners are more impulsive and inconsistent in intertemporal choice than Easterners, and (ii) Westerners discount delayed monetary losses more steeply than Easterners. We examine these predictions by utilizing a novel temporal discounting model based on Tsallis' statistics (i.e., a q-exponential model). Our preliminary analysis of temporal discounting of gains and losses by Americans and Japanese confirmed the predictions of the cultural neuroeconomic theory. Future study directions, employing computational modeling via neural networks, are briefly outlined and discussed.
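For reference, the q-exponential discount function from Tsallis statistics mentioned above takes the form F(D) = 1 / [1 + (1 - q) k D]^(1/(1-q)), which recovers exponential discounting exp(-kD) as q → 1 and classic hyperbolic discounting 1/(1 + kD) at q = 0. A fitting sketch with made-up choice data (the delays, values, bounds, and starting point are all assumptions):

```python
# Fit the q-exponential discount model F(D) = 1 / [1 + (1-q)kD]^(1/(1-q))
# to hypothetical indifference-point data.
import numpy as np
from scipy.optimize import curve_fit

def q_discount(delay, k, q):
    if abs(q - 1.0) < 1e-9:                       # q -> 1: exponential limit
        return np.exp(-k * delay)
    return (1.0 + (1.0 - q) * k * delay) ** (-1.0 / (1.0 - q))

delays = np.array([0, 7, 30, 90, 180, 365], dtype=float)   # days
values = np.array([1.0, 0.9, 0.75, 0.6, 0.5, 0.4])         # hypothetical data
(k_hat, q_hat), _ = curve_fit(q_discount, delays, values,
                              p0=[0.01, 0.5], bounds=(0, [1.0, 0.99]))
print(f"k = {k_hat:.4f}, q = {q_hat:.3f}")
# q far from 1 indicates dynamically inconsistent (hyperbolic-like) discounting.
```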
Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning
Model-based reinforcement learning algorithms have been shown to achieve successful results on various continuous control benchmarks, but the understanding of model-based methods remains limited. We try to interpret how model-based methods work through novel experiments on state-of-the-art algorithms, with an emphasis on the model-learning component. We evaluate the role of model learning in policy optimization and propose methods for learning a more accurate model. With a better understanding of model-based reinforcement learning, we then apply model-based methods to solve safe reinforcement learning (RL) problems with near-zero violation of hard constraints throughout training. Drawing an analogy with how humans and animals learn to perform safe actions, we break the safe RL problem down into three stages. First, we train agents in a constraint-free environment to learn a performant policy that reaches high rewards, and simultaneously learn a model of the dynamics. Second, we use model-based methods to plan safe actions and train a safeguarding policy from these actions through imitation. Finally, we propose a factored framework to train an overall policy that mixes the performant policy and the safeguarding policy. This three-step curriculum ensures near-zero violation of safety constraints at all times. As an advantage of the model-based approach, the sample complexity required at the second and third steps is significantly lower than that of model-free methods, which can enable online safe learning. We demonstrate the effectiveness of our methods on various continuous control problems and analyze their advantages over state-of-the-art approaches.
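A toy, runnable sketch of the stage-three mixing idea under heavy simplifying assumptions (1-D state, the environment equals the learned model, hand-written stand-ins for both policies): the performant policy acts unless the dynamics model predicts a constraint violation, in which case the safeguarding policy takes over.

```python
# Toy illustration of factored policy mixing for safe RL; every component
# here is a stand-in, not the paper's method.
import numpy as np

LIMIT = 1.0                                  # hard constraint: |x| <= LIMIT

def dynamics_model(x, a):                    # learned dynamics model stand-in
    return x + 0.1 * a

def performant_policy(x):                    # reward-seeking: always push right
    return 1.0

def safeguard_policy(x):                     # imitates planned safe actions
    return -np.sign(x)

def mixed_policy(x):
    a = performant_policy(x)
    if abs(dynamics_model(x, a)) > LIMIT:    # predicted violation -> switch
        a = safeguard_policy(x)
    return a

x = 0.0
for step in range(50):
    x = dynamics_model(x, mixed_policy(x))   # toy: environment == model
assert abs(x) <= LIMIT                       # near-zero violations by design
```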
Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis
One of the main challenges in the field of embodied artificial intelligence
is the open-ended autonomous learning of complex behaviours. Our approach is to
use task-independent, information-driven intrinsic motivation(s) to support
task-dependent learning. The work presented here is a preliminary step in which
we investigate the predictive information (the mutual information of the past
and future of the sensor stream) as an intrinsic drive, ideally supporting any
kind of task acquisition. Previous experiments have shown that the predictive
information (PI) is a good candidate to support autonomous, open-ended learning
of complex behaviours, because a maximisation of the PI corresponds to an
exploration of morphology- and environment-dependent behavioural regularities.
The idea is that these regularities can then be exploited in order to solve any
given task. Three different experiments are presented and their results lead to
the conclusion that the linear combination of the one-step PI with an external
reward function is not generally recommended in an episodic policy gradient
setting. Only for hard tasks can a great speed-up be achieved, at the cost of a
loss in asymptotic performance.
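For concreteness, a small sketch of the combination being analyzed: the one-step predictive information of a discretized sensor stream, I(s_t; s_{t+1}), is estimated by a plug-in method and added to the external return with a weighting coefficient. The binning, the weight beta, and the per-episode combination are assumptions, not the paper's exact setup.

```python
# Sketch: linear combination of an external return with the one-step
# predictive information PI = I(s_t ; s_{t+1}) of the sensor stream.
import numpy as np

def plugin_mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in nats from two 1-D sample arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

def combined_return(sensor_stream, external_return, beta=0.1):
    pi = plugin_mutual_information(sensor_stream[:-1], sensor_stream[1:])
    return external_return + beta * pi       # the linear combination

rng = np.random.default_rng(2)
stream = np.cumsum(rng.normal(size=1000))    # toy one-episode sensor series
print(combined_return(stream, external_return=5.0))
```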