In Search of the Neural Circuits of Intrinsic Motivation
Children seem to acquire new know-how in a continuous and open-ended manner. In this paper, we hypothesize that an intrinsic motivation to progress in learning is at the origins of the remarkable structure of children's developmental trajectories. In this view, children engage in exploratory and playful activities for their own sake, not as steps toward other extrinsic goals. The central hypothesis of this paper is that intrinsically motivating activities correspond to an expected decrease in prediction error. This motivation system pushes the infant to avoid both predictable and unpredictable situations in order to focus on the ones that are expected to maximize progress in learning. Based on a computational model and a series of robotic experiments, we show how this principle can lead to organized sequences of behavior of increasing complexity characteristic of several behavioral and developmental patterns observed in humans. We then discuss the putative circuitry underlying such an intrinsic motivation system in the brain and formulate two novel hypotheses. The first one is that tonic dopamine acts as a learning progress signal. The second is that this progress signal is directly computed through a hierarchy of cortical microcircuits that act both as prediction and metaprediction systems.
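The selection rule the abstract describes, prefer activities whose prediction error is decreasing fastest while avoiding both the fully predictable and the hopelessly unpredictable, can be sketched as a toy region chooser. This is an illustrative sketch under assumed names and parameters, not the authors' model:

```python
class LearningProgressExplorer:
    """Toy progress-driven chooser: keeps a per-region history of
    prediction errors and prefers the region whose error has decreased
    the most recently (expected learning progress)."""

    def __init__(self, n_regions, window=10):
        self.errors = [[] for _ in range(n_regions)]
        self.window = window

    def record(self, region, error):
        self.errors[region].append(error)

    def progress(self, region):
        hist = self.errors[region][-self.window:]
        if len(hist) < 2:
            return float("inf")  # novelty bonus for unexplored regions
        mid = len(hist) // 2
        older = sum(hist[:mid]) / mid
        recent = sum(hist[mid:]) / (len(hist) - mid)
        return older - recent  # recent decrease in mean error

    def choose_region(self):
        scores = [self.progress(r) for r in range(len(self.errors))]
        return max(range(len(scores)), key=scores.__getitem__)
```

A region that is already mastered (flat low error) and one that is pure noise (flat high error) both score near-zero progress and are avoided; only regions where the predictor is still improving attract the agent.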
Adaptive and learning-based formation control of swarm robots
Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation control could be performed by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., the BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (i.e., flocking, foraging) with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for UAV swarms using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP), and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty.
We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walks to control the communication between a team of robots with swarming behavior for musical creation.
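The three reward terms named above could be combined per UAV roughly as follows. The weights, distances and functional forms here are illustrative assumptions, not the thesis's actual reward:

```python
import numpy as np

def flocking_reward(positions, i, d_ref=1.0, d_min=0.3,
                    w_flock=1.0, w_mutual=0.5, w_coll=5.0):
    """Hypothetical per-UAV reward with the three terms the text names:
    flocking maintenance (keep the reference spacing), a mutual term
    shared by the swarm, and a collision penalty. All weights and
    distances are illustrative assumptions."""
    others = np.delete(positions, i, axis=0)
    dists = np.linalg.norm(others - positions[i], axis=1)
    # flocking maintenance: penalise deviation from the reference spacing
    flock = -w_flock * float(np.mean((dists - d_ref) ** 2))
    # mutual (global) term: reward a compact swarm around its centroid
    centroid = positions.mean(axis=0)
    mutual = -w_mutual * float(
        np.mean(np.linalg.norm(positions - centroid, axis=1)))
    # collision penalty: large negative term if any neighbour is too close
    coll = -w_coll * float(np.any(dists < d_min))
    return flock + mutual + coll
```

In centralized training with decentralized execution, such a reward would be evaluated by the centralized critic during training, while each UAV's actor acts only on local observations.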
Self-organisation of internal models in autonomous robots
Internal Models (IMs) play a significant role in autonomous robotics. They are mechanisms
able to represent the input-output characteristics of the sensorimotor loop. In
developmental robotics, open-ended learning of skills and knowledge serves the purposes of reacting to unexpected inputs, exploring the environment and acquiring new behaviours. The development of the robot includes self-exploration of the state-action
space and learning of the environmental dynamics.
In this dissertation, we explore the properties and benefits of the self-organisation
of robot behaviour based on the homeokinetic learning paradigm. A homeokinetic
robot explores the environment in a coherent way without prior knowledge of its
configuration or the environment itself. First, we propose a novel approach to self-organisation
of behaviour by artificial curiosity in the sensorimotor loop. Second, we
study how different forward models settings alter the behaviour of both exploratory
and goal-oriented robots. Diverse complexity, size and learning rules are compared
to assess their importance for the robot's exploratory behaviour. We define self-organised behaviour performance in terms of simultaneous environment coverage and best prediction of future sensory inputs. Among the findings, we observe that models with a fast response and a minimisation of the prediction error by local gradients achieve the best performance.
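The best-performing configuration reported here, a fast model updated by local gradient steps on the prediction error, can be illustrated with a minimal online linear forward model. Dimensions, learning rate and the linear form are assumptions for illustration, not the dissertation's implementation:

```python
import numpy as np

class OnlineForwardModel:
    """Minimal linear forward model: predicts the next sensor vector
    from current sensor and motor values, updated online by a local
    gradient (LMS) step on the squared prediction error."""

    def __init__(self, n_sensors, n_motors, lr=0.1):
        self.W = np.zeros((n_sensors, n_sensors + n_motors))
        self.lr = lr

    def predict(self, s, m):
        return self.W @ np.concatenate([s, m])

    def update(self, s, m, s_next):
        x = np.concatenate([s, m])
        err = s_next - self.W @ x            # prediction error
        self.W += self.lr * np.outer(err, x)  # local gradient step
        return float(err @ err)               # squared error this step
```

Each update uses only the current input and error (a local gradient), which is what gives the model its fast response to new sensorimotor data.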
Third, we study how self-organisation of behaviour can be exploited to learn IMs
for goal-oriented tasks. An IM acquires coherent self-organised behaviours that are
then used to achieve high-level goals by reinforcement learning (RL). Our results
demonstrate that learning of an inverse model in this context yields faster reward maximisation
and a higher final reward. We show that an initial exploration of the environment
in a goal-less yet coherent way improves learning.
In the same context, we analyse the self-organisation of central pattern generators
(CPGs) by reward maximisation. Our results show that CPGs can learn behaviour favourable for reward on high-dimensional robots using the self-organised interaction between
degrees of freedom. Finally, we examine an on-line dual control architecture
where we combine an actor-critic RL agent with the homeokinetic controller. With this configuration, the probing signal is generated from the embodied robot's experience with the environment. This set-up solves the problem of designing task-dependent probing signals through the emergence of intrinsically motivated, comprehensible behaviour. Faster improvement of the reward signal compared to classic RL is achievable with this configuration.
Functional organisation of behavioural inhibitory control mechanisms in cortico-basal ganglia circuitry: implications for stimulant use disorder.
The neural and psychological mechanisms of inhibitory control processes were investigated, focusing on the cortico-basal ganglia circuits in rats and humans. These included behavioural flexibility, 'waiting' and 'stopping' impulsivity, and involved a serial spatial reversal learning task in rodents and, in humans, premature responses in the Monetary Incentive Delay (MID) task and the stop-signal reaction time task. Chapter 2 and Chapter 3 focus on individual differences in behavioural flexibility in rats while Chapter 4, Chapter 5 and Chapter 6 consider how inhibitory control mechanisms are affected by the psychostimulant drug cocaine in both rats and humans.
As reported in Chapter 2, systemic modulation of monoaminergic transmission by monoamine oxidase A (MAO-A) inhibitors enhanced reversal learning performance, selectively by decreasing the lose-shift probability, thereby implicating a role for dopamine, serotonin and noradrenaline in facilitating learning from negative feedback. Resting-state functional magnetic resonance imaging (fMRI) revealed enhanced functional connectivity of the orbitofrontal and motor cortices as a correlate of flexible reversal learning performance, consistent with elevated levels of monoamines in these regions (Chapter 3). Having clarified the mechanisms underlying behavioural flexibility in rats, Chapter 4 reports that escalation of intravenous cocaine self-administration induces behavioural inflexibility in rats even after a relatively short period of cocaine intake. Computational models, including reinforcement and Bayesian learners, revealed a lack of exploitation of the learned response-outcome relationships in cocaine-exposed rats.
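The kind of model-based analysis mentioned here can be illustrated with a toy reinforcement learner on a two-choice reversal task, where an inverse-temperature parameter controls how strongly learned values are exploited. Every detail below (parameters, reward probabilities, reversal schedule) is an illustrative assumption, not the thesis's model:

```python
import math
import random

def simulate_reversal(beta, alpha=0.3, n_trials=400, seed=0):
    """Toy Q-learner on a two-armed task whose contingencies reverse at
    the midpoint. beta is the inverse temperature: higher beta means
    stronger exploitation of the learned action values."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    correct = 0
    for t in range(n_trials):
        best = 0 if t < n_trials // 2 else 1  # reversal at midpoint
        # softmax (logistic) choice over the two action values
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        a = 0 if rng.random() < p0 else 1
        p_reward = 0.8 if a == best else 0.2  # probabilistic feedback
        r = 1.0 if rng.random() < p_reward else 0.0
        q[a] += alpha * (r - q[a])  # delta-rule value update
        correct += int(a == best)
    return correct / n_trials
```

Fitting parameters like beta to choice data is how such models can reveal reduced exploitation of learned response-outcome relationships in one group relative to another.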
Chapter 5 focused on impulse control in human volunteers, identifying the striatal and cingulo-opercular networks as substrates of impulsive, premature responding in healthy
volunteers, stimulant-dependent individuals and their unaffected siblings. Loss of impulse control was elicited by different incentives for drug-free participants as opposed to drug users. Drug cues elicited striatal activation and increased premature responses in the stimulant-dependent group compared with the control group. In contrast, the ventral striatum showed incentive-specific activation during reward anticipation. Task-based fMRI demonstrated that interactions between the dorsal striatum and cingulo-opercular 'cold cognition' networks underlie failures of impulse control in the control, at-risk and stimulant-dependent groups. However, whereas the cingulo-opercular networks were associated with premature responding in all groups, the reward system was activated specifically by the drug incentive cues in the stimulant group, and by monetary incentive cues in the drug-free groups.
Chapter 6 presents evidence that corticostriatal functional and effective connectivity in an overlapping network that includes the anterior cingulate and inferior frontal cortices as well as the motor cortex, the subthalamic nucleus and the dorsal striatum is critical to the 'stopping' form of impulse control in both control and cocaine-dependent individuals. No stopping efficiency impairments were observed in the cocaine-dependent group. Nevertheless, lower structural corticostriatal connectivity measured using diffusion MRI was associated with response execution impairments in cocaine-dependent participants performing a stop-signal reaction time task. Further, response execution was rescued by the selective noradrenaline reuptake inhibitor atomoxetine, which also increased corticostriatal effective connectivity.
Finally, the increased impulsivity and behavioural inflexibility seen in stimulant use disorder in Chapter 5 and Chapter 4, respectively, were not observed in the endophenotype at risk for developing stimulant abuse, but were rather a consequence of stimulant abuse. These results further clarify the monoaminergic substrates of behavioural flexibility and specify the neural and computational impairments in inhibitory control induced by stimulant dependence.
Pinsent Darwin Studentship from the Department of Physiology, Development and Neuroscience.
Prospective Coding by Spiking Neurons
Animals learn to make predictions, such as associating the sound of a bell with upcoming feeding or predicting a movement that a motor command is eliciting. How predictions are realized on the neuronal level and what plasticity rule underlies their learning is not well understood. Here we propose a biologically plausible synaptic plasticity rule to learn predictions on a single-neuron level on a timescale of seconds. The learning rule allows a spiking two-compartment neuron to match its current firing rate to its own expected future discounted firing rate. For instance, if an originally neutral event is repeatedly followed by an event that elevates the firing rate of a neuron, the originally neutral event will eventually also elevate the neuron's firing rate. The plasticity rule is a form of spike-timing-dependent plasticity in which a presynaptic spike followed by a postsynaptic spike leads to potentiation. Even if the plasticity window has a width of 20 milliseconds, associations on the time scale of seconds can be learned. We illustrate prospective coding with three examples: learning to predict a time-varying input, learning to predict the next stimulus in a delayed paired-associate task, and learning with a recurrent network to reproduce a temporally compressed version of a sequence. We discuss the potential role of the learning mechanism in classical trace conditioning. In the special case that the signal to be predicted encodes reward, the neuron learns to predict the discounted future reward and learning is closely related to the temporal difference learning algorithm TD(λ).
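In that special case the neuron's target, the expected discounted future reward, is exactly what tabular TD(λ) estimates. A minimal sketch with illustrative parameters (the abstract's neuron model is not tabular; this only shows the algorithmic relative it names):

```python
def td_lambda(states, rewards, n_states, alpha=0.1, gamma=0.9,
              lam=0.8, n_epochs=300):
    """Tabular TD(lambda) with accumulating eligibility traces.
    `states` is the visited state sequence and `rewards[t]` the reward
    received on leaving states[t]; the episode then terminates.
    Returns learned estimates V(s) of discounted future reward."""
    V = [0.0] * n_states
    for _ in range(n_epochs):
        e = [0.0] * n_states  # eligibility traces
        for t, s in enumerate(states):
            v_next = V[states[t + 1]] if t + 1 < len(states) else 0.0
            delta = rewards[t] + gamma * v_next - V[s]  # TD error
            e[s] += 1.0  # accumulate trace for the visited state
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam  # decay all traces
    return V
```

With states [0, 1, 2] and a single reward on the final transition, V converges to (gamma², gamma, 1): the originally neutral early state comes to predict the discounted reward, mirroring how the neutral event comes to elevate the neuron's firing rate.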
Can Control Hierarchies be Developed and Optimised Progressively?
Hierarchical structures are used in robots to achieve effective results in control problems. They are found in a wide array of applications of AI and robotics, making them a key aspect of control. Even though they play an integral part in control, such structures are typically produced heuristically, resulting in inconsistent performance. This means that otherwise effective controllers can perform poorly because the hierarchy is badly defined, limiting what controllers can do. Complex control problems that require adaptive behaviour or autonomy remain challenging for control theorists, and complex problem domains make the heuristic process of producing complex hierarchies harder.
It is evident that the heuristic process must follow some form of procedure that could be turned into a methodology. By formalising or automating this process, control hierarchies can be produced with consistently effective results without relying on a control engineer's heuristics, which can easily fail. This thesis proposes an algorithmic approach, inspired by Perceptual Control Theory, known as DOSA. DOSA produces hierarchies automatically using real-world experience and the inputs the system has access to. This thesis shows that DOSA consistently reproduces effective hierarchies that exist in the literature, even when billions of possible hierarchies were available.
Furthermore, this thesis investigates the value of using hierarchies in general and their benefits in control problems. The computational complexity of hierarchies is compared, showing that while hierarchies do not in themselves offer a computational advantage, they greatly aid the parameter optimisation procedure. The thesis then proceeds to study the hierarchical optimisation of parameters and how hierarchies allow this process to be performed more consistently for better results, concluding that hierarchical parameter optimisation produces more consistent controllers that also transfer better to an unseen problem domain. Parameter optimisation is a challenge that otherwise limits effective controllers and the use of larger structures in control.
The research described in this thesis formalises the process of generating hierarchical controllers as well as hierarchically optimising them, providing a comprehensive methodology to automate the production of robust controllers for complex problems.
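The core idea of a Perceptual Control Theory hierarchy, where each level controls its own perception by setting the reference signal of the level below, can be sketched minimally as follows. The gains, the number of levels and the plant are illustrative assumptions, not DOSA itself:

```python
class ControlLevel:
    """One level of a perceptual-control-style hierarchy: a proportional
    controller whose output serves as the reference signal of the level
    below it."""

    def __init__(self, gain):
        self.gain = gain
        self.reference = 0.0

    def step(self, perception):
        error = self.reference - perception  # compare percept to reference
        return self.gain * error


def run_hierarchy(levels, perceive, act, top_reference, n_steps=400):
    """Cascade the levels top-down: each level's output becomes the next
    level's reference, and the bottom output drives the plant."""
    levels[0].reference = top_reference
    for _ in range(n_steps):
        percepts = perceive()  # one percept per level, top to bottom
        out = None
        for level, p in zip(levels, percepts):
            if out is not None:
                level.reference = out  # reference set from above
            out = level.step(p)
        act(out)  # bottom-level output drives the plant
```

For example, a position-controlling level sitting above a velocity-controlling level steers a simple double-integrator plant to the top-level reference; choosing which percept each level controls, and in what order, is exactly the hierarchy-design problem the thesis automates.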
Quantum Economic Theory of Intelligence
The Quantum Economics Intelligence Initiative, spearheaded by quantum economist Kaiola M Liu, integrates insights from seminal thinkers such as Einstein, Archimedes, Adam Smith, Nick Land, and Sun Tzu. By applying principles of quantum mechanics, this forward-looking project aims to redefine economic modeling, exploring real-world applications and potential benefits. The initiative encompasses foundational studies, economic model applications, incorporation of quantum computing, and analysis of contemporary economic philosophies. Keywords: Quantum Mechanics, Economics, Technological Advancements, Philosophy.