    In Search of the Neural Circuits of Intrinsic Motivation

    Children seem to acquire new know-how in a continuous and open-ended manner. In this paper, we hypothesize that an intrinsic motivation to progress in learning is at the origin of the remarkable structure of children's developmental trajectories. In this view, children engage in exploratory and playful activities for their own sake, not as steps toward extrinsic goals. The central hypothesis of this paper is that intrinsically motivating activities correspond to an expected decrease in prediction error. This motivation system pushes the infant to avoid both fully predictable and fully unpredictable situations in order to focus on the ones expected to maximize progress in learning. Based on a computational model and a series of robotic experiments, we show how this principle can lead to organized sequences of behavior of increasing complexity, characteristic of several behavioral and developmental patterns observed in humans. We then discuss the putative circuitry underlying such an intrinsic motivation system in the brain and formulate two novel hypotheses. The first is that tonic dopamine acts as a learning progress signal. The second is that this progress signal is directly computed through a hierarchy of cortical microcircuits that act both as prediction and metaprediction systems.
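The paper's central principle, an intrinsic reward proportional to the expected decrease in prediction error, can be illustrated as a windowed comparison of older versus recent errors. This is a minimal sketch: the class name, window size, and simple mean-difference estimator are assumptions for illustration, not the paper's actual model.

```python
from collections import deque

class LearningProgressReward:
    """Intrinsic reward as the recent decrease in prediction error
    (illustrative sketch; window size and estimator are assumptions)."""

    def __init__(self, window=10):
        self.window = window
        self.errors = deque(maxlen=2 * window)  # keep two windows of errors

    def update(self, prediction_error):
        self.errors.append(prediction_error)

    def reward(self):
        # Learning progress = mean error of the older half minus mean error
        # of the recent half: positive when prediction error is falling,
        # near zero for both fully predictable and fully unpredictable input.
        if len(self.errors) < 2 * self.window:
            return 0.0
        errs = list(self.errors)
        old, new = errs[: self.window], errs[self.window :]
        return sum(old) / self.window - sum(new) / self.window
```

Under this sketch, a situation whose errors are steadily shrinking yields a positive reward, while a mastered (constant low error) or hopeless (constant high error) situation yields roughly zero, matching the avoid-both-extremes behavior described above.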

    Adaptive and learning-based formation control of swarm robots

    Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in swarm robots with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation could be performed by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., the BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (i.e., flocking, foraging) with learning-based control strategies (using artificial neural networks) for adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We proceed by presenting a novel flocking control for a UAV swarm using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP) and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, a reward function is added combining global flocking maintenance, a mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy using actor-critic networks and a global state space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walks to control the communication between a team of robots with swarming behavior for musical creation.
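A reward of the shape described above (global flocking maintenance, a mutual pairwise term, and a collision penalty) might be sketched as follows, assuming 2-D positions. All names, weights, and the leader-distance formulation here are illustrative assumptions, not the thesis's actual design.

```python
import math

def flocking_reward(positions, leader_idx=0,
                    target_spacing=1.0, collision_radius=0.3,
                    w_flock=1.0, w_mutual=0.5, w_collision=5.0):
    """Sketch of a flocking reward for a leader-follower UAV swarm:
    penalize distance to the leader (flocking maintenance), deviation
    from a target pairwise spacing (mutual term), and near-collisions."""
    n = len(positions)
    reward = 0.0

    # Global flocking maintenance: followers should stay near the leader.
    lx, ly = positions[leader_idx]
    for i, (x, y) in enumerate(positions):
        if i != leader_idx:
            reward -= w_flock * math.hypot(x - lx, y - ly)

    # Mutual reward and collision penalty over all pairs.
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(positions[i][0] - positions[j][0],
                           positions[i][1] - positions[j][1])
            reward -= w_mutual * abs(d - target_spacing)
            if d < collision_radius:
                reward -= w_collision  # hard penalty for near-collision
    return reward
```

In a DDPG setup with centralized training, a shared critic would evaluate this team-level reward while each UAV's actor acts only on its local observations.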

    Self-organisation of internal models in autonomous robots

    Internal Models (IMs) play a significant role in autonomous robotics. They are mechanisms able to represent the input-output characteristics of the sensorimotor loop. In developmental robotics, open-ended learning of skills and knowledge serves the purpose of reacting to unexpected inputs, exploring the environment and acquiring new behaviours. The development of the robot includes self-exploration of the state-action space and learning of the environmental dynamics. In this dissertation, we explore the properties and benefits of the self-organisation of robot behaviour based on the homeokinetic learning paradigm. A homeokinetic robot explores the environment in a coherent way without prior knowledge of its configuration or the environment itself. First, we propose a novel approach to self-organisation of behaviour by artificial curiosity in the sensorimotor loop. Second, we study how different forward model settings alter the behaviour of both exploratory and goal-oriented robots. Models of diverse complexity, size and learning rules are compared to assess their importance for the robot's exploratory behaviour. We define self-organised behaviour performance in terms of simultaneous environment coverage and best prediction of future sensory inputs. Among the findings, we observe that models with a fast response and a minimisation of the prediction error by local gradients achieve the best performance. Third, we study how self-organisation of behaviour can be exploited to learn IMs for goal-oriented tasks. An IM acquires coherent self-organised behaviours that are then used to achieve high-level goals by reinforcement learning (RL). Our results demonstrate that learning an inverse model in this context yields faster reward maximisation and a higher final reward. We show that an initial exploration of the environment in a goal-less yet coherent way improves learning. In the same context, we analyse the self-organisation of central pattern generators (CPGs) by reward maximisation. Our results show that CPGs can learn favourable reward behaviour on high-dimensional robots using the self-organised interaction between degrees of freedom. Finally, we examine an on-line dual control architecture in which we combine an actor-critic RL scheme with the homeokinetic controller. With this configuration, the probing signal is generated from the embodied robot's experience of the environment. This set-up avoids the problem of designing task-dependent probing signals through the emergence of intrinsically motivated, comprehensible behaviour. Faster improvement of the reward signal compared to classic RL is achievable with this configuration.
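The performance notion used above, simultaneous environment coverage and good prediction of future sensory inputs, could be scored as a simple weighted combination. This is only a sketch: the discretised coverage grid, the error-to-quality mapping, and the equal weighting are assumptions, not the dissertation's actual metric.

```python
def exploration_performance(visited_cells, total_cells,
                            prediction_errors, w=0.5):
    """Sketch of a self-organised behaviour score combining
    environment coverage with sensory prediction quality."""
    # Coverage: fraction of distinct discretised cells visited.
    coverage = len(set(visited_cells)) / total_cells
    # Prediction quality: squash mean error into (0, 1], 1 = perfect.
    mean_err = sum(prediction_errors) / len(prediction_errors)
    prediction_quality = 1.0 / (1.0 + mean_err)
    return w * coverage + (1.0 - w) * prediction_quality
```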

    Prospective Coding by Spiking Neurons

    Animals learn to make predictions, such as associating the sound of a bell with upcoming feeding or predicting a movement that a motor command is eliciting. How predictions are realized on the neuronal level and what plasticity rule underlies their learning is not well understood. Here we propose a biologically plausible synaptic plasticity rule to learn predictions on a single-neuron level on a timescale of seconds. The learning rule allows a spiking two-compartment neuron to match its current firing rate to its own expected future discounted firing rate. For instance, if an originally neutral event is repeatedly followed by an event that elevates the firing rate of a neuron, the originally neutral event will eventually also elevate the neuron's firing rate. The plasticity rule is a form of spike-timing-dependent plasticity in which a presynaptic spike followed by a postsynaptic spike leads to potentiation. Even if the plasticity window has a width of 20 milliseconds, associations on the timescale of seconds can be learned. We illustrate prospective coding with three examples: learning to predict a time-varying input, learning to predict the next stimulus in a delayed paired-associate task, and learning with a recurrent network to reproduce a temporally compressed version of a sequence. We discuss the potential role of the learning mechanism in classical trace conditioning. In the special case that the signal to be predicted encodes reward, the neuron learns to predict the discounted future reward, and learning is closely related to the temporal difference learning algorithm TD(λ).
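The abstract notes that, when the predicted signal encodes reward, learning is closely related to TD(λ). A minimal tabular TD(λ) sketch with accumulating eligibility traces is shown below; the state encoding and parameter values are illustrative, and this is the standard textbook algorithm rather than the paper's spiking implementation.

```python
def td_lambda(rewards, states, values, alpha=0.1, gamma=0.9, lam=0.8):
    """One episode of tabular TD(lambda) with accumulating eligibility
    traces. `states[t]` is the state at step t (one extra final state),
    `rewards[t]` the reward received on the transition from states[t]."""
    traces = [0.0] * len(values)
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        # TD error: how much the discounted lookahead disagrees with V(s).
        delta = rewards[t] + gamma * values[s_next] - values[s]
        traces[s] += 1.0  # accumulate eligibility for the visited state
        for i in range(len(values)):
            values[i] += alpha * delta * traces[i]
            traces[i] *= gamma * lam  # decay all traces
    return values
```

The eligibility trace plays a role analogous to the plasticity window above: earlier events remain "eligible" so that a later reward-driven error can strengthen associations spanning seconds.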

    Can Control Hierarchies be Developed and Optimised Progressively?

    Hierarchical structures are used in robots to achieve effective results in control problems, and they are found in a wide array of applications of AI and robotics, making them a key aspect of control. Even though they are integral to control, such structures are typically produced heuristically, resulting in inconsistent performance. This means that otherwise effective control tasks or controllers perform poorly because the hierarchy is badly defined, limiting what controllers can do. Complex control problems that require adaptive behaviour or autonomy remain challenging for control theorists, with complex problem domains making the heuristic process of producing complex hierarchies harder. It is evident that the heuristic process must have some form of procedure that could be turned into a methodology. By formalising or automating this process, control hierarchies can be produced with consistently effective results, without relying on the heuristic judgement of a control engineer, which can easily fail. This thesis proposes an algorithmic approach (inspired by Perceptual Control Theory) known as DOSA. DOSA produces hierarchies automatically using real-world experience and the inputs the system has access to. This thesis shows that DOSA consistently reproduces effective hierarchies that exist in the literature, even when billions of possible hierarchies are available. Furthermore, this thesis investigates the value of using hierarchies in general and their benefits in control problems. The computational complexity of hierarchies is compared, showing that while hierarchies do not have a computational advantage, the parameter optimisation procedure is aided greatly by hierarchical parameter optimisation. The thesis then proceeds to study the hierarchical optimisation of parameters and how hierarchies allow this process to be performed more consistently for better results, concluding that hierarchical parameter optimisation produces more consistent controllers that also transfer better to an unseen problem domain. Parameter optimisation is a challenge that also limits otherwise effective controllers and limits the use of larger structures in control. The research described in this thesis formalises the process of generating hierarchical controllers as well as hierarchically optimising them, providing a comprehensive methodology to automate the production of robust controllers for complex problems.
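Hierarchical parameter optimisation, in the sense of tuning one level of the hierarchy at a time rather than searching the joint parameter space, could be sketched as a coordinate-wise search. The level names, candidate grids, and the one-pass scheme below are assumptions for illustration, not the thesis's actual procedure.

```python
def optimise_hierarchically(levels, evaluate, candidates):
    """Sketch of hierarchical parameter optimisation: fix a default for
    every level, then tune each level in order (e.g. lowest control loop
    first) while holding the others constant.

    levels:     ordered list of level names
    evaluate:   maps a {level: value} dict to a scalar cost (lower = better)
    candidates: {level: list of candidate values}
    """
    # Start from each level's first candidate as a default.
    params = {level: candidates[level][0] for level in levels}
    for level in levels:
        best = min(candidates[level],
                   key=lambda v: evaluate({**params, level: v}))
        params[level] = best  # freeze this level before tuning the next
    return params
```

Searching each level separately reduces the joint grid (a product of the candidate counts) to a sum, which is the kind of consistency and tractability gain the thesis attributes to hierarchical optimisation.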

    Quantum Economic Theory of Intelligence

    The Quantum Economics Intelligence Initiative, spearheaded by quantum economist Kaiola M. Liu, integrates insights from seminal thinkers such as Einstein, Archimedes, Adam Smith, Nick Land, and Sun Tzu. By applying principles of quantum mechanics, this forward-looking project aims to redefine economic modeling, exploring real-world applications and potential benefits. The initiative encompasses foundational studies, economic model applications, incorporation of quantum computing, and analysis of contemporary economic philosophies. Keywords: quantum mechanics, economics, technological advancements, philosophy.

    The motivational brain: neural encoding of reward and effort in goal-directed behavior
