
    A Bio-inspired Reinforcement Learning Rule to Optimise Dynamical Neural Networks for Robot Control


    Models for reinforcement learning and design of a soft robot inspired by Drosophila larvae

    Designs for robots are often inspired by animals: robots are built to mimic animals' mechanics, motions, behaviours and learning. Drosophila, known as the fruit fly, is a well-studied model animal. In this thesis, the Drosophila larva is studied and the results are applied to robots. More specifically, part of the Drosophila larva's neural circuit for operant learning is modelled; based on this, a synaptic plasticity model and a neural circuit model for operant learning, as well as a dynamic neural network for robot reinforcement learning, are developed. The Drosophila larva's motor system for locomotion is then studied, and a soft robot system is designed based on it. Operant learning is a concept similar to reinforcement learning in computer science, i.e. learning by reward or punishment for behaviour. Experiments have shown that a wide range of animals is capable of operant learning, including animals with only a few neurons, such as Drosophila. This implies that operant learning can be established without a large number of neurons. With this as an assumption, the structure and dynamics of synapses are investigated, and a synaptic plasticity model is proposed. The model includes nonlinear dynamics of synapses, in particular receptor trafficking, which affects synaptic strength. Tests of this model show that it can enable operant learning at the neuron level and applies to a broad range of neural networks, including feedforward, recurrent and spiking networks. The mushroom body is a learning centre of the insect brain that has been studied and modelled for associative learning, but not yet for operant learning. To investigate whether it participates in operant learning, Drosophila larvae were studied with a transgenic tool by my collaborators. Based on these experiments and their results, a mushroom body model capable of operant learning is developed. The proposed neural circuit model can reproduce the operant learning of the turning behaviour of Drosophila larvae. The synaptic plasticity model is then simplified for robot learning. With the simplified model, a recurrent neural network with internal neural dynamics can learn to control a planar bipedal robot in a benchmark reinforcement learning task, OpenAI's BipedalWalker. Benefiting from efficient exploration in parameter space rather than action space, it is the first known solution to the task using a reinforcement learning approach. Although existing pneumatic soft robots can have multiple muscles embedded in a single component, this is far fewer than the muscles of the Drosophila larva, which are well organised in a tiny space. A soft robot system is developed based on the muscle pattern of the Drosophila larva, to explore the possibility of embedding a high density of muscles in a limited space. Three versions of the body wall, with pneumatic muscles mimicking the muscle pattern, are designed. A pneumatic control system and an embedded control system are also developed for controlling the robot. With a bioinspired body wall with a large number of muscles, the robot performs lifelike motions in experiments.
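The claimed advantage of exploring in parameter space rather than action space can be illustrated in a few lines. The sketch below, which assumes nothing from the thesis beyond that idea, hill-climbs directly on the weights of a linear policy; ToyEnv is a hypothetical stand-in for the BipedalWalker benchmark, and all constants are illustrative.

```python
import numpy as np

class ToyEnv:
    """Hypothetical stand-in for BipedalWalker: reward favours actions near a hidden target."""
    def __init__(self, obs_dim=4, act_dim=2, horizon=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.target = self.rng.uniform(-0.8, 0.8, size=act_dim)
        self.obs_dim, self.act_dim, self.horizon = obs_dim, act_dim, horizon
    def reset(self):
        self.t = 0
        return self.rng.normal(size=self.obs_dim)
    def step(self, action):
        self.t += 1
        reward = -float(np.sum((action - self.target) ** 2))
        return self.rng.normal(size=self.obs_dim), reward, self.t >= self.horizon

def episode_return(env, W):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, r, done = env.step(np.tanh(W @ obs))   # deterministic policy, no action noise
        total += r
    return total

env = ToyEnv()
W = np.zeros((env.act_dim, env.obs_dim))            # policy parameters
best = episode_return(env, W)
rng = np.random.default_rng(1)
for _ in range(300):
    cand = W + 0.1 * rng.normal(size=W.shape)       # perturb in parameter space
    score = episode_return(env, cand)
    if score > best:                                 # keep only improving perturbations
        W, best = cand, score
print("best episode return:", best)
```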

    Movement primitives as a robotic tool to interpret trajectories through learning-by-doing

    Articulated movements are fundamental in many human and robotic tasks. While humans can learn and generalise arbitrarily long sequences of movements, and in particular can optimise them to fit the constraints and features of their body, robots are often programmed to execute point-to-point, precise but fixed patterns. This study proposes a new approach to interpreting and reproducing articulated and complex trajectories as a set of known robot-based primitives. Instead of achieving accurate reproductions, the proposed approach aims at interpreting data in an agent-centred fashion, according to an agent's primitive movements. The method improves the accuracy of a reproduction with an incremental process that first seeks a rough approximation by capturing the most essential features of a demonstrated trajectory. Observing the discrepancy between the demonstrated and reproduced trajectories, the process then proceeds with incremental decompositions and new searches in the sub-optimal parts of the trajectory. The aim is to achieve an agent-centred interpretation and progressive learning that fits, first and foremost, the robot's capabilities, as opposed to a data-centred decomposition analysis. Tests on both geometric and human-generated trajectories reveal that the use of the agent's own primitives gives the method remarkable robustness and generalisation properties. In particular, because trajectories are understood and abstracted by means of agent-optimised primitives, the method has two main features: 1) reproduced trajectories are general and represent an abstraction of the data; 2) the algorithm can reconstruct highly noisy or corrupted data without pre-processing, thanks to implicit and emergent noise suppression and feature detection. This study suggests a novel bio-inspired approach to interpreting, learning and reproducing articulated movements and trajectories. Possible applications include drawing, writing, movement generation, object manipulation, and other tasks where performance requires human-like interpretation and generalisation capabilities.
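The incremental refine-where-worst process lends itself to a compact illustration. The sketch below uses straight-line segments as stand-in "primitives" (the study uses robot-optimised movement primitives, so this captures only the control flow, not the method itself): fit one primitive across the data, then repeatedly split the segment with the largest reproduction error.

```python
import numpy as np

def chord_errors(pts, a, b):
    """Perpendicular distance of pts[a+1:b] from the line through pts[a] and pts[b]."""
    v = pts[b] - pts[a]
    v = v / (np.linalg.norm(v) + 1e-12)
    rel = pts[a + 1:b] - pts[a]
    return np.abs(rel[:, 0] * v[1] - rel[:, 1] * v[0])   # 2-D cross product magnitude

def decompose(pts, n_splits=8):
    """One primitive spans the data; repeatedly split where reproduction error is worst."""
    knots = [0, len(pts) - 1]
    for _ in range(n_splits):
        worst, split_at = -1.0, None
        for a, b in zip(knots, knots[1:]):
            if b - a < 2:
                continue                                  # segment too short to split
            errs = chord_errors(pts, a, b)
            j = int(np.argmax(errs))
            if errs[j] > worst:
                worst, split_at = float(errs[j]), a + 1 + j
        if split_at is None:
            break
        knots = sorted(knots + [split_at])
    return knots

# demonstrated trajectory: a noisy half-circle
t = np.linspace(0.0, np.pi, 200)
noise = 0.01 * np.random.default_rng(0).normal(size=(200, 2))
traj = np.stack([np.cos(t), np.sin(t)], axis=1) + noise
print("primitive boundaries at samples:", decompose(traj))
```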

    Recurrent neural networks and adaptive motor control

    This thesis is concerned with the use of neural networks for motor control tasks. The main goal of the thesis is to investigate ways in which the biological notions of motor programs and Central Pattern Generators (CPGs) may be implemented in a neural network framework. Biological CPGs can be seen as components within a larger control scheme, which is basically modular in design. In this thesis, these ideas are investigated through the use of modular recurrent networks, which are used in a variety of control tasks. The first experimental chapter deals with learning in recurrent networks, and it is shown that CPGs may be easily implemented using the machinery of backpropagation. The use of these CPGs can aid the learning of pattern generation tasks; it can also mean that the other components in the system can be reduced in complexity, say, to a purely feedforward network. It is also shown that incremental learning, or 'shaping', is an effective method for building CPGs. Genetic algorithms are also used to build CPGs; although computational effort prevents this from being a practical method, it does show that GAs are capable of optimising systems that operate in the context of a larger scheme. One interesting result from the GA is that optimal CPGs tend to have unstable dynamics, which may have implications for building modular neural controllers. The next chapter applies these ideas to some simple control tasks involving a highly redundant simulated robot arm. It is shown that it is relatively straightforward to build CPGs that represent elements of pattern generation, constraint satisfaction and local feedback. This is indirect control, in which errors are backpropagated through a plant model, as well as the CPG itself, to give errors for the controller. Finally, the third experimental chapter takes an alternative approach and uses direct control methods, such as reinforcement learning. In reinforcement learning, controller outputs have unmodelled effects; this allows us to build complex control systems in which outputs modulate the couplings between sets of dynamic systems. This is shown for a simple case involving a system of coupled oscillators. A second set of experiments investigates the use of simplified models of behaviour; this is a reduced form of supervised learning, and the use of such models in control is discussed.
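The coupled-oscillator systems of the final chapter can be illustrated with a minimal CPG: two Kuramoto-style phase oscillators whose coupling strength k is the kind of quantity a learned controller could modulate. The equations and constants below are illustrative, not taken from the thesis.

```python
import numpy as np

def cpg(k, steps=2000, dt=0.01, w=(2.0, 2.2), lag=np.pi):
    """Two coupled phase oscillators; k is the coupling a controller might modulate."""
    theta = np.array([0.0, 0.5])                     # initial phases
    for _ in range(steps):
        d0 = w[0] + k * np.sin(theta[1] - theta[0] - lag)
        d1 = w[1] + k * np.sin(theta[0] - theta[1] + lag)
        theta = theta + dt * np.array([d0, d1])      # Euler integration
    return theta

theta = cpg(k=4.0)
# With strong coupling the oscillators lock close to anti-phase (difference ~ pi),
# the kind of stable pattern a CPG supplies to downstream motor components.
print("settled phase difference:", (theta[1] - theta[0]) % (2 * np.pi))
```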

    Integrating reinforcement learning, equilibrium points and minimum variance to understand the development of reaching: a computational model

    Despite the huge literature on reaching behaviour, we still lack a clear idea about the motor control processes underlying its development in infants. This article contributes to overcoming this gap by proposing a computational model based on three key hypotheses: (a) trial-and-error learning processes drive the progressive development of reaching; (b) the control of movements based on equilibrium points allows the model to quickly find an initial approximate solution to the problem of gaining contact with the target objects; (c) the demand for precision of the end-movement in the presence of muscular noise drives the progressive refinement of the reaching behaviour. Tests of the model, based on a simulated dynamical arm with two degrees of freedom, show that it is capable of reproducing a large number of empirical findings, most deriving from longitudinal studies with children: the developmental trajectory of several dynamical and kinematic variables of reaching movements, the time evolution of the submovements composing reaching, the progressive development of a bell-shaped speed profile, and the evolution of the management of redundant degrees of freedom. The model also produces testable predictions on several of these phenomena. Most of these empirical data have never been investigated by previous computational models and, more importantly, have never been accounted for by a single model. In this respect, the analysis of the model's functioning reveals that all these results are ultimately explained, sometimes in unexpected ways, by the same developmental trajectory emerging from the interplay of the three hypotheses: the model first quickly learns to perform coarse movements that ensure contact of the hand with the target (an achievement of great adaptive value), and then slowly refines the detailed control of the dynamical aspects of movement to increase accuracy.
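Hypotheses (a) and (b) combine naturally in a toy simulation: spring-like dynamics pull a limb toward a commanded equilibrium point, and trial-and-error search refines that command under signal-dependent motor noise. The 1-D point mass below is a hypothetical stand-in for the article's two-degrees-of-freedom arm, with illustrative constants.

```python
import numpy as np

rng = np.random.default_rng(0)
target = 1.0                                    # reach target position
k, damping, dt = 10.0, 2 * np.sqrt(10.0), 0.01  # critically damped spring "muscle"

def reach(x_eq, noise=0.02):
    """Simulate one reach toward commanded equilibrium x_eq; return the endpoint."""
    x, v = 0.0, 0.0
    for _ in range(500):
        cmd = x_eq * (1.0 + noise * rng.normal())  # signal-dependent motor noise
        a = k * (cmd - x) - damping * v            # pulled toward the equilibrium point
        v += a * dt
        x += v * dt
    return x

x_eq, best_err = 0.0, abs(reach(0.0) - target)
for _ in range(100):                               # trial-and-error refinement
    cand = x_eq + 0.3 * rng.normal()
    err = abs(reach(cand) - target)
    if err < best_err:
        x_eq, best_err = cand, err
print(f"learned equilibrium command {x_eq:.3f}, endpoint error {best_err:.4f}")
```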

    Navigational Path Analysis of Mobile Robot in Various Environments

    This dissertation describes work in the area of autonomous mobile robots. The objective is the navigation of a mobile robot in a real-world dynamic environment, avoiding structured and unstructured obstacles, whether static or dynamic. The shapes and positions of obstacles are not known to the robot prior to navigation. The mobile robot has sensory recognition of specific objects in the environment; this sensory information provides its controllers with local information about the robot's immediate surroundings. The robot uses this information intelligently to reach the global objective (the target). The navigational path, as well as the time taken during navigation, can be expressed as an optimisation problem and thus analysed and solved using AI techniques. The optimisation of path and navigation time is based on the kinematic stability and the intelligence of the robot controller. A successful way of structuring the navigation task deals with the issues of individual behaviour design and action coordination among the behaviours. The navigation objective is addressed using fuzzy logic, neural networks, adaptive neuro-fuzzy inference systems and other AI techniques. The research also addresses distributed autonomous systems using multiple robots.
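Of the AI techniques listed, the fuzzy-logic behaviour is the easiest to sketch. The toy controller below fuzzifies three sensor distances into NEAR/FAR sets, lets a handful of rules vote on a steering command, and defuzzifies with a weighted average; membership shapes and rule weights are illustrative, not taken from the dissertation.

```python
def near(d, lo=0.2, hi=1.0):
    """Degree to which distance d (metres) is 'near': 1 at lo, falling to 0 at hi."""
    return max(0.0, min(1.0, (hi - d) / (hi - lo)))

def far(d):
    return 1.0 - near(d)

def steer(left, front, right):
    """Fuzzy rules vote on a steering command: positive turns left, negative turns right."""
    rules = [
        (near(front) * far(left), +1.0),   # blocked ahead, space on the left -> turn left
        (near(front) * far(right), -1.0),  # blocked ahead, space on the right -> turn right
        (near(right), +0.5),               # wall closing in on the right -> ease left
        (near(left), -0.5),                # wall closing in on the left -> ease right
        (far(front), 0.0),                 # path clear -> keep straight
    ]
    total = sum(w for w, _ in rules)
    return sum(w * out for w, out in rules) / total if total else 0.0

print(steer(left=2.0, front=0.3, right=0.4))   # blocked ahead and right: steers left
```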

    Active Predictive Coding: Brain-Inspired Reinforcement Learning for Sparse Reward Robotic Control Problems

    In this article, we propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC), designing an agent built entirely from powerful predictive coding/processing circuits that facilitate dynamic, online learning from sparse rewards, embodying the principles of planning-as-inference. Concretely, we craft an adaptive agent system, which we call active predictive coding (ActPC), that balances an internally generated epistemic signal (meant to encourage intelligent exploration) with an internally generated instrumental signal (meant to encourage goal-seeking behavior), ultimately learning to control various simulated robotic systems as well as a complex robotic arm, using a realistic robotics simulator (the Surreal Robotics Suite), on block lifting and can pick-and-place tasks. Notably, our experimental results demonstrate that the proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals and is competitive with, or outperforms, several powerful backprop-based RL approaches.
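The balance of epistemic and instrumental signals can be shown schematically. The tabular sketch below rewards transitions the forward model predicts poorly (epistemic) plus progress toward a goal (instrumental); it illustrates only the reward-combination idea, since the paper's agent is built from neural generative coding circuits, not tables, and the chain world is invented.

```python
import numpy as np

n_states, goal = 10, 9
model = np.zeros((n_states, 2))        # forward model: predicted next state per (s, a)
visits = np.zeros((n_states, 2))
rng = np.random.default_rng(0)

def step(s, a):
    """Deterministic chain world: action 1 moves right, action 0 moves left."""
    return max(0, min(n_states - 1, s + (1 if a == 1 else -1)))

s = 0
for _ in range(200):
    scores = []
    for a in (0, 1):
        nxt = step(s, a)
        epistemic = abs(nxt - model[s, a])            # surprise under the forward model
        instrumental = -abs(goal - nxt) / n_states    # preference for nearing the goal
        scores.append(epistemic + instrumental)
    a = int(np.argmax(scores)) if rng.random() > 0.1 else int(rng.integers(2))
    nxt = step(s, a)
    visits[s, a] += 1
    model[s, a] += (nxt - model[s, a]) / visits[s, a] # online forward-model update
    s = nxt
print("final state:", s, "goal:", goal)
```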

    PCT and beyond: toward a computational framework for ‘intelligent’ communicative systems

    Recent years have witnessed increasing interest in ‘intelligent’ autonomous machines such as robots. However, there is a long way to go before autonomous systems reach the level of capability required for even the simplest of tasks involving human-robot interaction, especially if they involve communicative behavior such as speech and language. The field of Artificial Intelligence (AI) has made great strides in these areas, and has graduated from high-level rule-based paradigms to embodied architectures whose operations are grounded in real physical environments. What is still missing, however, is an overarching theory of intelligent communicative behavior that informs system-level design decisions. This chapter introduces a framework that extends the principles of Perceptual Control Theory (PCT) toward a remarkably symmetric architecture for a needs-driven communicative agent. It is concluded that, if behavior is the control of perception (the central tenet of PCT), then perception (for communicative agents) is the simulation of behavior.
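The central tenet that behavior is the control of perception reduces, at a single level, to a simple negative-feedback loop: the agent acts so as to drive its perceptual signal toward an internal reference, regardless of disturbances. The sketch below is a one-level illustration with made-up gains and disturbance; the chapter argues for a hierarchy of such loops.

```python
def control(reference, steps=100, gain=0.5):
    """One PCT unit: act so the perceptual signal tracks the internal reference."""
    output = 0.0
    perception = 0.0
    for t in range(steps):
        disturbance = 0.3 if t > 50 else 0.0   # the world pushes back mid-run
        env_state = output + disturbance       # environment mixes action and disturbance
        perception = env_state                 # what the agent actually senses
        error = reference - perception         # compare perception with reference
        output += gain * error                 # integrate error into action
    return perception

# Perception is held near 1.0 even after the disturbance appears: the agent
# controls its perception, not a pre-planned action sequence.
print(control(reference=1.0))
```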

    World model learning and inference

    Understanding information processing in the brain, and creating general-purpose artificial intelligence, are long-standing aspirations of scientists and engineers worldwide. The distinctive features of human intelligence are high-level cognition and control in various interactions with the world, including the self, which are not defined in advance and vary over time. The challenge of building human-like intelligent machines, as well as progress in brain science and behavioural analyses, robotics, and their associated theoretical formalisations, speaks to the importance of world-model learning and inference. In this article, after briefly surveying the history and challenges of internal model learning and probabilistic learning, we introduce the free energy principle, which provides a useful framework within which to consider neuronal computation and probabilistic world models. Next, we showcase examples of human behaviour and cognition explained under that principle. We then describe symbol emergence in the context of probabilistic modelling, as a topic at the frontiers of cognitive robotics. Lastly, we review recent progress in creating human-like intelligence by using novel probabilistic programming languages. The striking consensus that emerges from these studies is that probabilistic descriptions of learning and inference are powerful and effective ways to create human-like artificial intelligent machines and to understand intelligence in the context of how humans interact with their world.
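The free energy principle mentioned here has a compact numerical core. The sketch below evaluates variational free energy F = E_q[ln q(s) - ln p(o, s)] for a two-hidden-state generative model and shows that minimising F over the approximate posterior q recovers the exact Bayesian posterior; the probabilities are illustrative.

```python
import numpy as np

prior = np.array([0.5, 0.5])            # p(s): two hidden states
likelihood = np.array([0.9, 0.2])       # p(o = 1 | s) for each hidden state
joint = prior * likelihood              # p(o = 1, s)

def free_energy(q):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)] for belief q over s."""
    q = np.clip(q, 1e-12, 1.0)
    return float(np.sum(q * (np.log(q) - np.log(joint))))

candidates = [np.array([p, 1.0 - p]) for p in np.linspace(0.01, 0.99, 99)]
best = min(candidates, key=free_energy)
posterior = joint / joint.sum()         # exact Bayesian posterior for comparison
print("argmin-F belief:", best)
print("exact posterior:", posterior)    # ~[0.818, 0.182]; the two coincide
```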

    Neuromodulatory Supervised Learning
