Models for reinforcement learning and design of a soft robot inspired by Drosophila larvae

Abstract

Robot designs are often inspired by animals, mimicking their mechanics, motions, behaviours and learning. Drosophila, commonly known as the fruit fly, is a well-studied model animal. In this thesis, the Drosophila larva is studied and the results are applied to robots. More specifically, part of the Drosophila larva's neural circuitry for operant learning is modelled, from which a synaptic plasticity model, a neural circuit model for operant learning, and a dynamic neural network for robot reinforcement learning are developed; the Drosophila larva's motor system for locomotion is then studied, and a soft robot system is designed based on it.

Operant learning is a concept similar to reinforcement learning in computer science, i.e. learning from reward or punishment for behaviour. Experiments have shown that a wide range of animals is capable of operant learning, including animals with only a few neurons, such as Drosophila. This implies that operant learning can be established without a large number of neurons. Under this assumption, the structure and dynamics of synapses are investigated, and a synaptic plasticity model is proposed. The model includes nonlinear synaptic dynamics, in particular receptor trafficking, which affects synaptic strength. Tests show that the model enables operant learning at the neuron level and applies to a broad range of neural networks, including feedforward, recurrent and spiking networks.

The mushroom body is a learning centre of the insect brain known and modelled for associative learning, but not yet for operant learning. To investigate whether it participates in operant learning, my collaborators studied Drosophila larvae with a transgenic tool. Based on these experiments and their results, a mushroom body model capable of operant learning is developed. The proposed neural circuit model reproduces the operant learning of the turning behaviour of Drosophila larvae.

The synaptic plasticity model is then simplified for robot learning. With the simplified model, a recurrent neural network with internal neural dynamics learns to control a planar bipedal robot in OpenAI's BipedalWalker benchmark reinforcement learning task. Benefiting from the efficiency of exploration in parameter space rather than action space, it is the first known solution to this task with a reinforcement learning approach.

Although existing pneumatic soft robots can have multiple muscles embedded in a single component, this is far fewer than the muscles of the Drosophila larva, which are well organised in a tiny space. A soft robot system based on the muscle pattern of the Drosophila larva is developed to explore the possibility of embedding a high density of muscles in a limited space. Three versions of a body wall with pneumatic muscles mimicking this muscle pattern are designed, together with a pneumatic control system and an embedded control system for controlling the robot. With a bioinspired body wall carrying a large number of muscles, the robot performs lifelike motions in experiments.
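To make the plasticity idea concrete, the following is a minimal sketch of a reward-modulated Hebbian rule in which a slow receptor-availability variable gates each synapse's effective strength, loosely analogous to receptor trafficking. The variable names, dynamics and hyperparameters here are illustrative assumptions, not the thesis model's actual equations.

```python
import numpy as np

# Illustrative sketch only: reward-gated Hebbian learning with a slow
# "receptor availability" variable r, standing in for receptor trafficking.
rng = np.random.default_rng(0)

n_pre, n_post = 8, 2
w = rng.normal(0.0, 0.1, size=(n_post, n_pre))  # latent synaptic weights
r = np.full_like(w, 0.5)                        # receptor availability per synapse
eta, tau_r = 0.05, 20.0                         # learning rate, trafficking time constant

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # effective synaptic strength = latent weight gated by receptor availability
    return np.tanh((w * r) @ x)

def update(x, y, reward):
    global w, r
    w += eta * reward * np.outer(y, x)          # reward-modulated Hebbian change
    r += (sigmoid(w) - r) / tau_r               # receptors drift toward strong synapses

for t in range(200):
    x = rng.normal(size=n_pre)
    y = forward(x)
    reward = 1.0 if np.sign(y[0]) == np.sign(x[0]) else -1.0  # toy behavioural feedback
    update(x, y, reward)
```

The point of the gating term `w * r` is that learning acts on two timescales: fast, reward-driven weight changes and a slower trafficking process that consolidates them, which is the general flavour of the mechanism described above.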
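The distinction between parameter-space and action-space exploration can likewise be illustrated with a short sketch. Below is an evolution-strategies-style loop that explores by perturbing the weights of a small recurrent controller rather than by adding noise to its actions. It is a generic example under stated assumptions, not the thesis's network or training method: the architecture, hyperparameters, and the helper functions `unpack` and `rollout` are hypothetical, and the environment calls assume the classic OpenAI Gym API (gym < 0.26, where `reset()` returns an observation and `step()` returns a 4-tuple).

```python
import numpy as np
import gym  # assumes the classic Gym API (gym < 0.26)

rng = np.random.default_rng(0)
env = gym.make("BipedalWalker-v3")
obs_dim = env.observation_space.shape[0]   # 24 observations for BipedalWalker
act_dim = env.action_space.shape[0]        # 4 continuous joint torques
hid = 32                                   # hidden units (placeholder size)

shapes = [(hid, obs_dim), (hid, hid), (act_dim, hid)]
n_params = sum(a * b for a, b in shapes)

def unpack(theta):
    """Split a flat parameter vector into the recurrent policy's matrices."""
    mats, i = [], 0
    for a, b in shapes:
        mats.append(theta[i:i + a * b].reshape(a, b))
        i += a * b
    return mats

def rollout(theta, max_steps=500):
    """Episode return of the recurrent policy defined by theta."""
    w_in, w_rec, w_out = unpack(theta)
    h = np.zeros(hid)                      # recurrent state: internal neural dynamics
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        h = np.tanh(w_in @ obs + w_rec @ h)
        obs, rew, done, _ = env.step(np.tanh(w_out @ h))
        total += rew
        if done:
            break
    return total

theta = rng.normal(0.0, 0.1, n_params)
sigma, alpha, pop = 0.05, 0.02, 16
for gen in range(100):
    eps = rng.normal(size=(pop, n_params))            # noise in parameter space
    returns = np.array([rollout(theta + sigma * e) for e in eps])
    ranks = (returns - returns.mean()) / (returns.std() + 1e-8)
    theta += alpha / (pop * sigma) * eps.T @ ranks    # ES-style gradient estimate
```

Because the policy itself is deterministic within an episode, exploration happens entirely through the weight perturbations, which is the sense in which parameter-space exploration replaces action-space exploration in the abstract above.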
