27 research outputs found

    Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

    In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10 × 10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units.
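    Concretely, SiLU(x) = x · σ(x), where σ is the logistic sigmoid, and dSiLU is its derivative, σ(x)(1 + x(1 − σ(x))). A minimal NumPy sketch of the two activations as described in the abstract (the function names are ours, for illustration, not taken from the paper's code):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU: the input multiplied by its sigmoid, x * sigmoid(x).
    return x * sigmoid(x)

def dsilu(x):
    # dSiLU: the derivative of SiLU with respect to its input,
    # sigmoid(x) * (1 + x * (1 - sigmoid(x))).
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```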

    Embodied Evolution of Learning Ability

    No full text
    Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. The evaluation, selection, and reproduction are carried out by cooperation and competition among the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to studying the adaptive learning mechanisms of artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction. The main goal of the research in this thesis has been to develop a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing of subpopulations of virtual agents inside each robot. The framework integrates reproduction as a directed autonomous behavior, and allows for learning of basic behaviors for survival by reinforcement learning. The purpose of the evolution is to evolve the learning ability of the agents by optimizing meta-properties in reinforcement learning, such as the selection of basic behaviors, meta-parameters that modulate the efficiency of the learning, and additional, richer reward signals that guide the learning in the form of shaping rewards. The realization of the embodied evolution framework has been a cumulative research process in three steps: 1) investigation of the learning of a cooperative mating behavior for directed autonomous reproduction; 2) development of an embodied evolution framework in which the selection of pre-learned basic behaviors and the optimization of battery recharging are evolved; and 3) development of an embodied evolution framework that includes meta-learning of basic reinforcement learning behaviors for survival, and in which the individuals are evaluated by an implicit and biologically inspired fitness function that promotes reproductive ability. The proposed embodied evolution methods have been validated in a simulation environment of the Cyber Rodent robot, a robotic platform developed for embodied evolution purposes. The evolutionarily obtained solutions have also been transferred to the real robotic platform. The evolutionary approach to meta-learning has also been applied to the automatic design of task hierarchies in hierarchical reinforcement learning, and to co-evolving meta-parameters and potential-based shaping rewards to accelerate reinforcement learning, both with regard to finding initial solutions and with regard to convergence to robust policies.
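    The potential-based shaping rewards mentioned at the end follow a standard form (Ng, Harada, and Russell, 1999): a term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, which provably leaves the optimal policy unchanged. A minimal sketch, assuming a discrete state space with a tabular potential Φ (in the thesis, the potentials themselves are evolved rather than hand-set):

```python
import numpy as np

def shaped_reward(env_reward, phi, s, s_next, gamma=0.99):
    # Potential-based shaping (Ng et al., 1999): adding
    # F(s, s') = gamma * phi[s_next] - phi[s] to the environment
    # reward does not change the optimal policy.
    return env_reward + gamma * phi[s_next] - phi[s]

# Usage: a tabular potential over 10 discrete states (hypothetical
# values; the thesis co-evolves these with the meta-parameters).
phi = np.linspace(0.0, 1.0, 10)
r = shaped_reward(env_reward=0.0, phi=phi, s=2, s_next=3)
```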

    Emergence of polymorphic mating strategies in robot colonies.

    No full text
    Polymorphism has fascinated evolutionary biologists since the time of Darwin. Biologists have observed discrete alternative mating strategies in many different species. In this study, we demonstrate that polymorphic mating strategies can emerge in a colony of hermaphrodite robots. We used a survival and reproduction task in which the robots maintained their energy levels by capturing energy sources and physically exchanged genotypes for the reproduction of offspring. Reproductive success depended on the individuals' energy levels, which created a natural trade-off between the time invested in maintaining a high energy level and the time invested in attracting mating partners. We performed experiments in environments with different densities of energy sources and observed varied mating behavior when a robot could see both an energy source and a potential mating partner. The individuals could be classified into two phenotypes: 1) forager, who always chooses to capture energy sources, and 2) tracker, who keeps track of potential mating partners if its energy level is above a threshold. In four of the seven highest-fitness populations in different environments, we found subpopulations with distinct differences in genotype and in behavioral phenotype. We analyzed the fitnesses of the foragers and the trackers by sampling them from each subpopulation and mixing them at different ratios in a population. The fitness curves for the two subpopulations crossed at about 25% foragers in the population, showing the evolutionary stability of the polymorphism. In one of those polymorphic populations, the trackers were further split into two subpopulations: strong trackers and weak trackers. Our analyses show that the population consisting of three phenotypes also constituted several polymorphic evolutionarily stable states. To our knowledge, our study is the first to demonstrate the emergence of polymorphic evolutionarily stable strategies within a robot evolution framework.
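    The mixing analysis described above can be illustrated numerically: estimate each phenotype's fitness as a function of the population mix and find the proportion where the two fitness curves cross, so that neither phenotype gains by switching. The linear curves below are purely hypothetical placeholders (chosen so that they cross at the reported 25% forager proportion); the study estimated the real curves empirically from sampled foragers and trackers:

```python
import numpy as np

# Hypothetical fitness curves as functions of the forager proportion p.
# Parameters are illustrative only, set so the crossing lands at p = 0.25.
def forager_fitness(p):
    return 1.0 - 0.8 * p   # foragers do worse as foragers become common

def tracker_fitness(p):
    return 0.6 + 0.8 * p   # trackers benefit from more foragers

ratios = np.linspace(0.0, 1.0, 101)
gap = forager_fitness(ratios) - tracker_fitness(ratios)
# The mixed equilibrium (an ESS candidate) is where the curves cross,
# i.e., where neither phenotype can improve its fitness by switching.
eq = ratios[np.argmin(np.abs(gap))]
print(f"equilibrium forager proportion ~ {eq:.2f}")  # ~0.25
```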

    Average energy level at the mating events as functions of the tracker proportion in the population.

    No full text
    The dotted lines show the constant approximations as the average values over all phenotype proportions.

    Two physical robots with six energy sources.

    No full text
    The Cyber Rodent robots used in the experiments were equipped with infrared communication for the exchange of genotypes and cameras for visual detection of energy sources (blue), tail-lamps of other robots (green), and faces of other robots (red).