27 research outputs found

    Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

    In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10 × 10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units.
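    Concretely, SiLU(x) = x · σ(x), where σ is the logistic sigmoid, and dSiLU is its derivative, σ(x)(1 + x(1 − σ(x))). A minimal NumPy sketch of the two activations as described in the abstract (the function names are ours, for illustration, not taken from the paper's code):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU: the input multiplied by its sigmoid, x * sigmoid(x).
    return x * sigmoid(x)

def dsilu(x):
    # dSiLU: the derivative of SiLU with respect to its input,
    # sigmoid(x) * (1 + x * (1 - sigmoid(x))).
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```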

    Embodied Evolution of Learning Ability

    No full text
    Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. The evaluation, selection, and reproduction are carried out by cooperation and competition among the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to studying the adaptive learning mechanisms of artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction. The main goal of the research in this thesis has been to develop a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing of subpopulations of virtual agents inside each robot. The framework integrates reproduction as a directed autonomous behavior, and allows for learning of basic behaviors for survival by reinforcement learning. The purpose of the evolution is to evolve the learning ability of the agents by optimizing meta-properties in reinforcement learning, such as the selection of basic behaviors, meta-parameters that modulate the efficiency of the learning, and additional, richer reward signals that guide the learning in the form of shaping rewards. The realization of the embodied evolution framework has been a cumulative research process in three steps: 1) investigation of the learning of a cooperative mating behavior for directed autonomous reproduction; 2) development of an embodied evolution framework in which the selection of pre-learned basic behaviors and the optimization of battery recharging are evolved; and 3) development of an embodied evolution framework that includes meta-learning of basic reinforcement learning behaviors for survival, and in which the individuals are evaluated by an implicit and biologically inspired fitness function that promotes reproductive ability. The proposed embodied evolution methods have been validated in a simulation environment of the Cyber Rodent robot, a robotic platform developed for embodied evolution purposes. The evolutionarily obtained solutions have also been transferred to the real robotic platform. The evolutionary approach to meta-learning has also been applied to the automatic design of task hierarchies in hierarchical reinforcement learning, and to co-evolving meta-parameters and potential-based shaping rewards to accelerate reinforcement learning, both with regard to finding initial solutions and with regard to convergence to robust policies.
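    The potential-based shaping rewards mentioned at the end follow a standard form (Ng, Harada, and Russell, 1999): a term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, which provably leaves the optimal policy unchanged. A minimal sketch, assuming a discrete state space with a tabular potential Φ (in the thesis, the potentials themselves are evolved rather than hand-set):

```python
import numpy as np

def shaped_reward(env_reward, phi, s, s_next, gamma=0.99):
    # Potential-based shaping (Ng et al., 1999): adding
    # F(s, s') = gamma * phi[s_next] - phi[s] to the environment
    # reward does not change the optimal policy.
    return env_reward + gamma * phi[s_next] - phi[s]

# Usage: a tabular potential over 10 discrete states (hypothetical
# values; the thesis co-evolves these with the meta-parameters).
phi = np.linspace(0.0, 1.0, 10)
r = shaped_reward(env_reward=0.0, phi=phi, s=2, s_next=3)
```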

    Emergence of polymorphic mating strategies in robot colonies.

    No full text
    Polymorphism has fascinated evolutionary biologists since the time of Darwin. Biologists have observed discrete alternative mating strategies in many different species. In this study, we demonstrate that polymorphic mating strategies can emerge in a colony of hermaphrodite robots. We used a survival and reproduction task in which the robots maintained their energy levels by capturing energy sources and physically exchanged genotypes for the reproduction of offspring. Reproductive success depended on the individuals' energy levels, which created a natural trade-off between the time invested in maintaining a high energy level and the time invested in attracting mating partners. We performed experiments in environments with different densities of energy sources and observed varied mating behavior when a robot could see both an energy source and a potential mating partner. The individuals could be classified into two phenotypes: 1) forager, who always chooses to capture energy sources, and 2) tracker, who keeps track of potential mating partners if its energy level is above a threshold. In four of the seven highest-fitness populations in different environments, we found subpopulations with distinct differences in genotype and in behavioral phenotype. We analyzed the fitnesses of the foragers and the trackers by sampling them from each subpopulation and mixing them at different ratios in a population. The fitness curves for the two subpopulations crossed at about 25% foragers in the population, showing the evolutionary stability of the polymorphism. In one of those polymorphic populations, the trackers were further split into two subpopulations: strong trackers and weak trackers. Our analyses show that the population consisting of three phenotypes also constituted several polymorphic evolutionarily stable states. To our knowledge, our study is the first to demonstrate the emergence of polymorphic evolutionarily stable strategies within a robot evolution framework.
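    The mixing analysis described above can be illustrated numerically: estimate each phenotype's fitness as a function of the population mix and find the proportion where the two fitness curves cross, so that neither phenotype gains by switching. The linear curves below are purely hypothetical placeholders (chosen so that they cross at the reported 25% forager proportion); the study estimated the real curves empirically from sampled foragers and trackers:

```python
import numpy as np

# Hypothetical fitness curves as functions of the forager proportion p.
# Parameters are illustrative only, set so the crossing lands at p = 0.25.
def forager_fitness(p):
    return 1.0 - 0.8 * p   # foragers do worse as foragers become common

def tracker_fitness(p):
    return 0.6 + 0.8 * p   # trackers benefit from more foragers

ratios = np.linspace(0.0, 1.0, 101)
gap = forager_fitness(ratios) - tracker_fitness(ratios)
# The mixed equilibrium (an ESS candidate) is where the curves cross,
# i.e., where neither phenotype can improve its fitness by switching.
eq = ratios[np.argmin(np.abs(gap))]
print(f"equilibrium forager proportion ~ {eq:.2f}")  # ~0.25
```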

    Average energy level at the mating events as functions of the tracker proportion in the population.

    No full text
    The dotted lines show the constant approximations as the average values over all phenotype proportions.

    Two physical robots with six energy sources.

    No full text
    The Cyber Rodent robots used in the experiments were equipped with infrared communication for the exchange of genotypes and cameras for visual detection of energy sources (blue), tail-lamps of other robots (green), and faces of other robots (red).