Performance, generalizability, and stability are three Reinforcement Learning
(RL) challenges that many practical applications face in combination. However,
state-of-the-art RL algorithms fall short when addressing multiple RL
objectives simultaneously, and current human-driven
design practices might not be well-suited for multi-objective RL. In this paper
we present MetaPG, an evolutionary method that discovers new RL algorithms
represented as graphs, following a multi-objective search criterion in which
different RL objectives are encoded in separate fitness scores. Our findings
show that, when using a graph-based implementation of Soft Actor-Critic (SAC)
to initialize the population, our method is able to find new algorithms that
improve upon SAC's performance and generalizability by 3% and 17%,
respectively, and reduce instability by up to 65%. In addition, we analyze the
graph structure of the best algorithms in the population and offer an
interpretation of specific elements that help trade off performance for
generalizability and vice versa. We validate our findings in three different
continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum.
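
To make the multi-objective search criterion concrete, below is a minimal Python sketch of an evolutionary loop in which each candidate algorithm graph receives a separate fitness score per objective (performance, generalizability, stability) and selection keeps the Pareto front. All names (Candidate, evaluate, mutate) and the random scoring are illustrative assumptions, not the paper's implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    graph: list            # placeholder for an RL-algorithm computation graph
    fitness: tuple = None   # (performance, generalizability, stability)

def evaluate(c: Candidate) -> tuple:
    # Stand-in for training/evaluating the candidate on benchmark tasks;
    # here each objective is just a random score in [0, 1].
    return tuple(random.random() for _ in range(3))

def dominates(a: tuple, b: tuple) -> bool:
    # a Pareto-dominates b if it is no worse in every objective
    # and strictly better in at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(pop):
    # Keep candidates not dominated by any other candidate.
    return [c for c in pop
            if not any(dominates(o.fitness, c.fitness) for o in pop if o is not c)]

def mutate(c: Candidate) -> Candidate:
    # Stand-in for a graph mutation (e.g., editing one node of the loss graph).
    return Candidate(graph=list(c.graph) + [random.randint(0, 9)])

# Initialize the population from a single seed graph (analogous to the SAC-based
# graph used to warm-start the search in the paper).
population = [Candidate(graph=[0])]
for c in population:
    c.fitness = evaluate(c)

for generation in range(20):
    children = [mutate(random.choice(population)) for _ in range(8)]
    for c in children:
        c.fitness = evaluate(c)
    # Selection: retain only non-dominated candidates across parents and children.
    population = pareto_front(population + children)

print(f"Pareto front size: {len(population)}")
for c in population:
    print(c.fitness)
```

Keeping separate fitness scores rather than a single weighted sum lets the search maintain a spectrum of trade-offs, so algorithms that favor performance and algorithms that favor generalizability can coexist on the same Pareto front.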