
    A Domain Independent Explanation-Based Generalizer

    Coordinated Science Laboratory (formerly known as Control Systems Laboratory); ONR / N00014-86-K-0309; National Science Foundation / IST-83-1788

    Multi-objective evolution for Generalizable Policy Gradient Algorithms

    Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges relevant to many practical applications, where they often arise in combination. Still, state-of-the-art RL algorithms fall short when addressing multiple RL objectives simultaneously, and current human-driven design practices might not be well suited for multi-objective RL. In this paper we present MetaPG, an evolutionary method that discovers new RL algorithms represented as graphs, following a multi-objective search criterion in which different RL objectives are encoded in separate fitness scores. Our findings show that, when using a graph-based implementation of Soft Actor-Critic (SAC) to initialize the population, our method is able to find new algorithms that improve upon SAC's performance and generalizability by 3% and 17%, respectively, and reduce instability by up to 65%. In addition, we analyze the graph structure of the best algorithms in the population and offer an interpretation of specific elements that help trade performance for generalizability and vice versa. We validate our findings in three different continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum. (Comment: 23 pages, 12 figures, 10 tables)
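
    The abstract describes a search that scores each candidate algorithm graph separately on performance, generalizability, and stability, and selects along a Pareto front. As a rough, hypothetical illustration only (MetaPG's actual code is not part of this listing; Candidate, evaluate, and mutate below are invented names), a minimal multi-objective evolutionary loop might look like this in Python:

```python
# Hypothetical sketch of multi-objective selection over algorithm graphs.
# These names are illustrative, not the authors' MetaPG implementation.
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Candidate:
    graph: object                    # graph encoding of an RL algorithm
    fitness: Tuple[float, ...] = ()  # (performance, generalizability, stability)

def dominates(a: Candidate, b: Candidate) -> bool:
    """a Pareto-dominates b: no worse on every objective (higher is
    better here) and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a.fitness, b.fitness))
            and any(x > y for x, y in zip(a.fitness, b.fitness)))

def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Candidates not dominated by any other member of the population."""
    return [c for c in pop
            if not any(dominates(o, c) for o in pop if o is not c)]

def evolve(pop: List[Candidate],
           mutate: Callable[[object], object],
           evaluate: Callable[[object], Tuple[float, ...]],
           generations: int = 50) -> List[Candidate]:
    """Score every graph on all objectives, keep the Pareto front,
    and refill the population by mutating survivors."""
    for _ in range(generations):
        for c in pop:
            c.fitness = evaluate(c.graph)   # one score per RL objective
        front = pareto_front(pop)
        pop = front + [Candidate(mutate(random.choice(front).graph))
                       for _ in range(len(pop) - len(front))]
    for c in pop:                           # score the final generation
        c.fitness = evaluate(c.graph)
    return pareto_front(pop)
```

    Keeping separate fitness scores, rather than collapsing them into one weighted sum, is what lets the front retain algorithms that trade performance for generalizability and vice versa, as the abstract discusses.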

    Explanation-based generalization of partially ordered plans

    Most previous work on analytic generalization of plans dealt with totally ordered plans. These methods cannot be directly applied to generalizing partially ordered plans, since they do not capture all interactions among plan operators for all total orders of such plans. We introduce a new method for generalizing partially ordered plans. This method is based on providing explanation-based generalization (EBG) with explanations that systematically capture the interactions among plan operators for all total orders of a partially ordered plan. The explanations are based on the Modal Truth Criterion, which states the necessary and sufficient conditions for ensuring the truth of a proposition at any point in a plan, for a class of partially ordered plans. The generalizations obtained by this method guarantee successful and interaction-free execution of any total order of the generalized plan. In addition, the systematic derivation of the generalization algorithms from the Modal Truth Criterion obviates the need for a separate formal proof of correctness of the EBG algorithms.
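
    The criterion the abstract invokes can be illustrated with a toy check: a proposition holds at a point in a plan if, in every total order consistent with the partial order, some earlier step establishes it and no intervening step deletes it. The Python sketch below is a hypothetical illustration under that reading (Step, holds_at, and necessarily_holds are invented names, not the authors' algorithm):

```python
# Hypothetical illustration of the truth-criterion check behind the
# generalization described above; not the authors' algorithm.
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Step:
    name: str
    adds: FrozenSet[str]
    deletes: FrozenSet[str]

def holds_at(prop: str, point: int, order: List[Step]) -> bool:
    """In one total order: prop holds at position `point` iff some earlier
    step establishes it and no later step before `point` deletes it."""
    established = False
    for step in order[:point]:
        if prop in step.adds:
            established = True
        if prop in step.deletes:
            established = False
    return established

def necessarily_holds(prop: str, point: int,
                      total_orders: List[List[Step]]) -> bool:
    """A partially ordered plan guarantees prop at `point` only if the
    check succeeds in every total order consistent with its ordering
    constraints; a generalization that preserves this property stays
    interaction-free for all total orders, as the abstract states."""
    return all(holds_at(prop, point, o) for o in total_orders)
```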