10 research outputs found

    Developmental Modular Reinforcement Learning

    Get PDF
    In this article, we propose a modular reinforcement learning (MRL) architecture that coordinates competition and cooperation between modules and that, following a developmental approach, generates new modules when new goals are detected. We evaluate the effectiveness of our approach in a multiple-goal torus grid world. Results show that our approach outperforms previous MRL methods in learning separate strategies for sub-goals, reusing them to solve task-specific or unseen multi-goal problems, and maintaining the independence of learning in each module.
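
    The developmental step the abstract describes, spawning a fresh module whenever a previously unseen goal appears while arbitrating among the existing modules, can be sketched roughly as follows. The class names, the tabular Q-learning inside each module, and the summed-preference arbitration rule are illustrative assumptions rather than the authors' implementation.

        import random
        from collections import defaultdict

        class Module:
            """One sub-goal learner: tabular Q-learning over a shared state space."""
            def __init__(self, n_actions, alpha=0.1, gamma=0.95):
                self.q = defaultdict(lambda: [0.0] * n_actions)
                self.alpha, self.gamma = alpha, gamma

            def preferences(self, state):
                return self.q[state]

            def update(self, s, a, r, s_next):
                target = r + self.gamma * max(self.q[s_next])
                self.q[s][a] += self.alpha * (target - self.q[s][a])

        class DevelopmentalMRLAgent:
            """Keeps one module per known goal and spawns a new one on demand."""
            def __init__(self, n_actions, epsilon=0.1):
                self.n_actions, self.epsilon = n_actions, epsilon
                self.modules = {}  # goal id -> Module

            def observe_goal(self, goal_id):
                # Developmental step: a newly detected goal gets its own module.
                if goal_id not in self.modules:
                    self.modules[goal_id] = Module(self.n_actions)

            def act(self, state):
                if not self.modules or random.random() < self.epsilon:
                    return random.randrange(self.n_actions)
                # Arbitration by summed preferences across modules.
                totals = [0.0] * self.n_actions
                for m in self.modules.values():
                    for a, q in enumerate(m.preferences(state)):
                        totals[a] += q
                return max(range(self.n_actions), key=totals.__getitem__)

            def learn(self, s, a, rewards_per_goal, s_next):
                # Each module sees only its own goal's reward, keeping
                # learning independent across modules.
                for goal_id, r in rewards_per_goal.items():
                    self.observe_goal(goal_id)
                    self.modules[goal_id].update(s, a, r, s_next)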

    Augmented Modular Reinforcement Learning based on Heterogeneous Knowledge

    Full text link
    In order to mitigate some of the inefficiencies of Reinforcement Learning (RL), modular approaches that compose different decision-making policies to derive agents capable of performing a variety of tasks have been proposed. The modules at the basis of these architectures are generally reusable and allow for "plug-and-play" integration. However, such solutions still lack the ability to process and integrate multiple types of information (knowledge), such as rules, sub-goals, and skills. We propose Augmented Modular Reinforcement Learning (AMRL) to address these limitations. This new framework uses an arbitrator to select heterogeneous modules and seamlessly incorporate different types of knowledge. Additionally, we introduce a variation of the selection mechanism, namely the Memory-Augmented Arbitrator, which adds the capability of exploiting temporal information. We evaluate the proposed mechanisms on established as well as new environments and benchmark them against prominent deep RL algorithms. Our results demonstrate the performance improvements that can be achieved by augmenting traditional modular RL with other forms of heterogeneous knowledge. Comment: 17 pages, 15 figures
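
    A minimal sketch of the arbitration idea follows: heterogeneous modules (here a hard-coded rule and a stub skill policy) share one proposal interface, an arbitrator picks a single module per step, and a memory-augmented variant biases that choice with a short history of recent selections. The interfaces, the scoring, and the stickiness bonus are assumptions made for illustration, not the paper's algorithm.

        import random
        from collections import deque

        class RuleModule:
            """Knowledge as a rule: fires a fixed action when a predicate holds."""
            def propose(self, state):
                return 0 if state.get("obstacle_ahead") else None

        class SkillModule:
            """Knowledge as a learned skill; a random stub stands in for the policy."""
            def propose(self, state):
                return random.randrange(4)

        class Arbitrator:
            """Greedy selection over modules, scored e.g. by running returns."""
            def __init__(self, modules):
                self.modules = modules
                self.scores = [0.0] * len(modules)

            def select(self, state, scores=None):
                scores = self.scores if scores is None else scores
                proposals = [(i, m.propose(state)) for i, m in enumerate(self.modules)]
                proposals = [(i, a) for i, a in proposals if a is not None]
                return max(proposals, key=lambda p: scores[p[0]])  # (module, action)

        class MemoryAugmentedArbitrator(Arbitrator):
            """Variant that also exploits temporal information about past picks."""
            def __init__(self, modules, horizon=5):
                super().__init__(modules)
                self.history = deque(maxlen=horizon)

            def select(self, state):
                # Small stickiness bonus for the most recently chosen module,
                # so the agent does not thrash between modules every step.
                biased = list(self.scores)
                if self.history:
                    biased[self.history[-1]] += 0.01
                i, action = super().select(state, scores=biased)
                self.history.append(i)
                return i, action

        arb = MemoryAugmentedArbitrator([RuleModule(), SkillModule()])
        print(arb.select({"obstacle_ahead": True}))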

    Multi-task learning with modular reinforcement learning

    Get PDF
    The ability to learn compositional strategies in multi-task learning and to apply them appropriately is crucial to the development of artificial intelligence. However, several challenges remain: (i) how to maintain the independence of modules in learning their own sub-tasks; (ii) how to avoid performance degradation when modules' reward scales are incompatible; (iii) how to find the optimal composite policy for the entire set of tasks. In this paper, we introduce a Modular Reinforcement Learning (MRL) framework that coordinates competition and cooperation between separate modules. A selective update mechanism enables the learning system to align incomparable reward scales in different modules. Furthermore, the learning system follows a "joint policy" to calculate action preferences combined with each module's responsibility for the current task. We evaluate the effectiveness of our approach on a classic food-gathering and predator-avoidance task. Results show that our approach outperforms previous MRL methods in learning separate strategies for sub-tasks, is robust to modules with incomparable reward scales, and maintains the independence of learning in each module.
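
    One plausible reading of the "joint policy" is sketched below: each module's action preferences are normalised before being weighted by that module's responsibility for the current task, so incomparable reward scales cannot dominate the vote. The softmax normalisation and the example numbers are assumptions, not details taken from the paper.

        import numpy as np

        def joint_policy(preferences, responsibilities):
            """Combine per-module action preferences into one action distribution.

            Each module's Q-values are softmax-normalised first, so modules whose
            rewards live on very different scales contribute comparable votes.
            """
            combined = np.zeros_like(next(iter(preferences.values())), dtype=float)
            for mid, q in preferences.items():
                p = np.exp(q - np.max(q))
                p /= p.sum()
                combined += responsibilities.get(mid, 0.0) * p
            return combined / combined.sum()

        # Example: a food-gathering module with small rewards and a
        # predator-avoidance module with large penalties get comparable votes.
        prefs = {
            "gather": np.array([0.2, 0.9, 0.1, 0.4]),
            "avoid": np.array([-80.0, -5.0, -120.0, -60.0]),
        }
        resp = {"gather": 0.5, "avoid": 0.5}
        print(joint_policy(prefs, resp))  # probability over the four actions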

    Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models

    Full text link
    Reinforcement learning presents an attractive paradigm to reason about several distinct aspects of sequential decision making, such as specifying complex goals, planning future observations and actions, and critiquing their utilities. However, the combined integration of these capabilities poses competing algorithmic challenges in retaining maximal expressivity while allowing for flexibility in modeling choices for efficient learning and inference. We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into three generative modules. These modules simulate the temporal evolution of observations, rewards, and actions via independent generative models that can be learned in parallel via teacher forcing. Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferability across domains, and inference speed. Our empirical results demonstrate the effectiveness of Decision Stacks for offline policy optimization in several MDP and POMDP environments, outperforming existing methods and enabling flexible generative decision making. Comment: published at NeurIPS 2023, project page: https://siyan-zhao.github.io/decision-stacks
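
    Structurally, the decomposition can be sketched as three independently supplied generative modules chained at inference time: observations, then rewards, then actions. The interfaces and the exact conditioning order below are a simplified reading under assumed signatures, not the authors' models.

        from dataclasses import dataclass
        from typing import Callable, List

        @dataclass
        class DecisionStack:
            """Factored goal-conditioned generation: observation, reward, and action
            modules can be designed and trained independently (e.g. in parallel
            with teacher forcing) and are only chained at inference time."""
            obs_model: Callable   # (goal, history)           -> next observation
            rew_model: Callable   # (goal, history, obs)      -> predicted reward
            act_model: Callable   # (goal, history, obs, rew) -> action

            def step(self, goal, history: List):
                obs = self.obs_model(goal, history)
                rew = self.rew_model(goal, history, obs)
                act = self.act_model(goal, history, obs, rew)
                history.append((obs, rew, act))
                return act

        # Placeholder callables stand in for the three generative models.
        stack = DecisionStack(
            obs_model=lambda g, h: len(h),
            rew_model=lambda g, h, o: float(o == g),
            act_model=lambda g, h, o, r: 1 if r > 0 else 0,
        )
        print(stack.step(goal=0, history=[]))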

    Concurrent Skill Composition using Ensemble of Primitive Skills

    Get PDF
    One of the key characteristics of an open-ended cumulative learning agent is that it should use the knowledge gained from prior learning to solve future tasks. That characteristic is especially important in robotics, as learning every perception-action skill from scratch is not only time consuming but may not always be feasible. In the case of reinforcement learning, this learned knowledge is called a policy. A lifelong learning agent should treat the policies of learned tasks as building blocks for solving future tasks. One way to categorize tasks is by their composition, ranging from primitive tasks to compound tasks that are either sequential or concurrent combinations of primitive tasks. Thus, the agent needs to be able to combine the policies of primitive tasks to solve compound tasks, which are then added to its knowledge base. Inspired by modular neural networks, we propose an approach to compose policies for compound tasks that are concurrent combinations of disjoint tasks. Furthermore, we hypothesize that learning in a specialized environment leads to more efficient learning; hence, we create scaffolded environments in which the robot learns primitive skills for our mobile-robot-based experiments. We then show how the agent can combine those primitive skills to learn solutions for compound tasks. This reduces the overall training time across multiple skills and creates a versatile agent that can mix and match its skills.
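
    As a rough illustration of composing primitive skills for a concurrent compound task, the sketch below multiplies the primitives' action distributions in a product-of-experts fashion. The composition rule and the hypothetical skills are assumptions for illustration; the paper's ensemble mechanism may differ.

        import numpy as np

        def compose_concurrent(policies, state):
            """Combine primitive-skill policies into one compound-task policy by
            multiplying and renormalising their action distributions."""
            combined = np.ones(len(policies[0](state)))
            for policy in policies:
                combined *= np.asarray(policy(state))
            return combined / combined.sum()

        # Hypothetical primitive skills, e.g. learned in scaffolded environments.
        go_to_target = lambda s: [0.7, 0.1, 0.1, 0.1]  # prefers "forward"
        avoid_walls = lambda s: [0.1, 0.1, 0.7, 0.1]   # prefers "turn left"
        print(compose_concurrent([go_to_target, avoid_walls], state=None))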

    Digital Twins: Review and Challenges

    Full text link
    With the rise of Industry 4.0, numerous concepts have emerged; one of the main concepts is the digital twin (DT). DT is now widely used; however, because the existing literature describes several different uses, the understanding of the concept and its functioning can be diffuse. The main goal of this paper is to provide a review of the existing literature to clarify the concept, operation, and main characteristics of DT, to introduce the most current operating, communication, and usage trends related to this technology, and to present the performance of the synergy between DT and multi-agent system (MAS) technologies through a computer science approach. This work was partly supported by the Spanish Government (RTI2018-095390-B-C31).
    Juárez-Juárez, MG.; Botti, V.; Giret Boggino, AS. (2021). Digital Twins: Review and Challenges. Journal of Computing and Information Science in Engineering. 21(3):1-23. https://doi.org/10.1115/1.4050244

    Composable Modular Reinforcement Learning

    No full text
    Modular reinforcement learning (MRL) decomposes a monolithic multiple-goal problem into modules that each solve a portion of the original problem. The modules' action preferences are arbitrated to determine the action taken by the agent. Truly modular reinforcement learning would support not only decomposition into modules but also composability of separately written modules in new modular reinforcement learning agents. However, the performance of MRL agents that arbitrate module preferences using additive reward schemes degrades when the modules have incomparable reward scales. This performance degradation means that separately written modules cannot be composed in new modular reinforcement learning agents as is; they may need to be modified to align their reward scales. We solve this problem with a Q-learning-based command arbitration algorithm and demonstrate that it does not exhibit the same performance degradation as existing approaches to MRL, thereby supporting composability.
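
    The arbitration idea, as the abstract frames it, is to learn which module's command to follow rather than to add together the modules' incomparable rewards. Below is a minimal sketch under that reading: a tabular Q-learner over module choices, trained on a single task-level reward. The state representation and the reward signal used here are assumptions.

        import random
        from collections import defaultdict

        class CommandArbitrator:
            """Learns Q-values over which module's command to execute, so the
            modules' internal reward scales are never summed or compared."""
            def __init__(self, n_modules, alpha=0.1, gamma=0.95, epsilon=0.1):
                self.q = defaultdict(lambda: [0.0] * n_modules)
                self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
                self.n_modules = n_modules

            def choose_module(self, state):
                if random.random() < self.epsilon:
                    return random.randrange(self.n_modules)
                return max(range(self.n_modules), key=self.q[state].__getitem__)

            def update(self, state, module_idx, task_reward, next_state):
                # Trained on one task-level reward, not on the sum of the
                # modules' own (possibly incomparable) reward signals.
                target = task_reward + self.gamma * max(self.q[next_state])
                old = self.q[state][module_idx]
                self.q[state][module_idx] = old + self.alpha * (target - old)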
