10 research outputs found
Developmental Modular Reinforcement Learning
In this article, we propose a modular reinforcement learning (MRL) architecture that coordinates the competition and the cooperation between modules and, following a developmental approach, drives the generation of new modules when new goals are detected. We evaluate the effectiveness of our approach in a multiple-goal torus grid world. Results show that our approach outperforms previous MRL methods in learning separate strategies for sub-goals and reusing them to solve task-specific or unseen multi-goal problems, while maintaining the independence of the learning in each module.
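The coordination this abstract describes could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the `Module` and `DevelopmentalMRL` classes, the goal-keyed module table, and the summed-preference arbitration are all assumptions made for the example.

```python
import random
from collections import defaultdict

class Module:
    """One tabular Q-learning module, responsible for a single goal."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha, self.gamma, self.n_actions = alpha, gamma, n_actions

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

class DevelopmentalMRL:
    """Spawns a fresh module whenever an unseen goal is detected,
    and arbitrates by summing the modules' action preferences."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.modules = {}  # goal id -> Module

    def observe_goal(self, goal):
        if goal not in self.modules:  # new goal detected: grow a module
            self.modules[goal] = Module(self.n_actions)
        return self.modules[goal]

    def act(self, s, active_goals, eps=0.1):
        if random.random() < eps or not active_goals:
            return random.randrange(self.n_actions)
        prefs = [sum(self.observe_goal(g).q[s][a] for g in active_goals)
                 for a in range(self.n_actions)]
        return max(range(self.n_actions), key=prefs.__getitem__)
```

Each module learns only from its own goal's reward, which is one way the independence of learning mentioned above could be preserved.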
Augmented Modular Reinforcement Learning based on Heterogeneous Knowledge
In order to mitigate some of the inefficiencies of Reinforcement Learning
(RL), modular approaches composing different decision-making policies to derive
agents capable of performing a variety of tasks have been proposed. The modules
at the basis of these architectures are generally reusable, also allowing for
"plug-and-play" integration. However, such solutions still lack the ability to
process and integrate multiple types of information (knowledge), such as rules,
sub-goals, and skills. We propose Augmented Modular Reinforcement Learning
(AMRL) to address these limitations. This new framework uses an arbitrator to
select heterogeneous modules and seamlessly incorporate different types of
knowledge. Additionally, we introduce a variation of the selection mechanism,
namely the Memory-Augmented Arbitrator, which adds the capability of exploiting
temporal information. We evaluate the proposed mechanisms on established as
well as new environments and benchmark them against prominent deep RL
algorithms. Our results demonstrate the performance improvements that can be
achieved by augmenting traditional modular RL with other forms of heterogeneous
knowledge.
Comment: 17 pages, 15 figures
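An arbitrator over heterogeneous modules, as described above, might look like the following sketch. All class names, the `propose(state) -> (action, score)` interface, and the recency-bonus mechanism are hypothetical illustrations, not AMRL's actual API.

```python
class RuleModule:
    """Hand-written rule: encodes prior knowledge as a fixed preference."""
    def propose(self, state):
        return ("avoid", 1.0) if state == "danger" else ("noop", 0.0)

class SkillModule:
    """Learned skill: here a stub returning a fixed learned value."""
    def propose(self, state):
        return ("gather", 0.6)

class Arbitrator:
    """Selects among heterogeneous modules by their proposal scores."""
    def __init__(self, modules):
        self.modules = modules

    def select(self, state):
        # pick the action from the module whose proposal scores highest
        return max((m.propose(state) for m in self.modules),
                   key=lambda p: p[1])[0]

class MemoryAugmentedArbitrator(Arbitrator):
    """Adds a small bonus to recently chosen actions, a crude stand-in
    for exploiting temporal information."""
    def __init__(self, modules, bonus=0.2, horizon=5):
        super().__init__(modules)
        self.bonus, self.horizon, self.history = bonus, horizon, []

    def select(self, state):
        scored = [(a, s + self.bonus * self.history[-self.horizon:].count(a))
                  for a, s in (m.propose(state) for m in self.modules)]
        action = max(scored, key=lambda p: p[1])[0]
        self.history.append(action)
        return action
```

Because every module, whether a rule, a sub-goal, or a skill, exposes the same proposal interface, the arbitrator can integrate them without knowing their internals.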
Multi-task learning with modular reinforcement learning
The ability to learn compositional strategies in multi-task learning and to exert them appropriately is crucial to the development of artificial intelligence. However, several challenges exist: (i) how to maintain the independence of modules in learning their own sub-tasks; (ii) how to avoid performance degradation when modules' reward scales are incompatible; and (iii) how to find the optimal composite policy for the entire set of tasks. In this paper, we introduce a Modular Reinforcement Learning (MRL) framework that coordinates the competition and the cooperation between separate modules. A selective update mechanism enables the learning system to align incomparable reward scales across modules. Furthermore, the learning system follows a "joint policy" to calculate actions' preferences combined with their responsibility for the current task. We evaluate the effectiveness of our approach on a classic food-gathering and predator-avoidance task. Results show that our approach outperforms previous MRL methods in learning separate strategies for sub-tasks, is robust to modules with incomparable reward scales, and maintains the independence of the learning in each module.
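The "joint policy" idea, combining modules' action preferences weighted by their responsibility for the current task, could be sketched like this. The weighted-sum form is an assumption for illustration; the paper's actual combination rule may differ.

```python
def joint_policy(q_values, responsibilities):
    """Combine per-module action preferences into a joint preference.

    q_values:         {module: [Q(s, a) for each action]}
    responsibilities: {module: weight in [0, 1]} for the current task
    Returns the index of the action with the highest combined preference.
    """
    n_actions = len(next(iter(q_values.values())))
    prefs = [sum(responsibilities[m] * q[a] for m, q in q_values.items())
             for a in range(n_actions)]
    return max(range(n_actions), key=prefs.__getitem__)
```

Shifting responsibility between modules shifts the chosen action: with equal weights the higher-valued action of one module can win, while concentrating responsibility on the other module flips the decision.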
Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models
Reinforcement learning presents an attractive paradigm to reason about
several distinct aspects of sequential decision making, such as specifying
complex goals, planning future observations and actions, and critiquing their
utilities. However, the combined integration of these capabilities poses
competing algorithmic challenges in retaining maximal expressivity while
allowing for flexibility in modeling choices for efficient learning and
inference. We present Decision Stacks, a generative framework that decomposes
goal-conditioned policy agents into 3 generative modules. These modules
simulate the temporal evolution of observations, rewards, and actions via
independent generative models that can be learned in parallel via teacher
forcing. Our framework guarantees both expressivity and flexibility in
designing individual modules to account for key factors such as architectural
bias, optimization objective and dynamics, transferrability across domains, and
inference speed. Our empirical results demonstrate the effectiveness of
Decision Stacks for offline policy optimization for several MDP and POMDP
environments, outperforming existing methods and enabling flexible generative
decision making.
Comment: published at NeurIPS 2023, project page:
https://siyan-zhao.github.io/decision-stacks
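The three-module factorization described above can be illustrated with toy stand-ins: the trajectory distribution is split into an observation model, a reward model conditioned on observations, and an action model conditioned on both, and each slot could hold any generative model trained in parallel. The concrete samplers below are invented placeholders, not the paper's models.

```python
import random

def sample_observation():
    # module 1: observation model (toy stand-in)
    return random.choice(["low", "high"])

def sample_reward(obs):
    # module 2: reward model, conditioned on the observation
    return 1.0 if obs == "high" else 0.0

def sample_action(obs, reward):
    # module 3: action/policy model, conditioned on observation and reward
    return "exploit" if reward > 0 else "explore"

def rollout_step():
    """Chain the three independent modules to generate one decision step."""
    obs = sample_observation()
    reward = sample_reward(obs)
    action = sample_action(obs, reward)
    return obs, reward, action
```

Because each module only consumes the outputs of the ones before it, any single module can be swapped for a different architecture without retraining the others, which is the flexibility the abstract emphasizes.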
Concurrent Skill Composition using Ensemble of Primitive Skills
One of the key characteristics of an open-ended cumulative learning agent is that it should use the knowledge gained from prior learning to solve future tasks. This characteristic is especially essential in robotics, as learning every perception-action skill from scratch is not only time-consuming but may not always be feasible. In reinforcement learning, this learned knowledge is called a policy. A lifelong learning agent should treat the policies of learned tasks as building blocks for solving future tasks. One categorization of tasks is based on their composition, ranging from primitive tasks to compound tasks that are either sequential or concurrent combinations of primitive tasks. Thus, the agent needs to be able to combine the policies of primitive tasks to solve compound tasks, which are then added to its knowledge base. Inspired by modular neural networks, we propose an approach to compose policies for compound tasks that are concurrent combinations of disjoint tasks. Furthermore, we hypothesize that learning in a specialized environment leads to more efficient learning; hence, we create scaffolded environments in which the robot learns primitive skills for our mobile-robot-based experiments. We then show how the agent can combine those primitive skills to learn solutions for compound tasks. This reduces the overall training time of multiple skills and creates a versatile agent that can mix and match the skills.
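One common way to compose two primitive skills for a concurrent compound task is to multiply their Boltzmann (softmax) action distributions and renormalize, so the compound policy favors actions both skills rate well. This is a generic composition technique offered as a sketch; the paper's ensemble mechanism may differ.

```python
import math

def boltzmann(qs, temp=1.0):
    """Softmax over a skill's action values."""
    exps = [math.exp(q / temp) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]

def compose_concurrent(q_a, q_b, temp=1.0):
    """Compose two primitive skills by multiplying their Boltzmann
    action distributions and renormalizing."""
    pa, pb = boltzmann(q_a, temp), boltzmann(q_b, temp)
    prod = [x * y for x, y in zip(pa, pb)]
    z = sum(prod)
    return [p / z for p in prod]
```

With `q_a = [1.0, 0.8, 0.0]` and `q_b = [0.0, 0.8, 1.0]`, each skill prefers a different extreme action, yet the product distribution peaks on the middle action that both skills rate reasonably well, a compromise that suits concurrent combinations of disjoint tasks.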
Digital Twins: Review and Challenges
With the rise of Industry 4.0, numerous concepts have emerged; one of the main ones is the digital twin (DT). DTs are widely used today; however, because the existing literature describes several distinct uses, the understanding of the concept and its functioning can be diffuse. The main goal of this paper is to review the existing literature in order to clarify the concept, operation, and main characteristics of DTs; to introduce the most current operating, communication, and usage trends related to this technology; and to present the performance of the synergy between DT and multi-agent system (MAS) technologies from a computer science perspective.
This work was partly supported by the Spanish Government (RTI2018-095390-B-C31).
Juárez-Juárez, MG.; Botti, V.; Giret Boggino, AS. (2021). Digital Twins: Review and Challenges. Journal of Computing and Information Science in Engineering. 21(3):1-23. https://doi.org/10.1115/1.4050244
Composable Modular Reinforcement Learning
Modular reinforcement learning (MRL) decomposes a monolithic multiple-goal problem into modules that each solve a portion of the original problem. The modules' action preferences are arbitrated to determine the action taken by the agent. Truly modular reinforcement learning would support not only decomposition into modules, but also composability of separately written modules in new MRL agents. However, the performance of MRL agents that arbitrate module preferences using additive reward schemes degrades when the modules have incomparable reward scales. This degradation means that separately written modules cannot be composed into new MRL agents as-is; they may need to be modified to align their reward scales. We solve this problem with a Q-learning-based command arbitration algorithm and demonstrate that it does not exhibit the performance degradation of existing approaches to MRL, thereby supporting composability.
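One way to realize command arbitration that is insensitive to module reward scales is for the arbitrator to run its own Q-learning over which module to obey in each state, trained on the task's global reward. This is a hedged sketch of that idea under assumed names and interfaces, not the paper's exact algorithm.

```python
import random
from collections import defaultdict

class CommandArbitrator:
    """Learns, via its own Q-learning on the global task reward, which
    module's command to follow in each state. Because it never sums the
    modules' internal Q-values, incomparable reward scales across modules
    cannot distort the arbitration."""
    def __init__(self, n_modules, alpha=0.1, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_modules)
        self.alpha, self.gamma, self.n = alpha, gamma, n_modules

    def choose_module(self, state, eps=0.1):
        if random.random() < eps:
            return random.randrange(self.n)
        qs = self.q[state]
        return max(range(self.n), key=qs.__getitem__)

    def update(self, state, module, reward, next_state):
        # standard Q-learning update over (state, module-choice) pairs
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][module] += self.alpha * (target - self.q[state][module])
```

Since the arbitrator's values live entirely in the global reward's scale, a module can be dropped in with any internal reward scheme and the arbitrator simply learns when obeying it pays off.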