
    Less Is More: Robust Robot Learning via Partially Observable Multi-Agent Reinforcement Learning

    In many multi-agent and high-dimensional robotic tasks, the controller can be designed in either a centralized or a decentralized way. Correspondingly, such controllers can be learned with either single-agent reinforcement learning (SARL) or multi-agent reinforcement learning (MARL) methods. However, the relationship between these two paradigms remains under-studied in the literature. This work investigates the robustness and performance of SARL and MARL approaches applied to the same task, in order to identify the most suitable methods. We start by analytically showing the equivalence between these two paradigms under the full-state observation assumption. Then, we identify a broad subclass of Dec-POMDP tasks where the agents are weakly or partially interacting. In these tasks, we show that partial observations of each agent are sufficient for near-optimal decision-making. Furthermore, we propose to exploit such partially observable MARL to improve the robustness of robots when joint or agent failures occur. Our experiments on both simulated multi-agent tasks and a real robot task with a mobile manipulator validate the presented insights and the effectiveness of the proposed robust robot learning method via partially observable MARL.
    Comment: 8 pages, 8 figures
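    To make the partial-observation idea concrete, here is a minimal sketch of decentralized execution over local state slices. The state layout, slice names, and linear policies are illustrative assumptions of ours, not the paper's implementation.

        import numpy as np

        # Illustrative sketch (our own construction, not the authors' code):
        # in a weakly interacting Dec-POMDP, each agent acts on a local slice
        # of the joint state instead of the full observation.

        FULL_STATE_DIM = 12          # hypothetical joint state size
        AGENT_SLICES = {             # which state indices each agent observes
            "base": slice(0, 4),     # e.g., mobile-base pose and velocity
            "arm":  slice(4, 12),    # e.g., arm joint positions/velocities
        }

        def local_policy(weights, local_obs):
            """Toy linear policy: one action per agent from its partial view."""
            return np.tanh(weights @ local_obs)

        rng = np.random.default_rng(0)
        weights = {k: rng.normal(size=(2, s.stop - s.start))
                   for k, s in AGENT_SLICES.items()}

        joint_state = rng.normal(size=FULL_STATE_DIM)

        # Decentralized execution: each agent only sees its own slice, so the
        # controller keeps producing actions even if another agent's sensors fail.
        joint_action = {agent: local_policy(weights[agent], joint_state[sl])
                        for agent, sl in AGENT_SLICES.items()}
        print(joint_action)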

    Cooperative Adaptive Control for Cloud-Based Robotics

    This paper studies collaboration through the cloud in the context of cooperative adaptive control for robot manipulators. We first consider the case of multiple robots manipulating a common object through synchronous centralized update laws to identify unknown inertial parameters. Through this development, we introduce a notion of Collective Sufficient Richness, wherein parameter convergence can be enabled through teamwork in the group. The introduction of this property and the analysis of stable adaptive controllers that benefit from it constitute the main new contributions of this work. Building on this original example, we then consider decentralized update laws, time-varying network topologies, and the influence of communication delays on this process. Perhaps surprisingly, these nonidealized networked conditions retain the same benefit: convergence is determined through collective effects for the group. Simple simulations of a planar manipulator identifying an unknown load are provided to illustrate the central idea and benefits of Collective Sufficient Richness.
    Comment: ICRA 201
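    As a rough illustration of a synchronous centralized update law in this spirit, the sketch below pools every robot's regressor-weighted tracking error into one shared parameter estimate. The dimensions, gain matrix, and random regressors are placeholders we made up, not the paper's model.

        import numpy as np

        # Minimal sketch of a centralized synchronous adaptation law
        # (gains, dimensions, and signals here are illustrative assumptions).

        N_ROBOTS, N_PARAMS = 3, 4
        Gamma = 0.1 * np.eye(N_PARAMS)      # adaptation gain
        theta_hat = np.zeros(N_PARAMS)      # shared estimate of inertial params
        dt = 0.01

        def adaptation_step(theta_hat, regressors, tracking_errors):
            """One synchronous step: every robot's excitation contributes, so
            convergence can hold collectively even if no single robot's
            regressor is persistently exciting on its own."""
            grad = sum(Y.T @ s for Y, s in zip(regressors, tracking_errors))
            return theta_hat - dt * (Gamma @ grad)

        rng = np.random.default_rng(1)
        for _ in range(1000):
            regressors = [rng.normal(size=(2, N_PARAMS)) for _ in range(N_ROBOTS)]
            errors = [rng.normal(size=2) for _ in range(N_ROBOTS)]
            theta_hat = adaptation_step(theta_hat, regressors, errors)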

    Exploring Multi-Agent Reinforcement Learning for Mobile Manipulation

    To make robots valuable in our everyday lives, they need to be able to make good decisions even in unexpected situations. Reinforcement learning is a paradigm that aims to learn decision-making models for robots without the need for direct examples of the correct decisions. For this type of robot learning, it is common practice to learn a single central model that controls the entire robot. This work is motivated by advances in modular and swarm robotics, where multiple robots or decision-makers collaborate to complete a task. Instead of learning a single central model, we explore the idea of learning multiple decision-making models, each controlling a different part of the robot. In particular, we investigate whether providing the different models with different sensing capabilities helps the robot to learn or to be robust to perturbations. We formulate these problems as multi-agent problems and use a multi-agent reinforcement learning algorithm to solve them. To evaluate our approach, we design a mobile manipulation task and implement a simulation-based training pipeline to produce decision-making models that can complete the task. The trained models are then directly transferred to a real autonomous mobile manipulator system. Several experiments are performed on the real system to compare performance and robustness against the usual central-model baseline. Our experimental results show that our approach can learn faster and produce decision-making models that are more robust to perturbations.
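    The following toy sketch shows what the described decomposition could look like for a mobile manipulator: separate decision models for the base and the arm, each with its own sensing. The class, observation splits, and dimensions are hypothetical, not the work's actual architecture.

        import numpy as np

        rng = np.random.default_rng(2)

        class AgentModel:
            """Toy per-agent decision model over that agent's own observation."""
            def __init__(self, obs_dim, act_dim):
                self.W = rng.normal(scale=0.1, size=(act_dim, obs_dim))
            def act(self, obs):
                return np.tanh(self.W @ obs)

        # Hypothetical sensing split: the base sees lidar + odometry, the arm
        # sees joint encoders + wrist-camera features.
        base_agent = AgentModel(obs_dim=16, act_dim=2)   # linear/angular velocity
        arm_agent = AgentModel(obs_dim=20, act_dim=7)    # 7-DoF joint commands

        base_obs = rng.normal(size=16)
        arm_obs = rng.normal(size=20)

        # Joint action assembled from the two models; a MARL algorithm would
        # train both against the shared mobile-manipulation reward.
        action = np.concatenate([base_agent.act(base_obs), arm_agent.act(arm_obs)])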

    Designing Decentralized Controllers for Distributed-Air-Jet MEMS-Based Micromanipulators by Reinforcement Learning

    Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts at high velocities and without any friction problems. The control of such distributed systems is very challenging, and usual approaches for contact-arrayed systems do not produce satisfactory results. In this paper, we investigate reinforcement learning control approaches in order to position and convey an object. Reinforcement learning is a popular approach to find controllers that are tailored exactly to the system without any prior model. We show how to apply reinforcement learning from a decentralized perspective in order to address the global-local trade-off. The simulation results demonstrate that the reinforcement learning method is a promising way to design control laws for such distributed systems.
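    A minimal sketch of the decentralized learning setup might look as follows: each air-jet cell runs its own small Q-learner over the object's local position. The one-dimensional cell layout, local state encoding, and reward interface are simplifications we introduce for illustration, not the paper's model.

        import numpy as np

        N_CELLS = 8                 # 1-D array of jet cells for illustration
        N_LOCAL_STATES = 3          # object left of / over / right of the cell
        ACTIONS = [0, 1]            # jet off / jet on
        ALPHA, GAMMA_RL = 0.1, 0.9

        q_tables = np.zeros((N_CELLS, N_LOCAL_STATES, len(ACTIONS)))

        def local_update(cell, s, a, reward, s_next):
            """Standard Q-learning, run independently in every cell; the
            global-local trade-off is handled through the reward signal."""
            td = reward + GAMMA_RL * q_tables[cell, s_next].max() - q_tables[cell, s, a]
            q_tables[cell, s, a] += ALPHA * td

        # Example step: cell 3 fires its jet while the object is to its left.
        local_update(cell=3, s=0, a=1, reward=-0.1, s_next=1)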

    Hysteretic Q-Learning: An Algorithm for Decentralized Reinforcement Learning in Cooperative Multi-Agent Teams

    Multi-agent systems (MAS) are a field of study of growing interest in a variety of domains such as robotics and distributed control. This article focuses on decentralized reinforcement learning (RL) in cooperative MAS, where a team of independent learner (IL) robots tries to coordinate their individual behaviors to reach a coherent joint behavior. We assume that each robot has no information about its teammates' actions. To date, RL approaches for such ILs have not guaranteed convergence to the optimal joint policy in scenarios where coordination is difficult. We report an investigation of existing algorithms for learning coordination in cooperative MAS, and suggest a Q-Learning extension for ILs, called Hysteretic Q-Learning. This algorithm does not require any additional communication between robots. Its advantages are demonstrated and compared to other methods on various applications: bimatrix games, a collaborative ball-balancing task, and a pursuit domain.
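    To make the hysteretic rule concrete, here is a minimal sketch of the two-learning-rate update the abstract describes; the table shape and rate values are illustrative choices of ours.

        import numpy as np

        # Hysteretic Q-Learning update: an optimistic rate for positive TD
        # errors and a smaller rate for negative ones, so an agent does not
        # unlearn good actions when a teammate's unseen exploration causes a
        # bad joint outcome.

        ALPHA, BETA = 0.1, 0.01     # beta < alpha: penalties are damped
        GAMMA = 0.9

        def hysteretic_update(Q, s, a, r, s_next):
            delta = r + GAMMA * Q[s_next].max() - Q[s, a]
            rate = ALPHA if delta >= 0 else BETA
            Q[s, a] += rate * delta
            return Q

        Q = np.zeros((5, 2))        # toy table: 5 states, 2 actions
        Q = hysteretic_update(Q, s=0, a=1, r=1.0, s_next=2)

    With ALPHA == BETA this reduces to standard decentralized Q-learning; the asymmetry is the whole extension.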

    Enhancing Exploration and Safety in Deep Reinforcement Learning

    A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interactions in the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria to consider while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms to foster efficient exploration and safer behaviors in simulated and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample-efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitates the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors over desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free, complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance in both single- and multi-agent applications. For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, proposing an architecture that favors cooperation without affecting exploration.
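    The population-plus-gradient combination can be sketched as follows. This loop follows the generic evolutionary-RL recipe (elite selection, mutation, periodic injection of the gradient-trained policy) and is our own toy version, not the thesis's exact architecture; the objective and all constants are placeholders.

        import numpy as np

        rng = np.random.default_rng(3)
        POP_SIZE, PARAM_DIM, SIGMA = 8, 32, 0.05

        def evaluate(theta):
            """Placeholder for an environment rollout returning a return."""
            return -np.linalg.norm(theta - 1.0)   # toy objective

        population = [rng.normal(size=PARAM_DIM) for _ in range(POP_SIZE)]
        rl_theta = np.zeros(PARAM_DIM)            # the gradient-trained policy

        for generation in range(50):
            # ... a gradient-based DRL step on rl_theta would happen here ...
            scored = sorted(population, key=evaluate, reverse=True)
            elites = scored[: POP_SIZE // 2]
            # Mutated elites diversify exploration beyond the gradient direction.
            population = [e + SIGMA * rng.normal(size=PARAM_DIM) for e in elites]
            population += elites
            # Periodically inject the RL policy so gradient information
            # flows back into the population.
            population[-1] = rl_theta.copy()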