Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

Authors: Chen R, Kuba JG, Sun F, Wang J, Wen M, Wen Y, Yang Y
Publication venue: The International Conference on Learning Representations (ICLR)
Publication date: 04/04/2022

Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks. Unfortunately, in multi-agent reinforcement learning (MARL) the monotonic improvement property does not simply carry over, because agents, even in cooperative games, can have conflicting directions of policy updates. As a result, achieving a guaranteed improvement on the joint policy where each agent acts individually remains an open challenge. In this paper, we extend the theory of trust region learning to cooperative MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme. Based on these, we develop the Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (HAPPO) algorithms. Unlike many existing MARL algorithms, HATRPO/HAPPO do not need agents to share parameters, nor do they need any restrictive assumptions on the decomposability of the joint value function. Most importantly, we justify in theory the monotonic improvement property of HATRPO/HAPPO. We evaluate the proposed methods on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, thereby establishing a new state of the art.
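The abstract names a multi-agent advantage decomposition lemma without stating it. As a hedged sketch, one common way to write such a decomposition is given below. The notation is an assumption and does not come from this listing: \(\pi\) is the joint policy, \(i_{1:m}\) is an ordered subset of agents, \(-i_{1:k}\) denotes all agents outside \(i_{1:k}\), and \(Q_{\pi}^{i_{1:0}}(s) = V_{\pi}(s)\). Under these definitions the identity follows by telescoping.

```latex
% Assumed definitions (illustrative notation, not taken from the listing)
\begin{align*}
Q_{\pi}^{i_{1:k}}\bigl(s, a^{i_{1:k}}\bigr)
  &= \mathbb{E}_{a^{-i_{1:k}} \sim \pi^{-i_{1:k}}}
     \bigl[ Q_{\pi}\bigl(s, a^{i_{1:k}}, a^{-i_{1:k}}\bigr) \bigr], \\
A_{\pi}^{i_j}\bigl(s, a^{i_{1:j-1}}, a^{i_j}\bigr)
  &= Q_{\pi}^{i_{1:j}}\bigl(s, a^{i_{1:j}}\bigr)
   - Q_{\pi}^{i_{1:j-1}}\bigl(s, a^{i_{1:j-1}}\bigr).
\end{align*}

% Decomposition: the joint advantage of agents i_1, ..., i_m is the sum of
% per-agent advantages, each conditioned on the actions of earlier agents.
\begin{align*}
A_{\pi}^{i_{1:m}}\bigl(s, a^{i_{1:m}}\bigr)
  &= Q_{\pi}^{i_{1:m}}\bigl(s, a^{i_{1:m}}\bigr) - V_{\pi}(s) \\
  &= \sum_{j=1}^{m} A_{\pi}^{i_j}\bigl(s, a^{i_{1:j-1}}, a^{i_j}\bigr).
\end{align*}
```

Read this way, if each agent in turn picks an action with positive advantage given its predecessors' actions, the joint advantage is positive, which is the intuition behind updating agents sequentially rather than simultaneously.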

arXiv.org e-Print Archive

Evaluator services for optimised service placement in distributed heterogeneous cloud infrastructures

Authors: Bursztynowski D, Franke M, Griffin D, Khoa Phan T, Rio M, Schamel F, Simoens Pieter, Smet Piet, Vandeputte F, Vermoesen L
Publication date: 01/01/2015

Optimal placement of demanding real-time interactive applications in a distributed heterogeneous cloud very quickly results in a complex tradeoff between the application constraints and resource capabilities. This requires very detailed information about the requirements of the applications and the capabilities of the available resources. In this paper, we present a mathematical model for the service optimization problem and study the concept of evaluator services as a flexible and efficient solution for this complex problem. An evaluator service is a service probe that is deployed in particular runtime environments to assess the feasibility and cost-effectiveness of deploying a specific application in such an environment. We discuss how this concept can be incorporated in a general framework such as the FUSION architecture and discuss the key benefits and tradeoffs of evaluator-based optimal service placement in widely distributed heterogeneous cloud environments.

An Exchange Mechanism to Coordinate Flexibility in Residential Energy Cooperatives

Authors: Chakraborty Shantanu, Hernandez-Leal Pablo, Kaisers Michael
Publication date: 13/02/2019

Energy cooperatives (ECs) such as residential and industrial microgrids have the potential to mitigate increasing fluctuations in renewable electricity generation, but only if their joint response is coordinated. However, the coordination and control of independently operated flexible resources (e.g., storage, demand response) poses critical challenges arising from the heterogeneity of the resources, conflicts of interest, and impact on the grid. Overcoming these challenges with a general, fair, and efficient exchange mechanism that coordinates these distributed resources will accommodate renewable fluctuations on a local level, thereby supporting the energy transition. In this paper, we introduce such an exchange mechanism. It incorporates a payment structure that encourages prosumers to participate in the exchange by increasing their utility above baseline alternatives. The allocation from the proposed mechanism increases the system efficiency (utilitarian social welfare) and distributes profits more fairly (measured by Nash social welfare) than individual flexibility activation. A case study analyzing the mechanism's performance and the resulting payments in numerical experiments over real demand and generation profiles from the Pecan Street dataset demonstrates its efficacy in promoting cooperation between co-located flexibilities in residential cooperatives through local exchange.

Comment: Accepted in IEEE ICIT 201
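The abstract contrasts two welfare measures. As a minimal sketch using the standard textbook definitions (utilitarian welfare as the sum of utilities, Nash welfare as their product), the example below shows why the second rewards fairer allocations. The function names and the utility values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math

def utilitarian_welfare(utilities):
    """Sum of individual utilities: measures overall efficiency."""
    return sum(utilities)

def nash_welfare(utilities):
    """Product of individual utilities: penalises uneven allocations."""
    return math.prod(utilities)

# Hypothetical utilities for two prosumers under two allocations.
even_split = [5.0, 5.0]
skewed_split = [9.0, 1.0]

# Both allocations are equally efficient in the utilitarian sense ...
print(utilitarian_welfare(even_split), utilitarian_welfare(skewed_split))

# ... but the even allocation has the higher Nash welfare.
print(nash_welfare(even_split), nash_welfare(skewed_split))
```

For a fixed total, the product of non-negative utilities is largest when they are equal, which is why Nash social welfare is often used as a fairness-sensitive measure alongside the utilitarian sum.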

arXiv.org e-Print Archive