Advances in Multi-Agent Reinforcement Learning: experience sharing, parameter sharing, equilibrium selection
Multi-Agent Reinforcement Learning (MARL) has recently gained significant attention due to its potential to train decision-making policies in complex environments
involving multiple agents. This thesis presents four contributions to the field of
MARL, addressing challenges such as sample efficiency, scaling to large numbers of
agents, and improving solution quality. The first contribution is a benchmark of
nine state-of-the-art MARL algorithms across 25 tasks, providing a comprehensive
overview of the current capabilities of MARL methods. The results of the benchmark
study not only provide a thorough evaluation of existing methods, but also identify
several areas for potential improvement. The second contribution is the Shared Experience Actor-Critic (SEAC) algorithm, which improves sample efficiency by allowing agents to share their experiences within an actor-critic framework. SEAC addresses the limitations of existing algorithms in learning from sparse-reward environments and is shown to consistently outperform two baselines and two state-of-the-art methods
in those settings. The third contribution is the Selective Parameter Sharing (SePS)
algorithm, which groups agents that would benefit from sharing parameters, leading
to improved sample efficiency and faster convergence. Experiments show that SePS
combines the benefits of other parameter sharing baselines, and can scale to hundreds
of agents, even if the agents are not homogeneous. The fourth contribution is the
Pareto Actor-Critic (Pareto-AC) algorithm, which aims to converge to Pareto-optimal equilibria. Many state-of-the-art MARL algorithms, as identified by the benchmarking study, tend to converge to suboptimal equilibria; in contrast, Pareto-AC is shown to converge to Pareto-optimal equilibria in a range of tasks, even when multiple suboptimal equilibria exist. Through these contributions, this thesis makes significant progress towards addressing key challenges of MARL.
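As an illustration of the selective-parameter-sharing idea summarised above, the sketch below groups agents by a learned embedding and assigns one shared policy network per cluster. The class and function names, the use of k-means, and the random stand-in embeddings are illustrative assumptions, not the thesis implementation.

import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class PolicyNet(nn.Module):
    """A small policy network shared by every agent in one cluster."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

def assign_clusters(agent_embeddings: np.ndarray, n_clusters: int) -> np.ndarray:
    # Group agents whose embeddings (e.g. learned from their observed
    # transitions) are similar, so they can share one set of parameters.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(agent_embeddings)

# Hypothetical setup: 100 agents, 8-dimensional embeddings, 3 shared networks.
embeddings = np.random.randn(100, 8)   # stand-in for learned agent embeddings
cluster_of = assign_clusters(embeddings, n_clusters=3)
shared_policies = [PolicyNet(obs_dim=16, n_actions=5) for _ in range(3)]
policy_for_agent = {i: shared_policies[c] for i, c in enumerate(cluster_of)}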
Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks
Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate, long-horizon robotic manipulation tasks with sparse rewards. We evaluate our framework and related intrinsic learning methods in an environment that is challenging for exploration, and in a complex robotic manipulation task challenged by both exploration and long horizons. Results show that IGE-LLMs (i) exhibits notably higher performance than related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined with and complements existing learning methods, highlighting its modularity, (iii) is fairly insensitive to different intrinsic scaling parameters, and (iv) maintains robustness against increased levels of uncertainty and longer horizons.
Comment: 8 pages, 3 figures
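The core mechanism described above, an LLM score used as an assistive intrinsic reward, can be sketched roughly as follows. The rate_progress_with_llm function and the additive shaping with an intrinsic_scale weight are illustrative assumptions, not the paper's implementation.

def rate_progress_with_llm(task_description: str, state_description: str) -> float:
    # Hypothetical stand-in: prompt an LLM to rate how promising the current
    # state is for completing the task and parse a score in [0, 1]. A real
    # system would call an LLM API here; we return a neutral placeholder.
    return 0.0

def shaped_reward(r_extrinsic: float, task: str, state_text: str,
                  intrinsic_scale: float = 0.1) -> float:
    # Environment reward plus a scaled LLM-based intrinsic bonus that guides
    # exploration when the extrinsic signal is sparse.
    return r_extrinsic + intrinsic_scale * rate_progress_with_llm(task, state_text)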
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
which is an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and
therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC, which is shown to scale efficiently in games with a large number of agents.
Comment: 20 pages, 12 figures
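The property that Pareto-AC exploits, namely that in a no-conflict game the Pareto-optimal equilibrium maximises every agent's return, can be illustrated with a small matrix game. The climbing-game-style payoffs below are a standard textbook example chosen for illustration, and the "optimistic" per-action value is only a sketch of the intuition, not the full actor-critic update.

import numpy as np

# Common payoff for both agents in a no-conflict matrix game
# (rows: agent 1's actions, columns: agent 2's actions).
payoff = np.array([[ 11, -30,   0],
                   [-30,   7,   6],
                   [  0,   0,   5]])

# The Pareto-optimal equilibrium is the joint action maximising the shared return.
pareto_joint_action = np.unravel_index(np.argmax(payoff), payoff.shape)  # (0, 0)

# Acting "optimistically": each agent assumes the others complete the best joint
# action, so it evaluates each of its own actions by the best-case outcome.
optimistic_values_agent1 = payoff.max(axis=1)               # [11, 7, 5]
greedy_action_agent1 = int(np.argmax(optimistic_values_agent1))           # 0

# A purely cautious agent (worst case over the other agent's actions) would
# instead pick action 2, the Pareto-dominated but "safe" equilibrium.
cautious_action_agent1 = int(np.argmax(payoff.min(axis=1)))               # 2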
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Exploration in multi-agent reinforcement learning is a challenging problem,
especially in environments with sparse rewards. We propose a general method for
efficient exploration by sharing experience amongst agents. Our proposed
algorithm, called Shared Experience Actor-Critic (SEAC), applies experience
sharing in an actor-critic framework. We evaluate SEAC in a collection of
sparse-reward multi-agent environments and find that it consistently
outperforms two baselines and two state-of-the-art algorithms by learning in
fewer steps and converging to higher returns. In some harder environments,
experience sharing makes the difference between learning to solve the task and
not learning at all.
Comment: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
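A minimal sketch of the experience-sharing idea follows: agent i updates its policy on its own trajectories and additionally on the other agents' trajectories, importance-weighted by the ratio between its own policy and the behaviour policy that collected them. The tiny policy class, the lambda weighting, and the exact form of the off-policy term are illustrative assumptions rather than the published SEAC implementation.

import torch
import torch.nn as nn

class CategoricalPolicy(nn.Module):
    """A tiny per-agent policy, included only to make the sketch self-contained."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Linear(obs_dim, n_actions)

    def log_prob(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return torch.distributions.Categorical(logits=self.net(obs)).log_prob(act)

def shared_policy_loss(i, policies, batches, lam: float = 1.0) -> torch.Tensor:
    # batches[k] = (obs, actions, advantages) collected by agent k.
    obs, act, adv = batches[i]
    loss = -(policies[i].log_prob(obs, act) * adv).mean()      # on-policy term
    for k, (obs_k, act_k, adv_k) in enumerate(batches):
        if k == i:
            continue
        logp_i = policies[i].log_prob(obs_k, act_k)
        with torch.no_grad():                                  # behaviour policy of agent k
            logp_k = policies[k].log_prob(obs_k, act_k)
        ratio = (logp_i - logp_k).exp().detach()               # importance weight
        loss = loss - lam * (ratio * logp_i * adv_k).mean()    # shared-experience term
    return loss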
Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning
Successful deployment of multi-agent reinforcement learning often requires
agents to adapt their behaviour. In this work, we discuss the problem of
teamwork adaptation in which a team of agents needs to adapt their policies to
solve novel tasks with limited fine-tuning. Motivated by the intuition that
agents need to be able to identify and distinguish tasks in order to adapt
their behaviour to the current task, we propose to learn multi-agent task
embeddings (MATE). These task embeddings are trained using an encoder-decoder
architecture optimised for reconstruction of the transition and reward
functions which uniquely identify tasks. We show that a team of agents is able
to adapt to novel tasks when provided with task embeddings. We propose three
MATE training paradigms: independent MATE, centralised MATE, and mixed MATE,
which vary in the information used for the task encoding. We show that the
embeddings learned by MATE identify tasks and provide useful information which
agents leverage during adaptation to novel tasks.
Comment: To be presented at the Seventh Workshop on Generalization in Planning at the NeurIPS 2023 conference
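A rough sketch of the encoder-decoder training signal described above: an encoder maps a batch of an agent's transitions to a task embedding, and a decoder reconstructs the next observation and reward from the current observation, action, and that embedding. The network sizes, the mean-pooled feed-forward encoder, and the MSE objective are illustrative assumptions rather than the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskEncoder(nn.Module):
    """Maps a batch of transitions (obs, act, rew, next_obs) to one task embedding."""
    def __init__(self, obs_dim: int, act_dim: int, emb_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * obs_dim + act_dim + 1, 64),
                                 nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, obs, act, rew, next_obs):
        per_step = self.net(torch.cat([obs, act, rew, next_obs], dim=-1))
        return per_step.mean(dim=0)                 # pool over the trajectory

class TransitionRewardDecoder(nn.Module):
    """Predicts next observation and reward from (obs, act, task embedding)."""
    def __init__(self, obs_dim: int, act_dim: int, emb_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim + emb_dim, 64),
                                 nn.ReLU(), nn.Linear(64, obs_dim + 1))

    def forward(self, obs, act, z):
        out = self.net(torch.cat([obs, act, z.expand(obs.shape[0], -1)], dim=-1))
        return out[:, :-1], out[:, -1:]             # predicted next_obs, reward

def mate_reconstruction_loss(encoder, decoder, obs, act, rew, next_obs):
    z = encoder(obs, act, rew, next_obs)            # task embedding for this batch
    pred_next_obs, pred_rew = decoder(obs, act, z)
    return F.mse_loss(pred_next_obs, next_obs) + F.mse_loss(pred_rew, rew)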
