Selfishness Level Induces Cooperation in Sequential Social Dilemmas
A key contributor to the success of modern societies is humanity's innate ability to cooperate meaningfully. Modern game-theoretic reasoning shows, however, that an individual's amenability to cooperation is directly linked to the mechanics of the scenario at hand. Social dilemmas constitute a subset of particularly thorny such scenarios, typically modelled as normal-form or sequential games, in which players face a dichotomy between cooperating with teammates and defecting to further their own goals. In this work, we study such social dilemmas through the lens of the 'selfishness level', a standard game-theoretic metric which quantifies the extent to which a game's payoffs incentivize defective behaviours. The selfishness level is significant in this context because it doubles as a prescriptive notion, describing the exact payoff modifications necessary to induce prosocial preferences in players. Using this framework, we derive conditions, and means, under which normal-form social dilemmas can be resolved. We also take a first step towards extending this metric to Markov-game, or sequential, social dilemmas, with the aim of quantitatively measuring the degree to which such environments incentivize selfish behaviours. Finally, we present an exploratory empirical analysis showing the positive effects of using a selfishness-level-directed reward-shaping scheme in such environments.
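To make the metric concrete (a minimal sketch, not the paper's own code), the selfishness level in the Apt–Schäfer sense is the smallest α ≥ 0 such that adding α times the social welfare to every player's payoff makes some socially optimal outcome a Nash equilibrium. A grid search over α for a standard Prisoner's Dilemma; the payoff values are illustrative:

```python
import numpy as np

def is_nash(p0, p1, cell):
    """Check that neither player can profit by deviating from `cell`."""
    a, b = cell
    return p0[a, b] >= p0[:, b].max() and p1[a, b] >= p1[a, :].max()

def selfishness_level(p0, p1, alphas=np.arange(0, 10.25, 0.25)):
    """Smallest alpha on the grid making a social optimum a Nash equilibrium."""
    sw = p0 + p1                                   # social welfare per outcome
    optima = [tuple(c) for c in np.argwhere(sw == sw.max())]
    for alpha in alphas:
        m0, m1 = p0 + alpha * sw, p1 + alpha * sw  # prescriptive payoff shift
        if any(is_nash(m0, m1, c) for c in optima):
            return float(alpha)
    return None  # above the searched range

# Prisoner's Dilemma, rows/cols: C=0, D=1; T=5 > R=3 > P=1 > S=0.
p0 = np.array([[3, 0], [5, 1]])
p1 = p0.T
print(selfishness_level(p0, p1))  # -> 2.0
```

At α = 2 the cooperative outcome's modified payoff (3 + 2·6 = 15) exactly matches the defection temptation (5 + 2·5 = 15), so (C, C) becomes an equilibrium.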
Resolving social dilemmas with minimal reward transfer
Multi-agent cooperation is an important topic, and is particularly
challenging in mixed-motive situations where it does not pay to be nice to
others. Consequently, self-interested agents often avoid collective behaviour,
resulting in suboptimal outcomes for the group. In response, in this paper we
introduce a metric to quantify the disparity between what is rational for
individual agents and what is rational for the group, which we call the general
self-interest level. This metric represents the maximum proportion of
individual rewards that all agents can retain while ensuring that achieving
the social welfare optimum becomes a dominant strategy. By aligning the individual
and group incentives, rational agents acting to maximise their own reward will
simultaneously maximise the collective reward. As agents transfer their rewards
to motivate others to consider their welfare, we diverge from traditional
concepts of altruism or prosocial behaviours. The general self-interest level
is a property of a game that is useful for assessing the propensity of players
to cooperate and understanding how features of a game impact this. We
illustrate the effectiveness of our method on several novel game
representations of social dilemmas with arbitrary numbers of players.
Comment: 34 pages, 13 tables, submitted to the Journal of Autonomous Agents
and Multi-Agent Systems: Special Issue on Citizen-Centric AI System
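A minimal sketch of the idea (not the authors' implementation): for a symmetric two-player game under an equal-transfer scheme, where each agent keeps a fraction s of its own reward and transfers the rest to the other (r_i' = s·r_i + (1−s)·r_j), search for the largest s at which cooperation is weakly dominant. The scheme and payoffs are illustrative assumptions; exact rational arithmetic avoids floating-point ties:

```python
from fractions import Fraction as F

def transferred(p0, p1, s):
    """Equal-transfer scheme for two players: r_i' = s*r_i + (1-s)*r_j."""
    t0 = [[s * p0[a][b] + (1 - s) * p1[a][b] for b in range(2)] for a in range(2)]
    t1 = [[s * p1[a][b] + (1 - s) * p0[a][b] for b in range(2)] for a in range(2)]
    return t0, t1

def coop_dominant(p0, p1, c=0):
    """Cooperation (index c) weakly dominates for both players."""
    ok0 = all(p0[c][b] >= max(p0[a][b] for a in range(2)) for b in range(2))
    ok1 = all(p1[a][c] >= max(p1[a][b] for b in range(2)) for a in range(2))
    return ok0 and ok1

def general_self_interest_level(p0, p1, step=F(1, 100)):
    """Largest retained fraction s making cooperation dominant after transfer."""
    s = F(1)
    while s >= 0:                      # scan from full self-interest downwards
        if coop_dominant(*transferred(p0, p1, s)):
            return s
        s -= step
    return None

# Prisoner's Dilemma payoffs, C=0, D=1 (T=5 > R=3 > P=1 > S=0).
p0 = [[3, 0], [5, 1]]   # row player
p1 = [[3, 5], [0, 1]]   # column player (transpose)
print(general_self_interest_level(p0, p1))  # -> 3/5
```

Here agents can keep 60% of their own reward: above that, defecting against a cooperator (5s) still beats mutual cooperation (3).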
An End-to-End Task Allocation Framework for Autonomous Mobile Systems
This work aims to unravel the problem of task allocation and planning for multi-agent systems, with a particular interest in promoting adaptability. We propose a novel end-to-end task allocation framework employing reinforcement learning methods to replace the handcrafted heuristics used in previous works. The proposed framework achieves high adaptability while also attaining competitive results, advantages that stem from learning directly from environmental feedback. The system's objectives are adjustable and respond intuitively to the reward design. The framework is validated in a set of tests with various parameter settings, in which both its adaptability and its performance are demonstrated.
Estimating α-Rank from A Few Entries with Low Rank Matrix Completion
Multi-agent evaluation aims to assess an agent's strategy on the basis of its interactions with others. Existing methods such as α-rank and its approximations typically still require exhaustively comparing all pairs of joint strategies for an accurate ranking, which in practice is computationally expensive. In this paper, we aim to reduce the number of pairwise comparisons needed to recover a satisfactory ranking of n strategies in two-player meta-games, exploiting the fact that agents with similar skills tend to achieve similar payoffs against others. Two situations are considered: one in which we can obtain the true payoffs, and one in which we can only access noisy payoffs. Based on these formulations, we leverage low-rank matrix completion and design two novel algorithms for noise-free and noisy evaluations, respectively. For both settings, we show that O(nr log n) payoff entries (where n is the number of agents and r is the rank of the payoff matrix) suffice for sufficiently accurate strategy evaluation. Empirical results on three synthetic games and twelve real-world games demonstrate that strategy evaluation from a few entries can achieve performance comparable to algorithms with full knowledge of the payoff matrix.
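As a hedged illustration of the completion step (not the paper's algorithms), a simple iterative SVD hard-thresholding scheme can recover a synthetic low-rank "payoff" matrix from a random subset of its entries; the sampling rate, sizes, and iteration count here are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 2
# Ground-truth rank-r matrix standing in for a meta-game payoff table.
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))

# Observe a random ~35% of entries (on the order of n*r*log n of them).
mask = rng.random((n, n)) < 0.35

def complete(observed, mask, rank, iters=2000):
    """Fill unknowns, project to the best rank-r approximation, repeat."""
    X = np.where(mask, observed, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # rank-r projection
        X[mask] = observed[mask]                  # re-impose known entries
    return X

M_hat = complete(M, mask, r)
err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(f"relative error: {err:.2e}")
```

With only a third of the entries observed, the rank-2 structure pins down the rest of the matrix to small relative error.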
Zero-shot Preference Learning for Offline RL via Optimal Transport
Preference-based Reinforcement Learning (PbRL) has demonstrated remarkable
efficacy in aligning rewards with human intentions. However, a significant
challenge lies in the need for substantial human labels, which are costly and
time-consuming. Additionally, the expensive preference data obtained from prior
tasks is not typically reusable for subsequent task learning, leading to
extensive labeling for each new task. In this paper, we propose a novel
zero-shot preference-based RL algorithm that leverages labeled preference data
from source tasks to infer labels for target tasks, eliminating the requirement
for human queries. Our approach utilizes Gromov-Wasserstein distance to align
trajectory distributions between source and target tasks. The solved optimal
transport matrix serves as a correspondence between trajectories of two tasks,
making it possible to identify corresponding trajectory pairs between tasks and
transfer the preference labels. However, learning directly from inferred labels
that contain a fraction of noisy labels will result in an inaccurate reward
function, subsequently affecting policy performance. To this end, we introduce
Robust Preference Transformer, which models the rewards as Gaussian
distributions and incorporates reward uncertainty in addition to reward mean.
The empirical results on robotic manipulation tasks of Meta-World and Robomimic
show that our method has strong capabilities of transferring preferences
between tasks and learns reward functions from noisy labels robustly.
Furthermore, we reveal that our method attains near-oracle performance with a
small proportion of scripted labels.
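To make the label-transfer step concrete (a toy sketch, not the paper's code): given a transport coupling T between source and target trajectories, which the paper obtains by solving a Gromov-Wasserstein problem (e.g. via the POT library's `ot.gromov.gromov_wasserstein`), each target trajectory can inherit the preference label of its most strongly coupled source trajectory. The coupling below is hand-made for illustration:

```python
import numpy as np

source_labels = np.array([1, 0, 1, 0])   # 1 = preferred trajectory segment

# Stand-in coupling matrix (rows = source, cols = target); in practice it
# comes from solving GW between the two tasks' trajectory distributions.
T = np.array([[0.4, 0.0, 0.1, 0.0],
              [0.0, 0.3, 0.0, 0.0],
              [0.1, 0.0, 0.5, 0.1],
              [0.0, 0.2, 0.0, 0.3]])
T /= T.sum()  # normalise to a joint distribution

def transfer_labels(T, source_labels):
    """Label each target trajectory by its most strongly coupled source."""
    match = T.argmax(axis=0)   # best source index for each target column
    return source_labels[match]

print(transfer_labels(T, source_labels))  # -> [1 0 1 0]
```

The noise the abstract refers to arises exactly here: when the coupling matches a target trajectory to a source with a misleading label, the inferred label is wrong, which is what the reward-uncertainty model is meant to absorb.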
Learning to Identify Top Elo Ratings: A Dueling Bandits Approach
The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players. Recently, it has also been integrated into machine learning algorithms for evaluating the performance of computerised AI agents. However, an accurate estimate of the Elo rating (for the top players) often requires many rounds of competition, which can be expensive to carry out. In this paper, to improve the sample efficiency of Elo evaluation (for top players), we propose an efficient online match-scheduling algorithm. Specifically, we identify and match the top players through a dueling-bandits framework and tailor the bandit algorithm to the gradient-based update of Elo. We show that it reduces the per-step memory and time complexity to constant, compared to traditional likelihood-maximization approaches requiring O(t) time. Our algorithm has a regret guarantee of Õ(√T), sublinear in the number of competition rounds, and has been extended to multidimensional Elo ratings for handling intransitive games. We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks.
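The gradient-based Elo update the abstract builds on can be sketched in a few lines: each match moves both ratings by K times the prediction error, in O(1) time and memory per match. The K-factor of 32 and the ratings are illustrative:

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo (logistic) model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """score_a: 1 if A wins, 0 if A loses, 0.5 for a draw."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta   # zero-sum: total rating is conserved

r_a, r_b = 1600.0, 1400.0
print(expected_score(r_a, r_b))   # ~0.76: the 200-point favourite usually wins
print(elo_update(r_a, r_b, 0))    # upset loss: A drops ~24 points, B gains them
```

Because only the two current ratings and one expected-score evaluation are needed per match, scheduling matches adaptively (as the dueling-bandits framework does) is where the remaining sample-efficiency gains come from.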
- …