An algorithm for cooperative probabilistic control design

Author: Barão Miguel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2012

This paper deals with decentralized closed-loop control in a purely probabilistic framework. In this framework, a system is a controlled Markov chain whose transition probabilities depend on the actions of the agents. The agents are also described probabilistically. The objective is to drive the system so that the joint state and the agents' actions are close to a set of given target probability distributions. The Kullback-Leibler divergence is used as the performance measure. The resulting algorithm uses dynamic programming interleaved with an iterative process that computes the behavior of each agent.
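As a small illustration of the performance measure named in the abstract, the sketch below computes the Kullback-Leibler divergence between two discrete distributions. The function name `kl_divergence` and the example distributions are assumptions made for illustration; the abstract does not say which argument order or log base the paper uses, so this is not the paper's algorithm.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    Hypothetical helper: p and q are equal-length sequences of
    probabilities over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken as 0 by convention
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Illustrative values only: how far an achieved distribution is from a target.
achieved = [0.7, 0.2, 0.1]
target = [0.5, 0.3, 0.2]
divergence = kl_divergence(achieved, target)
```

The divergence is zero only when the two distributions coincide, and it is not symmetric in its arguments.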

Crossref

Repositório Científico da Universidade de Évora

Near-Optimal Adversarial Policy Switching for Decentralized Asynchronous Multi-Agent Systems

Authors: Amato Christopher; Hoang Trong Nghia; How Jonathan; Sivakumar Kavinayan; Xiao Yuchen
Publication date: 17/10/2017

A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing work on this challenge is limited to small-scale synchronous decision-making scenarios, or to a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities needs to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically and demonstrated empirically in simulation and on hardware.

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Agent abstraction in multi-agent reinforcement learning

Author: Memarian Amin
Publication date: 01/06/2022

This thesis is organized into two chapters. The first chapter serves as an introduction to the concepts and ideas used in the second chapter (the article). The first chapter is divided into three sections. In the first section, we introduce Reinforcement Learning as a Machine Learning paradigm and show how its problems are formalized using Markov Decision Processes. We formalize goals as expected returns and show how the Bellman equations use the recursive formulation of return to establish a relation between the values of two successive states under the agent’s policy. After that, we argue that solving the Bellman optimality equations is intractable and introduce value-based algorithms such as Dynamic Programming, Monte Carlo methods, and Temporal Difference methods that approximate the optimal solution using Generalized Policy Iteration. Function approximation is then proposed as a way of dealing with large state spaces. We also discuss how policy-based methods optimize the policy directly without optimizing the value function. In the second section, we introduce Markov Games as an extension of Markov Decision Processes for multiple agents. We cover the different settings formed by the different reward structures and give Sequential Social Dilemmas as an example of the mixed-incentive setting. In the end, we introduce different information structures such as centralized learning that can help deal with the opponent-induced non-stationarity.
Finally, in the third section, we give a brief overview of state abstraction types and introduce bisimulation metrics as a concept inspired by model-irrelevance abstraction that measures the similarity between states. In the second chapter (the article), we ultimately delve into agent abstraction as a bisimulation metric and derive a compression factor that we can apply to Diplomacy to reveal the higher agency over the player units.
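To make the Dynamic Programming idea mentioned in the abstract concrete, here is a minimal value-iteration sketch that repeatedly applies the Bellman optimality backup on a tiny Markov Decision Process. The two-state MDP, the function name `value_iteration`, and the discount factor are all assumptions chosen for illustration; none of them come from the thesis.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Approximate optimal state values with repeated Bellman optimality backups.

    transitions[s][a] is a list of (probability, next_state) pairs.
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    values = [0.0] * len(transitions)
    for _ in range(max_iters):
        new_values = []
        for s, actions in enumerate(transitions):
            # Bellman optimality backup: best one-step reward plus discounted value.
            action_values = [
                rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
                for a, outcomes in enumerate(actions)
            ]
            new_values.append(max(action_values))
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical two-state MDP.
# State 0: action 0 stays in state 0 (reward 0), action 1 moves to state 1 (reward 1).
# State 1: a single action that stays in state 1 (reward 2).
transitions = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)]],
]
rewards = [
    [0.0, 1.0],
    [2.0],
]
values = value_iteration(transitions, rewards, gamma=0.5)
```

With a discount of 0.5, staying in state 1 forever is worth 2 / (1 - 0.5) = 4, so moving from state 0 is worth 1 + 0.5 * 4 = 3, and the iteration converges to those values.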

Dépôt Institutionnel Numérique

Fairness in Multi-Agent Sequential Decision-Making

Authors: Shah Julie A; Zhang Chongjie
Publication venue: Neural Information Processing Systems Foundation Inc.
Publication date: 01/12/2014

We define a fairness solution criterion for multi-agent decision-making problems, where agents have local interests. This new criterion aims to maximize the worst performance among agents while taking overall performance into account. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. The game-theoretic approach formulates the fairness optimization as a two-player, zero-sum game and employs an iterative algorithm for finding a Nash equilibrium, which corresponds to an optimal fairness policy. We scale up this approach by exploiting problem structure and value function approximation. Our experiments on resource allocation problems show that this fairness criterion provides a more favorable solution than the utilitarian criterion, and that our game-theoretic approach is significantly faster than linear programming.
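The contrast between a worst-case fairness criterion and the utilitarian criterion can be sketched over a small set of candidate outcomes. This is a simplified illustration, not the paper's method: the function names, the candidate utilities, and the choice to break ties on the worst-off agent by total utility are all assumptions, and the paper's linear programming and game-theoretic solvers are not reproduced here.

```python
def maximin_choice(candidates):
    """Pick the candidate whose worst-off agent does best.

    Assumption for illustration: ties on the minimum are broken by total utility.
    candidates maps a candidate name to a list of per-agent utilities.
    """
    return max(candidates, key=lambda name: (min(candidates[name]), sum(candidates[name])))


def utilitarian_choice(candidates):
    """Pick the candidate with the highest total utility across agents."""
    return max(candidates, key=lambda name: sum(candidates[name]))


# Hypothetical per-agent utilities for three candidate allocations.
candidates = {
    "A": [10, 1],  # highest total, but one agent does poorly
    "B": [5, 4],   # worst-off agent gets 4, total 9
    "C": [4, 4],   # worst-off agent gets 4, total 8
}
fair = maximin_choice(candidates)
utilitarian = utilitarian_choice(candidates)
```

Here the utilitarian rule selects "A" because its total is largest, while the maximin rule selects "B": it matches "C" on the worst-off agent and wins the tie on total utility.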

DSpace@MIT