
    Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams.

    Multi-agent systems (MAS) are a field of study of growing interest in a variety of domains such as robotics or distributed control. The article focuses on decentralized reinforcement learning (RL) in cooperative MAS, where a team of independent learning robots (ILs) try to coordinate their individual behaviors to reach a coherent joint behavior. We assume that each robot has no information about its teammates' actions. To date, RL approaches for such ILs did not guarantee convergence to the optimal joint policy in scenarios where coordination is difficult. We report an investigation of existing algorithms for learning coordination in cooperative MAS, and propose a Q-learning extension for ILs, called Hysteretic Q-Learning. This algorithm does not require any additional communication between robots. Its advantages are shown and compared with other methods on various applications: bimatrix games, a collaborative ball-balancing task, and a pursuit domain.
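
    The hysteretic update rule at the heart of this algorithm is compact enough to sketch. Below is a minimal, illustrative Python version for a tabular independent learner; the function name and parameter values are ours, but the two-rate rule (a fast rate for positive TD errors, a slow rate for negative ones) is the one the abstract describes.

```python
import numpy as np

def hysteretic_q_update(Q, s, a, r, s_next, alpha=0.1, beta=0.01, gamma=0.9):
    """One hysteretic Q-learning step for an independent learner.

    alpha is applied when the TD error is positive, beta (< alpha) when it
    is negative, so the agent is optimistic: good surprises are learned
    quickly, while apparent punishments, often caused by teammates'
    exploration rather than by the environment, erode values only slowly.
    """
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]    # TD error
    Q[s, a] += (alpha if delta >= 0 else beta) * delta
    return Q
```

    With beta = alpha this reduces to ordinary decentralized Q-learning, and with beta = 0 it behaves like Distributed Q-learning in deterministic games, which situates the algorithm between those two extremes.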

    Un algorithme décentralisé d'apprentissage par renforcement multi-agents coopératifs : le Q-Learning Hystérétique.

    We are interested in reinforcement learning techniques for cooperative multi-agent systems. We present a new algorithm for independent agents that learns the optimal joint action in games where coordination is difficult. Our approach is motivated by the decentralized nature of this algorithm, which requires no communication between agents and uses Q-tables whose size is independent of the number of agents. Successful experiments are also carried out on repeated cooperative games as well as on a pursuit game.

    Reward function and initial values: better choices for accelerated goal-directed reinforcement learning.

    An important issue in reinforcement learning (RL) is to accelerate or improve the learning process. In this paper, we study the influence of some RL parameters on learning speed. Indeed, although RL convergence properties have been widely studied, no precise rules exist for correctly choosing the reward function and the initial Q-values. Our method guides the choice of these RL parameters in the context of reaching a goal in minimal time. We develop a theoretical study and also provide experimental justification for choosing, on the one hand, the reward function and, on the other hand, particular initial Q-values based on a goal-bias function.
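
    The paper derives precise conditions for these choices; as a purely illustrative sketch of the goal-bias idea (the names and values below are hypothetical, not the paper's notation), one can seed the Q-table from an optimistic distance-to-goal heuristic:

```python
import numpy as np

def goal_biased_init(dist_to_goal, n_actions, step_reward=-1.0):
    """Initialize a Q-table with a goal-bias heuristic (illustrative).

    dist_to_goal: per-state estimates of the number of steps to the goal,
    e.g. Manhattan distances in a grid world. With a -1 per-step reward,
    -dist_to_goal[s] is an optimistic guess of the return from state s,
    which biases early exploration toward the goal.
    """
    dist = np.asarray(dist_to_goal, dtype=float)
    return step_reward * np.repeat(dist[:, None], n_actions, axis=1)
```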

    A study of FMQ heuristic in cooperative multi-agent games.

    The article focuses on decentralized reinforcement learning (RL) in cooperative multi-agent games, where a team of independent learning agents (ILs) try to coordinate their individual actions to reach an optimal joint action. Within this framework, several Q-learning-based algorithms have been proposed in recent work. We are especially interested in Distributed Q-learning, which finds optimal policies in deterministic games, and in the Frequency Maximum Q value (FMQ) heuristic, which, in partially stochastic matrix games, can distinguish whether a poor reward received for an action is due to miscoordination or to a noisy reward function. Making this distinction is one of the main difficulties in solving stochastic games. Our objective is an algorithm able to switch its update rule according to the detected cause of noise. In this paper, a modified version of the FMQ heuristic is proposed which achieves this detection and adapts the updates accordingly. Moreover, this modified FMQ version is more robust and its parameters are very easy to set.
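
    For reference, here is a compact sketch of the original FMQ heuristic (Kapetanakis and Kudenko) that the modified version builds on: action selection is biased by the frequency with which each action has produced its maximum observed reward. The class structure and parameter values are illustrative, and this is the baseline heuristic, not the modified version proposed in the paper.

```python
import numpy as np

class FMQLearner:
    """Minimal FMQ learner for one agent in a repeated matrix game."""

    def __init__(self, n_actions, alpha=0.1, c=10.0, temp=1.0):
        self.Q = np.zeros(n_actions)
        self.max_r = np.full(n_actions, -np.inf)  # best reward seen per action
        self.count = np.zeros(n_actions)          # times each action was played
        self.count_max = np.zeros(n_actions)      # times it yielded max_r
        self.alpha, self.c, self.temp = alpha, c, temp

    def act(self, rng):
        freq = np.divide(self.count_max, self.count,
                         out=np.zeros_like(self.count_max),
                         where=self.count > 0)
        max_r = np.where(np.isfinite(self.max_r), self.max_r, 0.0)
        ev = self.Q + self.c * freq * max_r       # FMQ action evaluation
        p = np.exp((ev - ev.max()) / self.temp)   # Boltzmann selection
        return int(rng.choice(len(ev), p=p / p.sum()))

    def update(self, a, r):
        self.count[a] += 1
        if r > self.max_r[a]:
            self.max_r[a], self.count_max[a] = r, 1.0
        elif r == self.max_r[a]:
            self.count_max[a] += 1
        self.Q[a] += self.alpha * (r - self.Q[a])  # stateless Q update
```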

    Multi-Robot Simultaneous Coverage and Mapping of Complex Scene

    In this demonstration, participants will explore a system for multi-robot observation of a complex scene involving the activity of a person. Mobile robots have to cooperate to find positions around the scene that maximize its coverage, i.e. allow a complete view of the human skeleton. Simultaneously, they have to map the unknown environment around the scene. We developed a simulator, presented in this paper, that can generate an environment and a scene and simulate the robots' observations and motion. During the demonstration, users will be able to try our simulator, including setting up a scenario and a decision algorithm, monitoring the robots' movements, observations and maps, and visualizing the team's performance.

    Choix de la fonction de renforcement et des valeurs initiales pour accélérer les problèmes d'Apprentissage par Renforcement de plus court chemin stochastique.

    An important issue in reinforcement learning (RL) is improving the convergence speed of the learning process. In this article we propose to study the influence of certain RL parameters on learning speed. Indeed, although the convergence properties of RL have been widely studied, few precise rules exist for correctly choosing the reward function and the initial values of the Q-table. Our method helps with the choice of these parameters for goal-directed problems, i.e. those whose objective is to reach a goal in minimal time. We develop a theoretical study and then provide experimental justification for choosing, on the one hand, the reward function and, on the other hand, particular initial values of the Q-table based on an influence function.

    The world of independent learners is not Markovian.

    In multi-agent systems, the presence of learning agents can cause the environment to be non-Markovian from an agent's perspective, thus violating the property that traditional single-agent learning methods rely upon. This paper formalizes some known intuition about concurrently learning agents by providing formal conditions that make the environment non-Markovian from an independent (non-communicative) learner's perspective. New concepts are introduced, such as divergent learning paths and the observability of the effects of others' actions. To illustrate the formal concepts, a case study is also presented. These findings are significant because they help to understand failures and successes of existing learning algorithms, as well as being suggestive for future work.
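
    The nonstationarity the paper formalizes can be seen in a toy computation: from one learner's viewpoint, the reward distribution for a fixed action shifts as a teammate's policy changes. The payoff matrix and policies below are hypothetical.

```python
import numpy as np

# A 2x2 cooperative matrix game (hypothetical payoffs). Agent 1 neither
# sees nor controls agent 2's action b, yet its reward depends on it.
R = np.array([[10, 0],
              [ 0, 2]])

def reward_dist(a, pi2):
    """Distribution of agent 1's reward for action a under agent 2's
    current mixed policy pi2. As pi2 drifts during learning, this
    distribution drifts too, so agent 1's (state, action) process is
    nonstationary and hence non-Markovian on its own."""
    return {int(R[a, b]): pi2[b] for b in range(2)}

print(reward_dist(0, pi2=[0.9, 0.1]))  # early in agent 2's learning
print(reward_dist(0, pi2=[0.1, 0.9]))  # after agent 2 has shifted
```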

    Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems.

    In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive FMQ and WoLF-PHC. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
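
    To make one of these challenges concrete, the sketch below pits two independent stateless learners against the climbing game of Claus and Boutilier, where the optimal joint action is shadowed by large penalties. The training loop and parameter values are illustrative, not the paper's experimental protocol.

```python
import numpy as np

# Climbing game (Claus & Boutilier): the optimal joint action (0, 0) is
# shadowed by the -30 penalties that a single exploring agent can trigger.
R = np.array([[ 11, -30,   0],
              [-30,   7,   6],
              [  0,   0,   5]])

def train(alpha=0.1, beta=0.01, episodes=5000, eps=0.1, seed=0):
    """Two independent hysteretic learners on the climbing game.
    Setting beta = alpha recovers plain decentralized Q-learning."""
    rng = np.random.default_rng(seed)
    Q = [np.zeros(3), np.zeros(3)]
    for _ in range(episodes):
        acts = [int(rng.integers(3)) if rng.random() < eps
                else int(np.argmax(q)) for q in Q]
        r = R[acts[0], acts[1]]
        for i, q in enumerate(Q):
            delta = r - q[acts[i]]                      # stateless TD error
            q[acts[i]] += (alpha if delta >= 0 else beta) * delta
    return [int(np.argmax(q)) for q in Q]

print("hysteretic   :", train())             # asymmetric, optimistic update
print("decentralized:", train(beta=0.1))     # symmetric update
```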

    SOoN : un algorithme pour la coordination d'agents apprenants et non communicants.

    Reinforcement learning in multi-agent systems is a very active research area, as recent surveys attest [Busoniu et al., 2008, Sandholm, 2007, Bab & Brafman, 2008, Vlassis, 2007]. In particular, Lauer and Riedmiller showed that, under certain assumptions, simultaneously learning agents can coordinate their actions without any communication and without perceiving the actions of their peers [Lauer & Riedmiller, 2000]. This property is particularly interesting for finding cooperative strategies in large multi-agent systems.

    Coordination of independent learners in cooperative Markov games.

    In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. Studying these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficulties encountered by these approaches, we focus on two main skills: optimism, which manages coordination in deterministic environments, and the detection of the stochasticity of a game. Indeed, the key difficulty in stochastic environments is to distinguish between the various causes of noise. The SOoN algorithm, standing for “Swing between Optimistic or Neutral”, is therefore introduced, in which independent learners adapt automatically to the stochasticity of the environment. Empirical results on various cooperative Markov games notably show that SOoN overcomes the main factors of non-coordination and is robust to the exploration of other agents.
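
    The published SOoN algorithm is more elaborate than what fits here, but the "swing" idea can be caricatured in a few lines: estimate how noisy an action's rewards look, then apply an optimistic (max-based) update when the game looks deterministic and a neutral (averaging) update when it looks stochastic. Everything below, including the crude noise detector, is a hypothetical illustration, not the paper's method.

```python
import numpy as np

def swing_update(Q, a, r, history, alpha=0.1, noise_thresh=0.5):
    """Swing between an optimistic and a neutral stateless update,
    based on a crude estimate of reward noise for action a.
    history is a dict mapping each action to its list of past rewards."""
    history[a].append(r)
    recent = history[a][-20:]
    spread = float(np.std(recent))
    scale = float(np.mean(np.abs(recent))) + 1e-8
    if spread / scale < noise_thresh:
        Q[a] = max(Q[a], r)            # looks deterministic: optimistic
    else:
        Q[a] += alpha * (r - Q[a])     # looks stochastic: neutral average
    return Q
```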