33 research outputs found

    Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized POMDPs

    We consider the problem of finding an n-agent joint policy for the optimal finite-horizon control of a decentralized POMDP (Dec-POMDP). This is a problem of very high complexity (NEXP-hard for n >= 2). In this paper, we propose a new mathematical programming approach to the problem. Our approach is based on two ideas: first, we represent each agent's policy in the sequence form rather than the tree form, thereby obtaining a very compact representation of the set of joint policies; second, using this compact representation, we cast the problem as an instance of combinatorial optimization, for which we formulate a mixed integer linear program (MILP). The optimal solution of the MILP directly yields an optimal joint policy for the Dec-POMDP. Computational experience shows that formulating and solving the MILP takes significantly less time on benchmark Dec-POMDP problems than existing algorithms. For example, the multi-agent tiger problem for horizon 4 is solved in 72 seconds with the MILP, whereas existing algorithms require several hours.
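
    The compactness of the sequence form can be illustrated with a short sketch. The action set, observation set, horizon and variable names below are toy values chosen for illustration, not the paper's benchmark problems; the constraints listed are the standard realization-weight policy constraints that a sequence-form MILP imposes on each agent.

```python
# Illustrative sketch of the sequence-form policy representation that makes
# the Dec-POMDP MILP compact.  Action set, observation set and horizon are
# toy values for illustration only.
from itertools import product

actions = ["listen", "open-left", "open-right"]   # hypothetical agent actions
observations = ["hear-left", "hear-right"]        # hypothetical observations
horizon = 3

# A "sequence" is an alternating history a1 o1 a2 o2 ... a_t (t <= horizon).
def sequences(actions, observations, horizon):
    seqs = [(a,) for a in actions]
    frontier = seqs[:]
    for _ in range(horizon - 1):
        frontier = [s + (o, a) for s in frontier
                    for o, a in product(observations, actions)]
        seqs.extend(frontier)
    return seqs

seqs = sequences(actions, observations, horizon)

# Policy constraints on realization weights x(s):
#   sum_a x((a,)) == 1
#   for every extendable sequence s and observation o:  sum_a x(s + (o, a)) == x(s)
# In the MILP, the x(s) of full-length sequences are 0-1 variables, which forces
# a deterministic policy; here the constraints are only listed symbolically.
constraints = [("sum", [(a,) for a in actions], 1)]
for s in seqs:
    if len(s) < 2 * horizon - 1:          # s can still be extended
        for o in observations:
            children = [s + (o, a) for a in actions]
            constraints.append(("sum", children, s))

# Compactness argument: number of sequence-form variables vs tree-form policies.
n_nodes = sum(len(observations) ** t for t in range(horizon))
print("sequence-form variables:", len(seqs))          # 129
print("tree-form policies     :", len(actions) ** n_nodes)  # 2187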

    Computing the Equilibria of Bimatrix Games using Dominance Heuristics

    We propose a formulation of a general-sum bimatrix game as a bipartite directed graph, with the objective of establishing a correspondence between the set of relevant structures of the graph (in particular, elementary cycles) and the set of Nash equilibria of the game. We show that finding the set of elementary cycles of the graph permits the computation of the set of equilibria. For games whose graphs have a sparse adjacency matrix, this serves as a good heuristic for computing the set of equilibria. The heuristic also allows the discarding of sections of the support space that do not yield any equilibrium, thus serving as a useful pre-processing step for algorithms that compute the equilibria through support enumeration.
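
    For context, the support-enumeration algorithms that the heuristic is meant to pre-process can be sketched as follows. This is a generic textbook-style enumeration over equal-sized supports (so it assumes a nondegenerate game), not the graph-based heuristic of the paper, and the payoff matrices are arbitrary toy values.

```python
import numpy as np
from itertools import combinations

# Toy general-sum bimatrix game (payoffs are illustrative).
A = np.array([[3.0, 0.0], [5.0, 1.0]])   # row player's payoffs
B = np.array([[3.0, 5.0], [0.0, 1.0]])   # column player's payoffs

def support_enumeration(A, B, tol=1e-9):
    m, n = A.shape
    equilibria = []
    for k in range(1, min(m, n) + 1):
        for I in combinations(range(m), k):
            for J in combinations(range(n), k):
                # Column player's mix y on J must make rows in I indifferent:
                #   A[I, J] @ y = v * 1,  sum(y) = 1   (unknowns: y, v)
                M = np.zeros((k + 1, k + 1))
                M[:k, :k] = A[np.ix_(I, J)]
                M[:k, k] = -1.0
                M[k, :k] = 1.0
                rhs = np.zeros(k + 1)
                rhs[k] = 1.0
                try:
                    sol = np.linalg.solve(M, rhs)
                except np.linalg.LinAlgError:
                    continue
                y = sol[:k]
                # Row player's mix x on I must make columns in J indifferent.
                M[:k, :k] = B[np.ix_(I, J)].T
                try:
                    sol = np.linalg.solve(M, rhs)
                except np.linalg.LinAlgError:
                    continue
                x = sol[:k]
                if (y < -tol).any() or (x < -tol).any():
                    continue
                # Embed into full strategy vectors and check best responses.
                xf = np.zeros(m); xf[list(I)] = x
                yf = np.zeros(n); yf[list(J)] = y
                if (A @ yf <= xf @ A @ yf + tol).all() and \
                   (xf @ B <= xf @ B @ yf + tol).all():
                    equilibria.append((xf, yf))
    return equilibria

for x, y in support_enumeration(A, B):
    print("x =", x, " y =", y)
```

    Enumerating all supports is exponential in the size of the game, which is why pruning unproductive regions of the support space beforehand is valuable.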

    Batch Reinforcement Learning for Optimizing Longitudinal Driving Assistance Strategies

    Partially Autonomous Driver's Assistance Systems (PADAS) aim to provide a safer driving experience. In particular, one application of such systems is to assist drivers in reacting optimally so as to prevent collisions with a leading vehicle. Several means can be used by a PADAS to reach this goal. For instance, warning signals can be sent to the driver, or the PADAS can directly modify the speed of the car by braking automatically. An optimal combination of different warning signals together with assistive braking is expected to reduce the probability of collision. How to associate the right combination of PADAS actions with a given situation so as to achieve this aim remains an open problem. In this paper, the use of a statistical machine learning method, namely the reinforcement learning paradigm, is proposed to automatically derive an optimal PADAS action selection strategy from a database of driving experiments. Experimental results conducted on actual car simulators with human drivers show that this method achieves a significant reduction of the risk of collision.
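
    The abstract does not name the specific batch algorithm, so the sketch below uses generic fitted Q-iteration over a logged set of (state, action, reward, next state) transitions; the feature layout, action set, reward shape and regressor are assumptions made purely for illustration.

```python
# Generic fitted-Q-iteration sketch for learning a policy from a fixed batch of
# driving logs.  All problem-specific choices below are illustrative.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

N_ACTIONS = 3          # e.g. {no action, warning signal, assistive braking}
STATE_DIM = 4          # e.g. speed, relative speed, headway, driver input
GAMMA = 0.95
N_ITERATIONS = 20

# Fake batch of logged transitions (s, a, r, s'); a real PADAS dataset would
# come from the driving-simulator experiments.
N = 5000
S = rng.normal(size=(N, STATE_DIM))
A = rng.integers(0, N_ACTIONS, size=N)
R = -np.abs(S[:, 2]) + 0.1 * (A == 0)          # toy reward: keep headway safe
S2 = S + rng.normal(scale=0.1, size=S.shape)

X = np.column_stack([S, A])                    # regress Q on (state, action)
q = ExtraTreesRegressor(n_estimators=50, random_state=0)
q.fit(X, R)                                    # Q_1 = immediate reward

for _ in range(N_ITERATIONS - 1):
    # Bellman targets: r + gamma * max_a' Q(s', a')
    q_next = np.column_stack([
        q.predict(np.column_stack([S2, np.full(N, a)])) for a in range(N_ACTIONS)
    ])
    y = R + GAMMA * q_next.max(axis=1)
    q = ExtraTreesRegressor(n_estimators=50, random_state=0)
    q.fit(X, y)

def greedy_action(state):
    """PADAS action recommended by the learned Q-function."""
    values = [q.predict(np.concatenate([state, [a]]).reshape(1, -1))[0]
              for a in range(N_ACTIONS)]
    return int(np.argmax(values))

print(greedy_action(np.zeros(STATE_DIM)))
```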

    Apprentissage par renforcement et jeux stochastiques à information incomplète

    The goal of our work is to enable agents to learn to cooperate. Since each agent is autonomous and necessarily different from the others, this is a particularly difficult task, especially when the goals of the two agents are not exactly the same. Our concern is to work with agents that are as simple as possible, that is, essentially reactive. We therefore propose to equip the agents with limited communication capabilities in order to establish a notion similar to the "contracts" of game theory. If the agents agree on this notion of contract, our algorithm allows them to converge towards equilibria that induce "more cooperative" behaviours than the plain Nash equilibrium.

    Cooperation in stochastic games through communication

    We describe a process of reinforcement learning in two-agent general-sum stochastic games under imperfect observability of moves and payoffs. In practice, it is known that with naive Q-learning, agents can learn equilibrium policies under the discounted reward criterion, although these may be arbitrarily worse for both agents than a non-equilibrium policy, in the absence of global optima. We aim for Pareto-efficient policies, in which agents enjoy higher payoffs than in an equilibrium, and show that agents may achieve this by employing naive Q-learning augmented with communication and a payoff interpretation rule. In principle, our objective is to shift the focus of the learning from equilibria (to which solipsistic algorithms converge) to non-equilibria, by transforming the latter into equilibria.
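
    As a point of reference, the naive (independent) Q-learning baseline that the paper augments with communication and a payoff interpretation rule looks roughly like this; the stage game, learning rate and exploration scheme are illustrative, and the paper's communication layer is deliberately not reproduced.

```python
# Naive (independent) Q-learning in a repeated two-player general-sum game.
import numpy as np

rng = np.random.default_rng(1)

# Prisoner's-dilemma-style stage game (toy payoffs): action 0 = cooperate, 1 = defect.
PAYOFF = {
    0: np.array([[3, 0], [5, 1]]),   # agent 0's payoff[a0, a1]
    1: np.array([[3, 5], [0, 1]]),   # agent 1's payoff[a0, a1]
}

ALPHA, EPSILON, STEPS = 0.1, 0.1, 20000
Q = {i: np.zeros(2) for i in (0, 1)}   # stateless Q-values over own actions

def choose(q):
    if rng.random() < EPSILON:
        return int(rng.integers(2))
    return int(np.argmax(q))

for _ in range(STEPS):
    a0, a1 = choose(Q[0]), choose(Q[1])
    # Each agent only sees its own action and its own payoff (imperfect monitoring).
    Q[0][a0] += ALPHA * (PAYOFF[0][a0, a1] - Q[0][a0])
    Q[1][a1] += ALPHA * (PAYOFF[1][a0, a1] - Q[1][a1])

# The defect action ends up with the higher Q-value for both agents, i.e. the
# learners settle on the equilibrium that is worse for both than mutual cooperation.
print("agent 0 Q-values:", Q[0])
print("agent 1 Q-values:", Q[1])
```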

    Efficient Learning in Games

    We consider the problem of learning strategy selection in games. The theoretical solution to this problem is a distribution over strategies that corresponds to a Nash equilibrium of the game. When the payoff function of the game is not known to the participants, such a distribution must be approximated directly through repeated play. Full knowledge of the payoff function, on the other hand, restricts agents to be strictly rational. In this classical approach, agents are bound to a Nash equilibrium, even when a globally better solution is obtainable. In this paper, we present an algorithm that allows agents to capitalize on their very lack of information about the payoff structure. The principle we propose is that agents resort to the manipulation of their own payoffs, during the course of learning, to find a "game" that gives them a higher payoff than when no manipulation occurs. In essence, the payoffs are considered an extension of the strategy set. At all times, agents remain rational vis-à-vis the information available. In self-play, the algorithm affords a globally efficient payoff (if it exists).

    Une méthode de programmation linéaire mixte pour les POMDP décentralisé à horizon fini

    We consider the problem of finding an optimal joint policy for n agents in the optimal control of a decentralized partially observable Markov decision process (Dec-POMDP). The principle of our approach is the following: the optimal joint policy of a Dec-POMDP is equivalent to a suboptimal policy of the associated POMDP, a policy which must, in addition, satisfy structural constraints so that it can be decentralized. Building on this principle, we present an exact algorithm that uses mixed integer linear programming (MILP) to find a vector of realization weights of joint sequences (sequences of joint actions and observations), which thereby represents a joint policy. The optimal (decentralizable) joint policy for the Dec-POMDP is derived directly from the solution of this MILP. Experiments with our algorithm on standard Dec-POMDP problems show that it is more efficient (faster) than current exact dynamic programming algorithms.

    Stigmergy in multi-agent reinforcement learning

    In this paper, we describe how certain aspects of the biological phenomenon of stigmergy can be imported into multi-agent reinforcement learning (MARL), with the purpose of better enabling coordination of agent actions and speeding up learning. In particular, we detail how these stigmergic aspects can be used to define an inter-agent communication framework.
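
    One common way to realize stigmergic communication in a MARL setting, not necessarily the framework defined in the paper, is a shared, evaporating "pheromone" map that agents write to and read from instead of addressing each other directly; the sketch below shows only this mechanism, with arbitrary grid size and evaporation rate.

```python
import numpy as np

class PheromoneGrid:
    """Shared environment map carrying decaying marks left by agents."""

    def __init__(self, shape, evaporation=0.95):
        self.values = np.zeros(shape)
        self.evaporation = evaporation

    def deposit(self, cell, amount=1.0):
        """An agent marks the environment instead of messaging other agents."""
        self.values[cell] += amount

    def step(self):
        """Pheromone evaporates, so stale information fades automatically."""
        self.values *= self.evaporation

    def sense(self, cell):
        """Local reading an agent can append to its RL state/observation."""
        return self.values[cell]

grid = PheromoneGrid((5, 5))
grid.deposit((2, 3))          # agent A marks a cell it found rewarding
grid.step()
print(grid.sense((2, 3)))     # agent B later senses the (decayed) mark: 0.95
```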

    Cooperation through communication in decentralized Markov games

    In this paper, we present a communication-integrated reinforcement-learning algorithm for a general-sum Markov game (MG) played by independent, cooperative agents. The algorithm assumes that agents can communicate but do not know the purpose (the semantics) of doing so. We model agents that have different tasks, some of which may be commonly beneficial. The objective of the agents is to determine which tasks are commonly beneficial, and to learn a sequence of actions that achieves them. In other words, the agents play a multi-stage coordination game of which they know neither the stage-wise payoff matrix nor the stage transition matrix. Our principal interest is in imposing realistic conditions of learning on the agents. Towards this end, we assume that they operate in a strictly imperfect monitoring setting wherein they do not observe one another's actions or rewards. To our knowledge, a learning algorithm for a Markov game under this stricter condition of learning has not yet been proposed. We describe this Markov game with individual reward functions as a new formalism, the decentralized Markov game (Dec-MG), borrowed from the Dec-MDP (decentralized Markov decision process) formalism. For the communicatory aspect of the learning conditions, we propose a series of communication frameworks graduated in terms of how much they facilitate information exchange amongst the agents. We present results of testing our algorithm on a toy MG called a total guessing game.

    Using linear programming duality for solving finite horizon Dec-POMDPs

    This paper studies the problem of finding an optimal finite-horizon joint policy for a decentralized partially observable Markov decision process (Dec-POMDP). We present a new algorithm for finding an optimal joint policy. The algorithm is based on the fact that a necessary condition for a joint policy to be optimal is that it be locally optimal (that is, a Nash equilibrium). Through the application of linear programming duality, this necessary condition can be transformed into a nonlinear program, which can in turn be transformed into a 0-1 mixed integer linear program (MILP) whose optimal solution is an optimal joint policy (in the sequence form). The proposed algorithm thus consists of solving this 0-1 MILP. Computational experience with the 0-1 MILP on two- and three-agent Dec-POMDPs gives mixed results: on some problems it is faster than existing algorithms, on others it is slower.
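
    The pivotal step in the abstract is linear programming duality, which lets a best-response (local optimality) condition be rewritten as constraints of a single program. The following minimal example only illustrates strong LP duality on an arbitrary toy LP; it is not the paper's Dec-POMDP formulation.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 5.0])

# Primal:  max c^T x   s.t.  A x <= b, x >= 0   (linprog minimises, so negate c)
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2, method="highs")

# Dual:    min b^T y   s.t.  A^T y >= c, y >= 0
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2, method="highs")

print("primal optimum:", -primal.fun)   # 10.8
print("dual optimum  :", dual.fun)      # 10.8, equal by strong duality
```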