390 research outputs found

    Open-ended Learning in Symmetric Zero-sum Games

    Get PDF
    Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.Comment: ICML 2019, final versio

    A new algorithm for approaching Nash equilibrium and Kalai Smoridinsky solution

    Get PDF
    International audienceIn the present paper, a new formulation of Nash games is proposed for solving general multi-objective optimization problems. The main idea of this approach is to split the optimization variables which allow us to determine numerically the strategies between two players. The first player minimizes his cost function using the variables of the first table P, the second player, using the second table Q. The original contribution of this work concerns the construction of the two tables of allocations that lead to a Nash equilibrium on the Pareto front. On the other hand, we search P and Q that lead to a solution which is both a Nash equilibrium and a Kalai Smorodinsky solution. For this, we proposed and tried out successfully two algorithms which calculate P, Q and their associated Nash equilibrium, by using some extension of Normal Boundary Intersection approach (NBI)

    Qualitative Characteristics and Quantitative Measures of Solution's Reliability in Discrete Optimization: Traditional Analytical Approaches, Innovative Computational Methods and Applicability

    Get PDF
    The purpose of this thesis is twofold. The first and major part is devoted to sensitivity analysis of various discrete optimization problems while the second part addresses methods applied for calculating measures of solution stability and solving multicriteria discrete optimization problems. Despite numerous approaches to stability analysis of discrete optimization problems two major directions can be single out: quantitative and qualitative. Qualitative sensitivity analysis is conducted for multicriteria discrete optimization problems with minisum, minimax and minimin partial criteria. The main results obtained here are necessary and sufficient conditions for different stability types of optimal solutions (or a set of optimal solutions) of the considered problems. Within the framework of quantitative direction various measures of solution stability are investigated. A formula for a quantitative characteristic called stability radius is obtained for the generalized equilibrium situation invariant to changes of game parameters in the case of the H¨older metric. Quality of the problem solution can also be described in terms of robustness analysis. In this work the concepts of accuracy and robustness tolerances are presented for a strategic game with a finite number of players where initial coefficients (costs) of linear payoff functions are subject to perturbations. Investigation of stability radius also aims to devise methods for its calculation. A new metaheuristic approach is derived for calculation of stability radius of an optimal solution to the shortest path problem. The main advantage of the developed method is that it can be potentially applicable for calculating stability radii of NP-hard problems. The last chapter of the thesis focuses on deriving innovative methods based on interactive optimization approach for solving multicriteria combinatorial optimization problems. The key idea of the proposed approach is to utilize a parameterized achievement scalarizing function for solution calculation and to direct interactive procedure by changing weighting coefficients of this function. In order to illustrate the introduced ideas a decision making process is simulated for three objective median location problem. The concepts, models, and ideas collected and analyzed in this thesis create a good and relevant grounds for developing more complicated and integrated models of postoptimal analysis and solving the most computationally challenging problems related to it.Siirretty Doriast

    Deception in Game Theory: A Survey and Multiobjective Model

    Get PDF
    Game theory is the study of mathematical models of conflict. It provides tools for analyzing dynamic interactions between multiple agents and (in some cases) across multiple interactions. This thesis contains two scholarly articles. The first article is a survey of game-theoretic models of deception. The survey describes the ways researchers use game theory to measure the practicality of deception, model the mechanisms for performing deception, analyze the outcomes of deception, and respond to, or mitigate the effects of deception. The survey highlights several gaps in the literature. One important gap concerns the benefit-cost-risk trade-off made during deception planning. To address this research gap, the second article introduces a novel approach for modeling these trade-offs. The approach uses a game theoretic model of deception to define a new multiobjective optimization problem called the deception design problem (DDP). Solutions to the DDP provide courses of deceptive action that are efficient in terms of their benefit, cost, and risk to the deceiver. A case study based on the output of an air-to-air combat simulator demonstrates the DDP in a 7 x 7 normal form game. This approach is the first to evaluate benefit, cost, and risk in a single game theoretic model of deception

    Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

    Full text link
    Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allows agents to still converge to meaningful solutions that approximate equilibria.Comment: Under review since 14 November 202

    Game theory approach to competitive economic dynamics

    Get PDF
    This thesis deals both with non-cooperative and cooperative games in order to apply the mathematical theory to competitive dynamics arising from economics, particularly quantity competition in oligopolies and pollution reduction models in IEA (International Environmental Agreements)
    corecore