729 research outputs found

    Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information

    This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. Suppose that while repeatedly playing this game, the row player chooses her strategy at each round by using a no-regret algorithm to minimize her (pseudo) regret. We develop a no-instant-regret algorithm for the column player to exhibit last round convergence to a minimax equilibrium. We show that our algorithm is efficient against a large set of popular no-regret algorithms of the row player, including the multiplicative weight update algorithm, the online mirror descent method/follow-the-regularized-leader, the linear multiplicative weight update algorithm, and the optimistic multiplicative weight update
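As an illustration of the row player's side of this setting, the following is a minimal sketch of the multiplicative weight update rule, one of the no-regret algorithms the abstract names. It is not the paper's column-player algorithm. The function names, the loss convention (entries of A in [0, 1], treated as the row player's losses), and the step size `eta` are assumptions made for the example.

```python
import math


def mwu_row_player(A, column_plays, eta=0.1):
    """Multiplicative weight update for a loss-minimizing row player.

    A[i][j] is the row player's loss (assumed to lie in [0, 1]) when row i
    meets column j. column_plays lists the pure column action seen each round.
    Returns the mixed strategy played in every round.
    """
    n = len(A)
    weights = [1.0] * n
    strategies = []
    for j in column_plays:
        total = sum(weights)
        strategies.append([w / total for w in weights])
        # Full-information feedback: the loss of every row against column j
        # is revealed, so every weight is updated.
        weights = [w * math.exp(-eta * A[i][j]) for i, w in enumerate(weights)]
    return strategies


def external_regret(A, column_plays, strategies):
    """Expected loss incurred minus the loss of the best fixed row in hindsight."""
    n = len(A)
    incurred = sum(
        sum(x[i] * A[i][j] for i in range(n))
        for x, j in zip(strategies, column_plays)
    )
    best_fixed = min(sum(A[i][j] for j in column_plays) for i in range(n))
    return incurred - best_fixed


if __name__ == "__main__":
    # Matching pennies written as losses for the row player (hypothetical data).
    A = [[1.0, 0.0], [0.0, 1.0]]
    T = 1000
    plays = [t % 2 for t in range(T)]
    eta = math.sqrt(8 * math.log(len(A)) / T)
    played = mwu_row_player(A, plays, eta)
    print("regret after", T, "rounds:", external_regret(A, plays, played))
```

With losses in [0, 1], the standard analysis of this update bounds the regret by ln(n)/eta + eta*T/8, which grows like the square root of T for the step size used above. That bound concerns average performance only; last round convergence, the subject of the abstract, is a stronger property.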

    Exploiting No-Regret Algorithms in System Design

    We investigate a repeated two-player zero-sum game setting where the column player is also the designer of the system and has full control over the design of the payoff matrix. In addition, the row player uses a no-regret algorithm to efficiently learn how to adapt their strategy to the column player's behaviour over time in order to achieve a good total payoff. The goal of the column player is to guide her opponent to pick a mixed strategy which is favourable for the system designer. Therefore, she needs to: (i) design an appropriate payoff matrix A whose unique minimax solution contains the desired mixed strategy of the row player; and (ii) strategically interact with the row player during a sequence of plays in order to guide her opponent to converge to that desired behaviour. To design such a payoff matrix, we propose a novel solution that provably has a unique minimax solution with the desired behaviour. We also investigate a relaxation of this problem where uniqueness is not required, but all the minimax solutions have the same mixed strategy for the row player. Finally, we propose a new game playing algorithm for the system designer and prove that it can guide the row player, who may play a stable no-regret algorithm, to converge to a minimax solution

    Online Double Oracle

    Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) methods from game theory. Our method -- Online Double Oracle (ODO) -- is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO methods, ODO is rational in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{T k \log(k)})$, where $k$ is not the total number of pure strategies, but rather the size of the effective strategy set, which is linearly dependent on the support size of the NE. On tens of different real-world games, ODO outperforms DO, PSRO methods, and no-regret algorithms such as Multiplicative Weight Update by a significant margin, both in terms of convergence rate to a NE and average payoff against strategic adversaries.
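To give a feel for why the size of the effective strategy set matters, here is a small arithmetic sketch of the growth term sqrt(T k log k) from the regret bound above, with the constants hidden by the big-O notation dropped. The function name and the example values of T, the effective set size, and the full strategy count are hypothetical and do not come from the paper.

```python
import math


def regret_bound_scale(T, k):
    """Growth term sqrt(T * k * log(k)) of the bound, ignoring constants."""
    return math.sqrt(T * k * math.log(k))


if __name__ == "__main__":
    T = 10_000                 # number of rounds (hypothetical)
    effective_k = 20           # size of the effective strategy set (hypothetical)
    all_strategies = 10**6     # total number of pure strategies (hypothetical)

    small = regret_bound_scale(T, effective_k)
    large = regret_bound_scale(T, all_strategies)
    print("growth term with effective set:", small)
    print("growth term with full strategy set:", large)
    print("ratio:", large / small)
```

Because the term grows with sqrt(k log k), a bound stated in the effective set size can be far smaller than the same expression evaluated at the full number of pure strategies.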

    Decentralized Learning in Online Queuing Systems

    Motivated by packet routing in computer networks, online queuing systems are composed of queues receiving packets at different rates. Repeatedly, they send packets to servers, each of which handles at most one packet at a time. In the centralized case, the number of accumulated packets remains bounded (i.e., the system is stable) as long as the ratio between service rates and arrival rates is larger than 1. In the decentralized case, individual no-regret strategies ensure stability when this ratio is larger than 2. Yet, myopically minimizing regret disregards the long term effects due to the carryover of packets to further rounds. On the other hand, minimizing long term costs leads to stable Nash equilibria as soon as the ratio exceeds $\frac{e}{e-1}$. Stability with decentralized learning strategies with a ratio below 2 was a major remaining question. We first argue that for ratios up to 2, cooperation is required for stability of learning strategies, as selfish minimization of policy regret, a patient notion of regret, might indeed still be unstable in this case. We therefore consider cooperative queues and propose the first decentralized learning algorithm guaranteeing stability of the system as long as the ratio of rates is larger than 1, thus reaching performance comparable to centralized strategies. Comment: NeurIPS 2021 camera read
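The notion of stability used above can be illustrated with a toy simulation. This is a minimal sketch of a single queue with a single dedicated server, not the multi-queue decentralized system studied in the paper. The function name, the Bernoulli arrival and service model, and the example rates are assumptions made for the example.

```python
import random


def simulate_queue(arrival_rate, service_rate, rounds, seed=0):
    """Simulate one queue served by one server and return its final length.

    Each round a packet arrives with probability arrival_rate; then, if the
    queue is non-empty, the server clears one packet with probability
    service_rate.
    """
    rng = random.Random(seed)
    length = 0
    for _ in range(rounds):
        if rng.random() < arrival_rate:
            length += 1
        if length > 0 and rng.random() < service_rate:
            length -= 1
    return length


if __name__ == "__main__":
    rounds = 20_000
    # Service-to-arrival ratio of 2: the backlog should stay small.
    print("ratio 2.0:", simulate_queue(0.3, 0.6, rounds))
    # Service-to-arrival ratio of 0.5: the backlog should keep growing.
    print("ratio 0.5:", simulate_queue(0.6, 0.3, rounds))
```

In this toy model the backlog stays bounded when the service rate exceeds the arrival rate and grows roughly linearly otherwise, which matches the centralized threshold of 1. The harder question addressed in the abstract is whether several queues that learn without coordination can reach the same threshold.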

    A Free Exchange e-Marketplace for Digital Services

    The digital era is witnessing a remarkable evolution of digital services. While the prospects are countless, the e-marketplaces of digital services are encountering inherent game-theoretic and computational challenges that restrict the rational choices of bidders. Our work examines the limited bidding scope and the inefficiencies of present exchange e-marketplaces. To meet these challenges, a free exchange e-marketplace is proposed that follows the free market economy. The free exchange model includes a new bidding language and a double auction mechanism. The rule-based bidding language enables the flexible expression of preferences and strategic conduct. The bidding message holds the attribute-valuations and bidding rules of the selected services. The free exchange deliberates on attributes and logical bidding rules for automatic deduction and formation of elicited services and bids, which results in more rapid self-managed multiple exchange trades. The double auction uses forward and reverse generalized second price auctions for the symmetric matching of multiple digital services of identical attributes and different quality levels. The proposed double auction uses tractable heuristics that secure exchange profitability, improve truthful bidding and deliver stable social efficiency. While the strongest properties of symmetric exchanges are infeasible game-theoretically, the free exchange converges rapidly to social efficiency, Nash truthful stability, and weak budget balance by multiple quality-levels cross-matching, constant learning and informs at repetitive thick trades. The empirical findings validate the soundness and viability of the free exchange

    Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems

    A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision making problems. One important metric of a learning algorithm for decision making problems is its regret, i.e. the difference between the highest achievable reward and the actual reward that the algorithm gains. The design and development of a MACL system with low-regret learning algorithms can create huge economic value. In this thesis, I analyze MACL systems for different sequential decision making problems. Concretely, Chapters 3 and 4 investigate cooperative multi-agent multi-armed bandit problems, with full-information or bandit feedback, in which multiple learning agents can exchange their information through a communication network and the agents can only observe the rewards of the actions they choose. Chapter 5 considers the communication-regret trade-off for online convex optimization in the distributed setting. Chapter 6 discusses how to form highly productive teams of agents based on their unknown but fixed types using adaptive incremental matchings. For the above problems, I present regret lower bounds for feasible learning algorithms and provide efficient algorithms that achieve these bounds. The regret bounds I present in Chapters 3, 4 and 5 quantify how the regret depends on the connectivity of the communication network and the communication delay, thus giving useful guidance on the design of the communication protocol in MACL systems. Comment: Thesis submitted to London School of Economics and Political Science for PhD in Statistic

    Recent Advances in Experimental Studies of Social Dilemma Games


    Sunk cost accounting and entrapment in corporate acquisitions and financial markets : an experimental analysis

    Sunk cost accounting refers to the empirical finding that individuals tend to let their decisions be influenced by costs incurred at an earlier time, in such a way that they are more risk seeking than they would be had they not incurred these costs. Such behaviour violates the axioms of economic theory, which state that individuals should only consider incremental costs and benefits when executing investments. This dissertation is concerned with whether the pervasive sunk cost phenomenon extends to corporate acquisitions and financial markets. 122 students from the University of St Andrews participated in three experiments exploring the use of sunk costs in interactive negotiation contexts and financial markets. Experiment I shows that subjects value the sunk cost issue higher than other issues in a multi-issue negotiation. Experiment II illustrates that bidders are influenced by the sunk costs of competing bidders in a first price, sealed-bid, common-value auction. In financial markets there exists a concept analogous to sunk cost accounting, known as the disposition effect. This describes the tendency of investors to sell “winning” stocks and hold “losing” stocks. Experiment III demonstrates that trading strategies in an experimental equity market are influenced by a pre-trading brokerage cost. Not only are subjects influenced in the direction that reduces the disposition effect, but trading is also diminished. Without the brokerage cost there was a significant disposition effect. JEL-Classifications C70, C90, D44, D80, D81, G1