Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information. Comment: Appeared in AAAI 201
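To make the regret-minimization idea behind Counterfactual Regret Minimization concrete, the following is a minimal sketch of plain regret matching in self-play on a small two-player zero-sum matrix game. It is not the constrained algorithm described in the abstract: it handles no strategy constraints and no extensive-form structure. The payoff matrix, the function names, and the iteration count are all assumptions chosen for illustration.

```python
def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_matrix_game(payoff, iterations=20000):
    """Approximate an equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. Returns the average strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching(row_regret)
        y = regret_matching(col_regret)
        # Expected value of each pure action against the opponent's mix.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(p * v for p, v in zip(x, row_values))
        col_ev = sum(p * v for p, v in zip(y, col_values))
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += y[j]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains against (x, y)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


# Hypothetical 2x2 game used only as an example.
game = [[2.0, -1.0], [-1.0, 1.0]]
x_avg, y_avg = solve_matrix_game(game)
gap = exploitability(game, x_avg, y_avg)
```

The average strategies, not the final iterates, are what approach an equilibrium; `gap` measures how far the returned pair is from one and shrinks as `iterations` grows.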
Imperfect-Recall Abstractions with Bounds in Games
Imperfect-recall abstraction has emerged as the leading paradigm for
practical large-scale equilibrium computation in incomplete-information games.
However, imperfect-recall abstractions are poorly understood, and only weak
algorithm-specific guarantees on solution quality are known. In this paper, we
show the first general, algorithm-agnostic, solution quality guarantees for
Nash equilibria and approximate self-trembling equilibria computed in
imperfect-recall abstractions, when implemented in the original
(perfect-recall) game. Our results are for a class of games that generalizes
the only previously known class of imperfect-recall abstractions where any
results had been obtained. Further, our analysis is tighter in two ways, each
of which can lead to an exponential reduction in the solution quality error
bound.
We then show that for extensive-form games that satisfy certain properties,
the problem of computing a bound-minimizing abstraction for a single level of
the game reduces to a clustering problem, where the increase in our bound is
the distance function. This reduction leads to the first imperfect-recall
abstraction algorithm with solution quality bounds. We proceed to show a divide
in the class of abstraction problems. If payoffs are at the same scale at all
information sets considered for abstraction, the input forms a metric space.
Conversely, if this condition is not satisfied, we show that the input does not
form a metric space. Finally, we use these results to experimentally
investigate the quality of our bound for single-level abstraction.
Regret-Minimizing Double Oracle for Extensive-Form Games
By incorporating regret minimization, double oracle methods have demonstrated
rapid convergence to Nash Equilibrium (NE) in normal-form games and
extensive-form games, through algorithms such as online double oracle (ODO) and
extensive-form double oracle (XDO), respectively. In this study, we further
examine the theoretical convergence rate and sample complexity of such regret
minimization-based double oracle methods, utilizing a unified framework called
Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to
extensive-form games and determine its sample complexity. Moreover, we
demonstrate that the sample complexity of XDO can be exponential in the number
of information sets, owing to the exponentially decaying stopping
threshold of restricted games. To solve this problem, we propose the Periodic
Double Oracle (PDO) method, which has the lowest sample complexity among regret
minimization-based double oracle methods, being only polynomial in the number
of information sets.
Empirical evaluations on multiple poker and board games show that PDO achieves
significantly faster convergence than previous double oracle algorithms and
reaches a competitive level with state-of-the-art regret minimization methods. Comment: Accepted at ICML, 202
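As a rough illustration of the double oracle idea combined with a regret-minimizing solver, here is a minimal sketch for a normal-form zero-sum game. It is a generic textbook-style loop, not ODO, XDO, or PDO as defined in the paper: the restricted game is solved by regret matching for a fixed number of iterations, and the loop stops once neither player's best response lies outside the restricted action sets. All names, the stopping rule, and the rock-paper-scissors example are assumptions.

```python
def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_restricted(payoff, rows, cols, iterations):
    """Approximately solve the sub-game restricted to `rows` x `cols`."""
    row_regret, col_regret = [0.0] * len(rows), [0.0] * len(cols)
    row_sum, col_sum = [0.0] * len(rows), [0.0] * len(cols)
    for _ in range(iterations):
        x = regret_matching(row_regret)
        y = regret_matching(col_regret)
        row_values = [sum(payoff[r][c] * y[k] for k, c in enumerate(cols))
                      for r in rows]
        col_values = [-sum(payoff[r][c] * x[k] for k, r in enumerate(rows))
                      for c in cols]
        row_ev = sum(p * v for p, v in zip(x, row_values))
        col_ev = sum(p * v for p, v in zip(y, col_values))
        for k in range(len(rows)):
            row_regret[k] += row_values[k] - row_ev
            row_sum[k] += x[k]
        for k in range(len(cols)):
            col_regret[k] += col_values[k] - col_ev
            col_sum[k] += y[k]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def double_oracle(payoff, iterations=20000):
    """Grow restricted action sets with best responses until none are new."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # arbitrary initial restricted game
    while True:
        x_sub, y_sub = solve_restricted(payoff, rows, cols, iterations)
        # Lift the restricted strategies to the full action sets.
        x, y = [0.0] * n_rows, [0.0] * n_cols
        for k, r in enumerate(rows):
            x[r] = x_sub[k]
        for k, c in enumerate(cols):
            y[c] = y_sub[k]
        # Best-response oracles over the full game.
        best_row = max(range(n_rows), key=lambda i: sum(
            payoff[i][j] * y[j] for j in range(n_cols)))
        best_col = min(range(n_cols), key=lambda j: sum(
            payoff[i][j] * x[i] for i in range(n_rows)))
        grew = False
        if best_row not in rows:
            rows.append(best_row)
            grew = True
        if best_col not in cols:
            cols.append(best_col)
            grew = True
        if not grew:
            return x, y


# Rock-paper-scissors from the row player's perspective (example only).
rps = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
x_do, y_do = double_oracle(rps)
```

When the loop stops, the best responses in the full game already belong to the restricted game, so the quality of the returned strategies is determined by how accurately the last restricted game was solved.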
Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches
Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable, and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.
Machine learning applied to the context of Poker
The combination of game theory principles and machine learning methodologies applied to finding optimal strategies for games is garnering interest from an increasingly large portion of the scientific community, with the game of Poker being a popular study subject due to its imperfect-information nature. Advancements in this area have a wide array of applications in real-world scenarios, and research in artificial intelligence shows that interest in this subject is yet to fade: in 2019, researchers from Facebook and Carnegie Mellon presented the world's first autonomous Poker-playing agent proven to be profitable against multiple players at a time, an advance over the previous state of the art, which was developed for two-player games only. This study intends to explore the characteristics of stochastic games of imperfect information, gathering information on the methodological advancements made available by researchers in order to ultimately develop an autonomous agent intended to fit the classification of a "utility-maximizing decision-maker".