161 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong
Counterfactual Regret Minimization (CFR) has shown success in Texas Hold'em poker. We apply this algorithm to another popular incomplete-information game, Mahjong. Compared to poker, Mahjong is much more complex, with many variants. We study two-player Mahjong by conducting game-theoretic analysis and making a hierarchical abstraction of CFR based on winning policies. This framework can be generalized to other imperfect-information games.
Comment: 8 pages
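The abstract gives no pseudocode; as background, the core update inside vanilla CFR is regret matching, which turns accumulated counterfactual regrets into a strategy. A minimal self-play sketch on rock-paper-scissors (a stand-in toy game, not Mahjong; all names and constants here are illustrative):

```python
def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy: keep positive regrets, normalize."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

# Rock-paper-scissors payoff for player 1 (row) against player 2 (column).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def train(iterations=20000):
    # Seed a small asymmetric regret so the self-play dynamics are non-trivial.
    regret1 = [1.0, 0.0, 0.0]
    regret2 = [0.0] * 3
    strat_sum1 = [0.0] * 3
    for _ in range(iterations):
        s1 = regret_matching(regret1)
        s2 = regret_matching(regret2)
        # Accumulate player 1's strategy; the AVERAGE converges to equilibrium.
        for a in range(3):
            strat_sum1[a] += s1[a]
        # Expected payoff of each pure action against the opponent's mix.
        u1 = [sum(PAYOFF[a][b] * s2[b] for b in range(3)) for a in range(3)]
        u2 = [sum(-PAYOFF[a][b] * s1[a] for a in range(3)) for b in range(3)]
        ev1 = sum(s1[a] * u1[a] for a in range(3))
        ev2 = sum(s2[b] * u2[b] for b in range(3))
        for a in range(3):
            regret1[a] += u1[a] - ev1
            regret2[a] += u2[a] - ev2
    total = sum(strat_sum1)
    return [s / total for s in strat_sum1]

avg = train()  # average strategy approaches the uniform Nash equilibrium
```

Full CFR applies this same update at every information set of the game tree, weighted by counterfactual reach probabilities; the hierarchical abstraction described in the abstract would shrink the set of information sets this loop runs over.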
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Safe Opponent Exploitation For Epsilon Equilibrium Strategies
In safe opponent exploitation, players hope to exploit their opponents' potentially sub-optimal strategies while guaranteeing at least the value of the game in expectation for themselves. Safe opponent exploitation algorithms have been successfully applied to small instances of two-player zero-sum imperfect-information games, where Nash equilibrium strategies are typically known in advance. The methods currently available to compute these strategies, however, do not scale to large domains of imperfect information such as No-Limit Texas Hold'em (NLHE) poker, where successful agents rely on game abstractions in order to compute an equilibrium strategy approximation. This paper extends the concept of safe opponent exploitation by introducing prime-safe opponent exploitation, in which we redefine a player's value of the game to be the worst-case payoff their strategy could be susceptible to. This allows weaker epsilon-equilibrium strategies to benefit from a form of opponent exploitation with our revised value of the game, while still retaining a practical game-theoretic lower-bound guarantee. We demonstrate the empirical advantages of our generalisation when applied to the main safe opponent exploitation algorithms.
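The classic mechanism behind safe exploitation is to risk only winnings already banked above the game's value. A minimal sketch of that bookkeeping (the class, method names, and numbers are illustrative, not the paper's API; the prime-safe variant would set `value` to the worst-case payoff rather than the exact game value):

```python
class SafeExploiter:
    """Sketch of 'risk only what you have won' safe exploitation."""

    def __init__(self, value):
        # `value` is the per-step value of the game for this player
        # (for prime-safe exploitation: the worst-case payoff instead).
        self.value = value
        self.bank = 0.0  # profit accumulated above the game value so far

    def act(self, worst_case_loss):
        # Deviate from equilibrium only when banked winnings cover the
        # worst case of the exploitative strategy, so the overall
        # expected-value guarantee is preserved.
        return "exploit" if self.bank >= worst_case_loss else "equilibrium"

    def observe(self, payoff):
        # Bank the surplus (or deficit) relative to the game value.
        self.bank += payoff - self.value

agent = SafeExploiter(value=0.0)
first = agent.act(worst_case_loss=1.0)   # no bank yet -> "equilibrium"
agent.observe(2.0)                        # won 2 above the game value
second = agent.act(worst_case_loss=1.0)  # bank covers the risk -> "exploit"
```

The point of the bank is that even if every exploitative deviation loses its worst case, total expected payoff never drops below the guaranteed value.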
Abstracting Imperfect Information Away from Two-Player Zero-Sum Games
In their seminal work, Nayyar et al. (2013) showed that imperfect information
can be abstracted away from common-payoff games by having players publicly
announce their policies as they play. This insight underpins sound solvers and
decision-time planning algorithms for common-payoff games. Unfortunately, a
naive application of the same insight to two-player zero-sum games fails
because Nash equilibria of the game with public policy announcements may not
correspond to Nash equilibria of the original game. As a consequence, existing
sound decision-time planning algorithms require complicated additional
mechanisms that have unappealing properties. The main contribution of this work
is showing that certain regularized equilibria do not possess the aforementioned non-correspondence problem; thus, computing them can be treated as a perfect-information problem. Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and, in particular, yields a simplified framework for decision-time planning in two-player zero-sum games, free of the unappealing properties that plague existing decision-time planning approaches.
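For intuition about regularized equilibria, here is an illustrative solver (not the paper's method): a damped softmax best-response iteration that computes an entropy-regularized equilibrium of a zero-sum matrix game. As the temperature `tau` shrinks, the regularized equilibrium approaches a Nash equilibrium:

```python
import math

def softmax(xs, tau):
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def regularized_equilibrium(A, tau=1.0, iters=2000, lr=0.1):
    """Entropy-regularized equilibrium of the zero-sum matrix game A,
    found by damped softmax best-response iteration."""
    n, m = len(A), len(A[0])
    x = [0.8, 0.2] if n == 2 else [1.0 / n] * n  # asymmetric start
    y = [1.0 / m] * m
    for _ in range(iters):
        # Expected payoffs of each pure action against the current mixes.
        ux = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        uy = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        bx, by = softmax(ux, tau), softmax(uy, tau)
        # Damped move toward the regularized (softmax) best responses.
        x = [(1 - lr) * xi + lr * bi for xi, bi in zip(x, bx)]
        y = [(1 - lr) * yi + lr * bi for yi, bi in zip(y, by)]
    return x, y

# Matching pennies: the regularized equilibrium is uniform for any tau > 0.
A = [[1, -1], [-1, 1]]
x, y = regularized_equilibrium(A)
```

The regularization is what stabilizes the iteration; an undamped exact best response would cycle on this game.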
A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games
Many recent practical and theoretical breakthroughs focus on adversarial team
multi-player games (ATMGs) in ex ante correlation scenarios. In this setting,
team members are allowed to coordinate their strategies only before the game
starts. Although there existing algorithms for solving extensive-form ATMGs,
the size of the game tree generated by the previous algorithms grows
exponentially with the number of players. Therefore, how to deal with
large-scale zero-sum extensive-form ATMGs problems close to the real world is
still a significant challenge. In this paper, we propose a generic multi-player
transformation algorithm, which can transform any multi-player game tree
satisfying the definition of AMTGs into a 2-player game tree, such that finding
a team-maxmin equilibrium with correlation (TMECor) in large-scale ATMGs can be
transformed into solving NE in 2-player games. To achieve this goal, we first
introduce a new structure named private information pre-branch, which consists
of a temporary chance node and coordinator nodes and aims to make decisions for
all potential private information on behalf of the team members. We also show
theoretically that NE in the transformed 2-player game is equivalent TMECor in
the original multi-player game. This work significantly reduces the growth of
action space and nodes from exponential to constant level. This enables our
work to outperform all the previous state-of-the-art algorithms in finding a
TMECor, with 182.89, 168.47, 694.44, and 233.98 significant improvements in the
different Kuhn Poker and Leduc Poker cases (21K3, 21K4, 21K6 and 21L33). In
addition, this work first practically solves the ATMGs in a 5-player case which
cannot be conducted by existing algorithms.Comment: 9 pages, 5 figures, NIPS 202
Formal Methods for Autonomous Systems
Formal methods refer to rigorous, mathematical approaches to system
development and have played a key role in establishing the correctness of
safety-critical systems. The main building blocks of formal methods are models
and specifications, which are analogous to behaviors and requirements in system
design and give us the means to verify and synthesize system behaviors with
formal guarantees.
This monograph provides a survey of the current state of the art on
applications of formal methods in the autonomous systems domain. We consider
correct-by-construction synthesis under various formulations, including closed
systems, reactive, and probabilistic settings. Beyond synthesizing systems in
known environments, we address the concept of uncertainty and bound the
behavior of systems that employ learning using formal methods. Further, we
examine the synthesis of systems with monitoring, a mitigation technique for
ensuring that once a system deviates from expected behavior, it knows a way of
returning to normalcy. We also show how to overcome some limitations of formal
methods themselves with learning. We conclude with future directions for formal
methods in reinforcement learning, uncertainty, privacy, explainability of
formal methods, and regulation and certification.
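One of the building blocks the survey mentions, monitoring, can be made concrete with a tiny runtime monitor. The sketch below checks a bounded-response safety property over an event trace and reports the first deviation, at which point a mitigation or recovery routine would take over (property, event names, and bound are illustrative; the monograph surveys far richer formalisms):

```python
def monitor(trace, bound=3):
    """Runtime monitor for the bounded-response property:
    every "request" must be followed by a "grant" within `bound` steps.
    Returns the index of the first violation, or None if the trace is safe."""
    pending = None  # step at which an unanswered request was observed
    for i, event in enumerate(trace):
        if event == "request" and pending is None:
            pending = i
        elif event == "grant":
            pending = None
        if pending is not None and i - pending > bound:
            return i  # deviation detected: trigger the return-to-normalcy path
    return None

safe = monitor(["idle", "request", "idle", "grant"])          # None
unsafe = monitor(["request", "idle", "idle", "idle", "idle"])  # 4
```

Such a monitor is itself a small automaton, which is why it can be synthesized and verified with the same formal machinery as the system it watches.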
LIPIcs, Volume 274, ESA 2023, Complete Volume
The Update Equivalence Framework for Decision-Time Planning
The process of revising (or constructing) a policy immediately prior to
execution -- known as decision-time planning -- is key to achieving superhuman
performance in perfect-information settings like chess and Go. A recent line of
work has extended decision-time planning to more general imperfect-information
settings, leading to superhuman performance in poker. However, these methods require considering subgames whose sizes grow quickly with the amount of non-public information, making them unhelpful when that amount is large. Motivated by this issue, we introduce an alternative
framework for decision-time planning that is not based on subgames but rather
on the notion of update equivalence. In this framework, decision-time planning
algorithms simulate updates of synchronous learning algorithms. This framework
enables us to introduce a new family of principled decision-time planning
algorithms that do not rely on public information, opening the door to sound
and effective decision-time planning in settings with large amounts of
non-public information. In experiments, members of this family produce results comparable or superior to state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Robust reinforcement learning (RL) seeks to train policies that can perform
well under environment perturbations or adversarial attacks. Existing
approaches typically assume that the space of possible perturbations remains
the same across timesteps. However, in many settings, the space of possible
perturbations at a given timestep depends on past perturbations. We formally
introduce temporally-coupled perturbations, presenting a novel challenge for
existing robust RL methods. To tackle this challenge, we propose GRAD, a novel
game-theoretic approach that treats the temporally-coupled robust RL problem as
a partially-observable two-player zero-sum game. By finding an approximate
equilibrium in this game, GRAD ensures the agent's robustness against
temporally-coupled perturbations. Experiments on a variety of continuous control tasks demonstrate that our approach exhibits significant robustness advantages over baselines against both standard and temporally-coupled attacks, in both state and action spaces.
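The temporally-coupled threat model described above constrains how fast an adversary's perturbation may change between timesteps, in addition to the usual magnitude bound. A scalar sketch of that constraint as a projection (names and bounds are illustrative, not the paper's implementation):

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def couple_perturbations(raw, eps=1.0, eps_bar=0.2):
    """Project a 1-D perturbation sequence onto the temporally-coupled set:
    each perturbation lies in [-eps, eps] (standard magnitude bound) and
    moves at most eps_bar away from the previous step's perturbation."""
    out = []
    prev = 0.0
    for d in raw:
        d = clamp(d, -eps, eps)                        # magnitude bound
        d = clamp(d, prev - eps_bar, prev + eps_bar)   # temporal coupling
        out.append(d)
        prev = d
    return out

# An adversary that tries to flip sign instantly is forced to drift slowly.
seq = couple_perturbations([5.0, -5.0, 0.0])  # -> [0.2, 0.0, 0.0]
```

In the game-theoretic formulation, the adversary player chooses perturbations from this coupled set while the protagonist policy is trained against it.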