Solving Imperfect Information Games Using Decomposition
Decomposition, i.e. independently analyzing possible subgames, has proven to
be an essential principle for effective decision-making in perfect information
games. However, in imperfect information games, decomposition has proven to be
problematic. To date, all proposed techniques for decomposition in imperfect
information games have abandoned theoretical guarantees. This work presents the
first technique for decomposing an imperfect information game into subgames
that can be solved independently, while retaining optimality guarantees on the
full-game solution. We can use this technique to construct theoretically
justified algorithms that make better use of information available at run-time,
overcome memory or disk limitations at run-time, or make a time/space trade-off
to overcome memory or disk limitations while solving a game. In particular, we
present an algorithm for subgame solving which guarantees performance in the
whole game, in contrast to existing methods which may have unbounded error. In
addition, we present an offline game solving algorithm, CFR-D, which can
produce a Nash equilibrium for a game that is larger than available storage.
Comment: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations
and theor
Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information.
Comment: Appeared in AAAI 201
On Local Regret
Online learning aims to perform nearly as well as the best hypothesis in
hindsight. For some hypothesis classes, though, even finding the best
hypothesis offline is challenging. In such offline cases, local search
techniques are often employed and only local optimality guaranteed. For online
decision-making with such hypothesis classes, we introduce local regret, a
generalization of regret that aims to perform nearly as well as only nearby
hypotheses. We then present a general algorithm to minimize local regret with
arbitrary locality graphs. We also show how the graph structure can be
exploited to drastically speed learning. These algorithms are then demonstrated
on a diverse set of online problems: online disjunct learning, online Max-SAT,
and online decision tree learning.
Comment: This is the longer version of the same-titled paper appearing in the
Proceedings of the Twenty-Ninth International Conference on Machine Learning
(ICML), 201
Count-Based Exploration with the Successor Representation
In this paper we introduce a simple approach for exploration in reinforcement
learning (RL) that allows us to develop theoretically justified algorithms in
the tabular case but that is also extendable to settings where function
approximation is required. Our approach is based on the successor
representation (SR), which was originally introduced as a representation
defining state generalization by the similarity of successor states. Here we
show that the norm of the SR, while it is being learned, can be used as a
reward bonus to incentivize exploration. In order to better understand this
transient behavior of the norm of the SR we introduce the substochastic
successor representation (SSR) and we show that it implicitly counts the number
of times each state (or feature) has been observed. We use this result to
introduce an algorithm that performs as well as some theoretically
sample-efficient approaches. Finally, we extend these ideas to a deep RL
algorithm and show that it achieves state-of-the-art performance in Atari 2600
games when in a low sample-complexity regime.
Comment: This paper appears in the Proceedings of the 34th AAAI Conference on
Artificial Intelligence (AAAI 2020
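The idea in the abstract above, using the norm of a partially learned SR as an exploration signal, can be illustrated with a small tabular sketch. This is not the paper's algorithm: the zero initialisation, the inverse l1-norm form of the bonus, the step size, the discount, and the names `sr_td_update` and `exploration_bonus` are all assumptions made for illustration.

```python
def sr_td_update(sr, s, s_next, alpha=0.1, gamma=0.95):
    """One TD(0) update of row s of a tabular successor representation.

    sr is a list of lists; sr[s][j] estimates the discounted number of
    future visits to state j when starting from state s.
    """
    for j in range(len(sr)):
        indicator = 1.0 if j == s else 0.0
        target = indicator + gamma * sr[s_next][j]
        sr[s][j] += alpha * (target - sr[s][j])


def exploration_bonus(sr, s, eps=1e-8):
    """Hypothetical bonus: inverse l1-norm of the SR row for state s.

    A row that has received few updates has a small norm, so rarely
    visited states get a larger bonus.
    """
    return 1.0 / (sum(abs(v) for v in sr[s]) + eps)


# Three states, SR initialised to zero (an assumption of this sketch).
sr = [[0.0] * 3 for _ in range(3)]

# State 0 is visited ten times, state 1 only once.
for _ in range(10):
    sr_td_update(sr, 0, 1)
sr_td_update(sr, 1, 2)

bonus_frequent = exploration_bonus(sr, 0)
bonus_rare = exploration_bonus(sr, 1)
```

In this sketch the row for the frequently visited state has the larger norm, so `bonus_rare` exceeds `bonus_frequent`, which is the count-like behaviour the abstract describes.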
No-Regret Learning in Extensive-Form Games with Imperfect Recall
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning
algorithm for decision problems modeled as extensive games. CFR's regret bounds
depend on the requirement of perfect recall: players always remember
information that was revealed to them and the order in which it was revealed.
In games without perfect recall, however, CFR's guarantees do not apply. In
this paper, we present the first regret bound for CFR when applied to a general
class of games with imperfect recall. In addition, we show that CFR applied to
any abstraction belonging to our general class results in a regret bound not
just for the abstract game, but for the full game as well. We verify our theory
and show how imperfect recall can be used to trade a small increase in regret
for a significant reduction in memory in three domains: die-roll poker, phantom
tic-tac-toe, and Bluff.
Comment: 21 pages, 4 figures, expanded version of article to appear in
Proceedings of the Twenty-Ninth International Conference on Machine Learnin
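For readers unfamiliar with no-regret learning, the sketch below shows regret matching in self-play on rock-paper-scissors. It is only the per-decision building block commonly used inside CFR, not CFR itself and not the imperfect-recall analysis of the paper above. The payoff matrix, the small initial regret that breaks symmetry, the iteration count, and the function names are assumptions of this example.

```python
def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


# Payoff to the row player; actions are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def self_play(iterations=10000):
    """Run regret matching for both players and return average strategies."""
    n = 3
    # A small initial regret for player 0 so play does not start at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        s0 = regret_matching(regrets[0])
        s1 = regret_matching(regrets[1])
        # Expected utility of each pure action against the opponent's mix.
        u0 = [sum(PAYOFF[a][b] * s1[b] for b in range(n)) for a in range(n)]
        u1 = [sum(-PAYOFF[a][b] * s0[a] for a in range(n)) for b in range(n)]
        ev0 = sum(s0[a] * u0[a] for a in range(n))
        ev1 = sum(s1[b] * u1[b] for b in range(n))
        for a in range(n):
            regrets[0][a] += u0[a] - ev0
            regrets[1][a] += u1[a] - ev1
            strategy_sums[0][a] += s0[a]
            strategy_sums[1][a] += s1[a]
    return [[x / iterations for x in sums] for sums in strategy_sums]


avg0, avg1 = self_play()
```

The average strategies, rather than the final ones, are what approach an equilibrium in two-player zero-sum games; here both averages end up close to the uniform strategy.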
