Imperfect-Recall Abstractions with Bounds in Games
Imperfect-recall abstraction has emerged as the leading paradigm for
practical large-scale equilibrium computation in incomplete-information games.
However, imperfect-recall abstractions are poorly understood, and only weak
algorithm-specific guarantees on solution quality are known. In this paper, we
show the first general, algorithm-agnostic solution quality guarantees for
Nash equilibria and approximate self-trembling equilibria computed in
imperfect-recall abstractions, when implemented in the original
(perfect-recall) game. Our results are for a class of games that generalizes
the only previously known class of imperfect-recall abstractions where any
results had been obtained. Further, our analysis is tighter in two ways, each
of which can lead to an exponential reduction in the solution quality error
bound.
We then show that for extensive-form games that satisfy certain properties,
the problem of computing a bound-minimizing abstraction for a single level of
the game reduces to a clustering problem, where the increase in our bound is
the distance function. This reduction leads to the first imperfect-recall
abstraction algorithm with solution quality bounds. We proceed to show a divide
in the class of abstraction problems. If payoffs are at the same scale at all
information sets considered for abstraction, the input forms a metric space.
Conversely, if this condition is not satisfied, we show that the input does not
form a metric space. Finally, we use these results to experimentally
investigate the quality of our bound for single-level abstraction.
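The reduction described above casts single-level abstraction as a clustering problem. The sketch below is a minimal, hypothetical illustration of that view: the distance function is a simple proxy standing in for the paper's bound increase, and the greedy merging scheme and payoff vectors are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical sketch: single-level abstraction as clustering.
# Assumption: max absolute payoff difference is used as a proxy
# for the bound increase incurred by merging two information sets.

def bound_distance(u, v):
    """Proxy distance between two information sets' payoff vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

def greedy_cluster(infosets, threshold):
    """Greedily merge information sets whose distance to a cluster
    representative is below `threshold`; each cluster becomes one
    abstract information set."""
    clusters = []
    for name, payoffs in infosets.items():
        for cluster in clusters:
            rep = cluster[0][1]  # representative's payoff vector
            if bound_distance(payoffs, rep) <= threshold:
                cluster.append((name, payoffs))
                break
        else:
            clusters.append([(name, payoffs)])
    return clusters

infosets = {"I1": [1.0, 0.0], "I2": [1.1, 0.1], "I3": [5.0, 4.0]}
abstraction = greedy_cluster(infosets, threshold=0.5)
print(len(abstraction))  # I1 and I2 merge; I3 stays separate -> 2
```

Note that when payoffs are at very different scales across information sets, a proxy like this can violate the triangle inequality, which is the divide between the metric and non-metric cases discussed above.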
Solving Games with Functional Regret Estimation
We propose a novel online learning method for minimizing regret in large
extensive-form games. The approach learns a function approximator online to
estimate the regret for choosing a particular action. A no-regret algorithm
uses these estimates in place of the true regrets to define a sequence of
policies.
We prove the approach sound by providing a bound relating the quality of the
function approximation to the regret of the algorithm. A corollary is that the
method is guaranteed to converge to a Nash equilibrium in self-play so long as
the regrets are ultimately realizable by the function approximator. Our
technique can be understood as a principled generalization of existing work on
abstraction in large games; in our work, both the abstraction as well as the
equilibrium are learned during self-play. We demonstrate empirically the method
achieves higher quality strategies than state-of-the-art abstraction techniques
given the same resources.
Comment: AAAI Conference on Artificial Intelligence 201
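The core idea above, substituting estimated regrets for true regrets, can be sketched as follows. This is a minimal illustration assuming a linear model trained by gradient descent on squared loss, with hypothetical per-action feature vectors; it is not the paper's actual estimator.

```python
# Sketch of regret estimation with a function approximator.
# Assumptions: linear model, squared loss, illustrative features.

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def fit(w, data, lr=0.1, steps=200):
    """Online least-squares on (features, observed regret) pairs."""
    for _ in range(steps):
        for x, r in data:
            err = predict(w, x) - r
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def policy_from_estimates(estimates):
    """Regret matching applied to estimated (not true) regrets."""
    pos = [max(r, 0.0) for r in estimates]
    total = sum(pos)
    n = len(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

# Two actions with orthogonal 2-d features and observed regrets.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
w = fit([0.0, 0.0], data)
est = [predict(w, x) for x, _ in data]
print(policy_from_estimates(est))  # puts all weight on action 0
```

The soundness bound mentioned above ties the regret of this procedure to the approximation error of the fitted model: the better the regressor tracks the true regrets, the closer the resulting policies are to no-regret play.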
No-Regret Learning in Extensive-Form Games with Imperfect Recall
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning
algorithm for decision problems modeled as extensive games. CFR's regret bounds
depend on the requirement of perfect recall: players always remember
information that was revealed to them and the order in which it was revealed.
In games without perfect recall, however, CFR's guarantees do not apply. In
this paper, we present the first regret bound for CFR when applied to a general
class of games with imperfect recall. In addition, we show that CFR applied to
any abstraction belonging to our general class results in a regret bound not
just for the abstract game, but for the full game as well. We verify our theory
and show how imperfect recall can be used to trade a small increase in regret
for a significant reduction in memory in three domains: die-roll poker, phantom
tic-tac-toe, and Bluff.
Comment: 21 pages, 4 figures, expanded version of article to appear in
Proceedings of the Twenty-Ninth International Conference on Machine Learning
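The learner that CFR runs at each decision point is regret matching; its regret bound is what the perfect-recall requirement above protects. A minimal sketch of regret matching at a single decision point, with fixed illustrative action utilities, shows the no-regret behavior:

```python
# Regret matching at one decision point (the per-information-set
# learner inside CFR). The fixed utilities below are illustrative.

def regret_matching_play(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def run(utilities, rounds):
    """Repeatedly play against fixed action utilities, accumulating
    per-action regret (utility of the action minus realized value)."""
    cum = [0.0] * len(utilities)
    for _ in range(rounds):
        strat = regret_matching_play(cum)
        ev = sum(p * u for p, u in zip(strat, utilities))
        cum = [c + (u - ev) for c, u in zip(cum, utilities)]
    return regret_matching_play(cum), max(cum) / rounds

strat, avg_regret = run([1.0, 0.0], 1000)
print(strat, avg_regret)  # locks onto the better action; avg regret -> 0
```

Under imperfect recall, merged information sets force one such learner to serve several distinct histories at once, which is exactly where the standard analysis breaks and the paper's new bound is needed.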
Solving Imperfect Information Games Using Decomposition
Decomposition, i.e., independently analyzing possible subgames, has proven to
be an essential principle for effective decision-making in perfect information
games. In imperfect information games, however, decomposition has been
problematic. To date, all proposed techniques for decomposition in imperfect
information games have abandoned theoretical guarantees. This work presents the
first technique for decomposing an imperfect information game into subgames
that can be solved independently, while retaining optimality guarantees on the
full-game solution. We can use this technique to construct theoretically
justified algorithms that make better use of information available at run-time,
overcome memory or disk limitations at run-time, or make a time/space trade-off
to overcome memory or disk limitations while solving a game. In particular, we
present an algorithm for subgame solving which guarantees performance in the
whole game, in contrast to existing methods which may have unbounded error. In
addition, we present an offline game solving algorithm, CFR-D, which can
produce a Nash equilibrium for a game that is larger than available storage.
Comment: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations
and theory
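The perfect-information case that the abstract contrasts against can be made concrete: a subtree's value there is independent of the rest of the tree, so subgames really can be solved in isolation. A minimal sketch (the tiny tree is illustrative):

```python
# In perfect-information games, each root subtree can be solved
# independently and the results combined; imperfect information
# breaks this, which is the problem the paper addresses.

def minimax(node, maximizing=True):
    """Value of a perfect-information game tree: leaves are payoffs,
    internal nodes are lists of children with alternating turns."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Solve the two root subgames independently, then combine.
left = minimax([3, 5], maximizing=False)   # opponent minimizes -> 3
right = minimax([2, 9], maximizing=False)  # opponent minimizes -> 2
root_value = max(left, right)
print(root_value)  # 3
```

With imperfect information, a subgame's value depends on the distribution over hidden states induced by play elsewhere in the tree, so naive independent solving loses its guarantees; CFR-D's contribution is a decomposition that restores them.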
Learning in Real-Time Search: A Unifying Framework
Real-time search methods are suited for tasks in which the agent is
interacting with an initially unknown environment in real time. In such
simultaneous planning and learning problems, the agent has to select its
actions in a limited amount of time, while sensing only a local part of the
environment centered at the agent's current location. Real-time heuristic search
agents select actions by performing a limited lookahead search and evaluating the
frontier states with a heuristic function. Over repeated experiences, they
refine heuristic values of states to avoid infinite loops and to converge to
better solutions. The prevalence of such settings in autonomous software and
hardware agents has led to an explosion of real-time search algorithms over the
last two decades. Not only is a potential user confronted with a hodgepodge of
algorithms, but also with the choice of the control parameters they use. In
this paper we address both problems. The first contribution is an introduction
of a simple three-parameter framework (named LRTS) which extracts the core
ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*,
SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they
are unified and extended with additional features. Second, we prove
completeness and convergence of any algorithm covered by the LRTS framework.
Third, we prove several upper-bounds relating the control parameters and
solution quality. Finally, we analyze the influence of the three control
parameters empirically in the realistic scalable domains of real-time
navigation on initially unknown maps from a commercial role-playing game as
well as routing in ad hoc sensor networks.
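LRTA*, one of the special cases subsumed by the LRTS framework above, illustrates the lookahead-plus-heuristic-update loop. A minimal one-step-lookahead sketch, assuming unit edge costs, a zero initial heuristic, and an illustrative four-state graph:

```python
# Minimal LRTA* sketch: move greedily on cost-plus-heuristic,
# raising the heuristic of each visited state (the learning step).
# Assumptions: unit edge costs, zero initial heuristic, toy graph.

def lrta_star(graph, h, start, goal, max_steps=100):
    """One trial of one-step-lookahead LRTA*; returns the path taken
    and the updated heuristic table."""
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path, h
        # f(s') = edge cost (1) + current heuristic estimate of s'
        succ = min(graph[s], key=lambda n: 1 + h.get(n, 0))
        # Learning update: h(s) <- max(h(s), 1 + h(succ))
        h[s] = max(h.get(s, 0), 1 + h.get(succ, 0))
        s = succ
        path.append(s)
    return path, h

graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "G"], "G": []}
path, h = lrta_star(graph, {}, "A", "G")
print(path)  # ['A', 'B', 'C', 'G']
```

The raised heuristic values are what prevent the agent from cycling back toward the start on later trials, which is the convergence mechanism the completeness proofs above formalize.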
Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information.
Comment: Appeared in AAAI 201
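To make the notion of a strategy constraint concrete: the paper's generalized CFR handles arbitrary convex constraint sets, but a much simpler stand-in is to enforce a single linear constraint by direct projection. The cap value and two-action setting below are illustrative assumptions, not the paper's method.

```python
# Simplified stand-in for strategy constraints: enforce a linear
# constraint (a cap on one action's probability) by projection.
# Assumptions: two actions, illustrative cap; not the paper's
# Lagrangian-based generalized CFR.

def project_capped_simplex(p, cap, idx=0):
    """Project a 2-action strategy so that p[idx] <= cap while it
    remains a probability distribution."""
    q = list(p)
    if q[idx] > cap:
        excess = q[idx] - cap
        q[idx] = cap
        q[1 - idx] += excess  # redistribute to the other action
    return q

# E.g., a risk constraint: never play the aggressive action (index 0)
# with probability above 0.6.
print(project_capped_simplex([0.9, 0.1], cap=0.6))  # -> [0.6, 0.4]
```

Projection like this is easy in small games; the contribution above is making constraint satisfaction compatible with the iterative, large-scale regret-minimization machinery where explicit projection over the full sequence-form strategy space is impractical.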