Extensive-Form Game Solving via Blackwell Approachability on Treeplexes
In this paper, we introduce the first algorithmic framework for Blackwell
approachability on the sequence-form polytope, the class of convex polytopes
capturing the strategies of players in extensive-form games (EFGs). This leads
to a new class of regret-minimization algorithms that are stepsize-invariant,
in the same sense as the Regret Matching and Regret Matching+ algorithms for
the simplex. Our modular framework can be combined with any existing regret
minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs
with perfect recall, through the self-play framework. Leveraging predictive
online mirror descent, we introduce Predictive Treeplex Blackwell+
(PTB+), and show an O(1/√T) convergence rate to Nash equilibrium in
self-play. We then show how to stabilize PTB+ with a stepsize, resulting in
an algorithm with a state-of-the-art O(1/T) convergence rate. We provide an
extensive set of experiments to compare our framework with several algorithmic
benchmarks, including CFR and its predictive variant, and we highlight
interesting connections between practical performance and the
stepsize-dependence or stepsize-invariance properties of classical algorithms.
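The stepsize-invariance the abstract refers to is easiest to see with classical Regret Matching on the simplex. Below is a minimal illustrative sketch (not the paper's treeplex algorithm): two Regret Matching learners play a zero-sum matrix game in self-play, their average strategies approach a Nash equilibrium, and no stepsize parameter appears anywhere in the updates.

```python
import numpy as np

def regret_matching_strategy(cum_regret):
    """Play proportionally to positive cumulative regrets (uniform if none)."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(cum_regret), 1.0 / len(cum_regret))

def self_play(payoff, iters=20000):
    """Self-play with Regret Matching in a two-player zero-sum matrix game.
    The row player maximizes x^T A y; the column player minimizes it."""
    m, n = payoff.shape
    r_row, r_col = np.zeros(m), np.zeros(n)
    avg_row, avg_col = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x = regret_matching_strategy(r_row)
        y = regret_matching_strategy(r_col)
        avg_row += x
        avg_col += y
        u_row = payoff @ y             # row player's utility of each action vs. y
        u_col = -(x @ payoff)          # column player's utility of each action vs. x
        r_row += u_row - x @ u_row     # instantaneous regret accumulation
        r_col += u_col - y @ u_col
    return avg_row / iters, avg_col / iters

# Rock-paper-scissors: the unique Nash equilibrium is uniform play.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
x_bar, y_bar = self_play(rps)
```

The averages, not the last iterates, carry the equilibrium guarantee; this is the same self-play pattern the paper instantiates with its treeplex-level regret minimizers.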
Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints
Metric clustering is fundamental in areas ranging from Combinatorial
Optimization and Data Mining, to Machine Learning and Operations Research.
However, in a variety of situations we may have additional requirements or
knowledge, distinct from the underlying metric, regarding which pairs of points
should be clustered together. To capture and analyze such scenarios, we
introduce a novel family of \emph{stochastic pairwise constraints}, which we
incorporate into several essential clustering objectives (radius/median/means).
Moreover, we demonstrate that these constraints can succinctly model an
intriguing collection of applications, including among others \emph{Individual
Fairness} in clustering and \emph{Must-link} constraints in semi-supervised
learning. Our main result consists of a general framework that yields
approximation algorithms with provable guarantees for important clustering
objectives, while at the same time producing solutions that respect the
stochastic pairwise constraints. Furthermore, for certain objectives we devise
improved results in the case of Must-link constraints, which are also the best
possible from a theoretical perspective. Finally, we present experimental
evidence that validates the effectiveness of our algorithms. (This paper appeared in AAAI 2021.)
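For the special case of hard Must-link constraints, a standard baseline (our own illustration, not the paper's stochastic-constraint framework) is to contract each must-link component into a single representative and then run an off-the-shelf routine, here Gonzalez's greedy 2-approximation for k-center on component centroids; all helper names below are ours.

```python
import numpy as np

def union_find_components(n, must_links):
    """Label each point with the root of its must-link component."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for a, b in must_links:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]

def greedy_k_center(points, k):
    """Gonzalez's farthest-first traversal: 2-approximation for k-center."""
    centers = [0]
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))
        centers.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return centers

def cluster_with_must_links(points, must_links, k):
    comp = union_find_components(len(points), must_links)
    roots = sorted(set(comp))
    # Represent each component by its centroid (a heuristic choice).
    reps = np.array([points[[i for i in range(len(points)) if comp[i] == r]].mean(axis=0)
                     for r in roots])
    centers = greedy_k_center(reps, k)
    # Assign every component -- hence every original point -- to one center.
    label_of_root = {}
    for j, r in enumerate(roots):
        d = [np.linalg.norm(reps[j] - reps[c]) for c in centers]
        label_of_root[r] = int(np.argmin(d))
    return [label_of_root[comp[i]] for i in range(len(points))]

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
labels = cluster_with_must_links(pts, [(0, 1)], k=2)
# must-linked points 0 and 1 are guaranteed to land in the same cluster
```

Because linked points are merged before clustering, the constraints are satisfied by construction; the paper's contribution is handling the far more general stochastic version with approximation guarantees.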
Learning Stackelberg Equilibria and Applications to Economic Design Games
We study the use of reinforcement learning to learn the optimal leader's
strategy in Stackelberg games. Learning a leader's strategy has an innate
stationarity problem -- when optimizing the leader's strategy, the followers'
strategies might shift. To circumvent this problem, we model the followers via
no-regret dynamics to converge to a Bayesian Coarse-Correlated Equilibrium
(B-CCE) of the game induced by the leader. We then embed the followers'
no-regret dynamics in the leader's learning environment, which allows us to
formulate our learning problem as a standard POMDP. We prove that the optimal
policy of this POMDP achieves the same utility as the optimal leader's strategy
in our Stackelberg game. We solve this POMDP using actor-critic methods, where
the critic is given access to the joint information of all the agents. Finally,
we show that our methods are able to learn optimal leader strategies in a
variety of settings of increasing complexity, including indirect mechanisms
where the leader's strategy is setting up the mechanism's rules.
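To make the follower-modeling idea concrete, here is a toy sketch (our own illustration, not the paper's RL/POMDP method): the leader's commitment is found by brute-force grid search rather than actor-critic learning, and the single follower is simulated with Regret Matching, a no-regret dynamic, against each committed strategy. In this 2x2 game a mixed commitment near (0.45, 0.55) beats every pure commitment.

```python
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 3.0]])  # leader payoffs (rows = leader actions)
B = np.array([[1.0, 0.0], [0.0, 1.0]])  # follower payoffs (cols = follower actions)

def follower_no_regret(x, iters=5000):
    """Follower's average play after running Regret Matching vs. commitment x."""
    regret = np.zeros(2)
    avg = np.zeros(2)
    for _ in range(iters):
        pos = np.maximum(regret, 0.0)
        y = pos / pos.sum() if pos.sum() > 0 else np.full(2, 0.5)
        avg += y
        u = x @ B                # follower's utility of each column vs. x
        regret += u - y @ u
    return avg / iters

best_x, best_val = None, -np.inf
for p in np.linspace(0.0, 1.0, 21):     # grid search over leader commitments
    x = np.array([p, 1.0 - p])
    y = follower_no_regret(x)           # simulate the induced follower play
    val = x @ A @ y                     # leader's utility against that play
    if val > best_val:
        best_x, best_val = x, val
```

Embedding the follower's no-regret loop inside the leader's evaluation is exactly the structural move the abstract describes; the paper replaces the grid search with RL over the resulting POMDP and handles multiple Bayesian followers.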
Efficient Learning in Polyhedral Games via Best-Response Oracles
We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to utilizing a best-response oracle. We show how to achieve constant regret in zero-sum games and O(T^{1/4}) regret in general-sum games while using only O(log t) best-response queries at a given iteration t, thus improving over the best prior result, which required O(T) queries per iteration. Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games. This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show an O(1/√T) best-iterate convergence rate without such a dependence. Our results build on linear-rate convergence results for variants of the Frank-Wolfe (FW) algorithm for strongly convex and smooth minimization problems over polyhedral domains. These FW results depend on a condition number of the polytope, known as facial distance. In order to enable application to settings such as EFGs, we show two broad new results: 1) the facial distance for polytopes in standard form is at least γ/k, where γ is the minimum value of a nonzero coordinate of a vertex of the polytope and k ≤ n is the number of tight inequality constraints in the optimal face, and 2) the facial distance for polytopes of the form Ax = b, Cx ≤ d, x ≥ 0, where x ∈ R^n, C ≥ 0 is a nonzero integral matrix, and d ≥ 0, is at least 1/(c√n), where c is the infinity norm of C. This yields the first such results for several problems such as sequence-form polytopes, flow polytopes, and matching polytopes.
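The oracle model above can be illustrated with vanilla Frank-Wolfe, where the only access to the feasible polytope is a linear minimization (best-response) oracle. The sketch below is our own illustration, not the paper's linearly convergent FW variants: it minimizes a smooth quadratic over the probability simplex at the vanilla O(1/t) rate, using exactly one oracle call per iteration.

```python
import numpy as np

def best_response_oracle(grad):
    """Linear minimization over the simplex: the best vertex for this gradient."""
    v = np.zeros(len(grad))
    v[int(np.argmin(grad))] = 1.0
    return v

def frank_wolfe(grad_f, x0, iters=5000):
    """Vanilla Frank-Wolfe with the standard 2/(t+2) step;
    each iteration makes a single best-response (oracle) call."""
    x = x0.copy()
    for t in range(iters):
        v = best_response_oracle(grad_f(x))
        x += 2.0 / (t + 2.0) * (v - x)   # move toward the oracle's vertex
    return x

# Minimize ||x - c||^2 over the probability simplex; the optimum is c itself.
c = np.array([0.2, 0.5, 0.3])
x = frank_wolfe(lambda x: 2.0 * (x - c), np.array([1.0, 0.0, 0.0]))
```

Every iterate is a convex combination of vertices returned by the oracle, so feasibility is maintained for free; the facial-distance results in the abstract govern how much faster the paper's FW variants converge on polytopes such as sequence-form polytopes.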
Automated Design of Affine Maximizer Mechanisms in Dynamic Settings
Dynamic mechanism design is a challenging extension to ordinary mechanism design in which the mechanism designer must make a sequence of decisions over time in the face of possibly untruthful reports of participating agents.
Optimizing dynamic mechanisms for welfare is relatively well understood. However, there has been less work on optimizing for other goals (e.g., revenue), and without restrictive assumptions on valuations, it is remarkably challenging to characterize good mechanisms. Instead, we turn to automated mechanism design to find mechanisms with good performance in specific problem instances.
We extend the class of affine maximizer mechanisms to MDPs where agents may untruthfully report their rewards. This extension results in a challenging bilevel optimization problem in which the upper problem involves choosing optimal mechanism parameters, and the lower problem involves solving the resulting MDP.
Our approach can find truthful dynamic mechanisms that achieve strong performance on goals other than welfare, and can be applied to essentially any problem setting---without restrictions on valuations---for which RL can learn optimal policies.
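For reference, the static affine maximizer family that the paper lifts to MDPs can be sketched in a one-shot setting (our own illustrative code): the designer's tunable parameters are per-agent weights w and per-outcome boosts, and payments take the weighted-VCG form that makes truthful reporting optimal.

```python
import numpy as np

def affine_maximizer(values, w, boost):
    """One-shot affine maximizer mechanism.
    values[i][o]: agent i's reported value for outcome o.
    Chooses argmax_o sum_i w_i * values[i][o] + boost[o] and charges
    each agent the weighted externality it imposes on the others."""
    values = np.asarray(values, dtype=float)
    score = w @ values + boost                 # affine welfare of each outcome
    o_star = int(np.argmax(score))
    payments = []
    for i in range(len(w)):
        others = score - w[i] * values[i]      # welfare excluding agent i
        payments.append((others.max() - others[o_star]) / w[i])
    return o_star, payments

# Two agents, two outcomes; each agent prefers a different outcome.
o_star, pays = affine_maximizer([[3.0, 0.0], [0.0, 2.0]],
                                np.array([1.0, 1.0]), np.array([0.0, 0.0]))
# outcome 0 is chosen; agent 0 pays the welfare agent 1 forgoes
```

In the paper's dynamic extension, tuning w and the boosts becomes the upper level of a bilevel problem whose lower level is solving the induced MDP; this static rule is the object that extension generalizes.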