3 research outputs found
Is Learning in Games Good for the Learners?
We consider a number of questions related to tradeoffs between reward and
regret in repeated gameplay between two agents. To facilitate this, we
introduce a notion of which allows for
asymmetric regret constraints, and yields polytopes of feasible values for each
agent and pair of regret constraints, where we show that any such equilibrium
is reachable by a pair of algorithms which maintain their regret guarantees
against arbitrary opponents. As a central example, we highlight the case one
agent is no-swap and the other's regret is unconstrained. We show that this
captures an extension of equilibria with a matching
optimal value, and that there exists a wide class of games where a player can
significantly increase their utility by deviating from a no-swap-regret
algorithm against a no-swap learner (in fact, almost any game without pure Nash
equilibria is of this form). Additionally, we make use of generalized
equilibria to consider tradeoffs in terms of the opponent's algorithm choice.
We give a tight characterization for the maximal reward obtainable against
no-regret learner, yet we also show a class of games in which
this is bounded away from the value obtainable against the class of common
"mean-based" no-regret algorithms. Finally, we consider the question of
learning reward-optimal strategies via repeated play with a no-regret agent
when the game is initially unknown. Again we show tradeoffs depending on the
opponent's learning algorithm: the Stackelberg strategy is learnable in
exponential time with any no-regret agent (and in polynomial time with any
no--regret agent) for any game where it is learnable via
queries, and there are games where it is learnable in polynomial time against
any no-swap-regret agent but requires exponential time against a mean-based
no-regret agent.Comment: 22 page