Near-Optimal Pure Exploration in Matrix Games: A Generalization of
  Stochastic Bandits & Dueling Bandits

Boczar, Ross; Jamieson, Kevin; Maiti, Arnab; Ratliff, Lillian J.

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

Authors: Ross Boczar
Kevin Jamieson
Arnab Maiti
Lillian J. Ratliff
Publication date: 27 November 2023
Publisher

Abstract

We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry

(i,j)

of the input matrix

A\in[-1,1]^{n\times m}

and observe

A_{i,j}+\eta

where

\eta

is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of

A

, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.Comment: 22 pages, 5 figure

Similar works

Full text

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.16252

Last time updated on 10/05/2024