We study the sample complexity of identifying the pure strategy Nash
equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally,
we are given a stochastic model where a learner can sample an entry $(i,j)$
of the input matrix $A \in [-1,1]^{n \times m}$ and observe $A_{i,j} + \eta$, where
$\eta$ is zero-mean 1-sub-Gaussian noise. The aim of the learner is to
identify the PSNE of A, whenever it exists, with high probability while
taking as few samples as possible. Zhou et al. (2017) present an
instance-dependent sample complexity lower bound that depends only on the
entries in the row and column in which the PSNE lies. We design a near-optimal
algorithm whose sample complexity matches the lower bound, up to log factors.
The problem of identifying the PSNE also generalizes the problem of pure
exploration in stochastic multi-armed bandits and dueling bandits, and our
result matches the optimal bounds, up to log factors, in both settings.

Comment: 22 pages, 5 figures
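
To make the sampling model concrete, the following is a minimal sketch of the setting only, not the algorithm proposed in the paper. It assumes the row player maximizes and the column player minimizes, so a PSNE is an entry that is the minimum of its row and the maximum of its column. The names `find_psne` and `estimate_psne_uniform` are hypothetical, standard Gaussian noise stands in for the 1-sub-Gaussian noise, and uniform sampling of every entry is a naive baseline chosen purely for illustration.

```python
import numpy as np


def find_psne(A):
    """Return (i, j) of a saddle point of A, or None if no PSNE exists.

    Assumed convention: the row player maximizes, the column player minimizes.
    """
    A = np.asarray(A, dtype=float)
    row_min = A.min(axis=1)   # worst-case payoff of each row
    col_max = A.max(axis=0)   # worst-case loss of each column
    # A PSNE exists exactly when the maximin and minimax values coincide.
    if row_min.max() != col_max.min():
        return None
    return int(row_min.argmax()), int(col_max.argmin())


def estimate_psne_uniform(A, samples_per_entry, rng):
    """Naive baseline: sample every entry equally often, then solve the
    empirical-mean matrix. Each sample is A[i, j] plus N(0, 1) noise,
    which is one example of zero-mean 1-sub-Gaussian noise.
    """
    A = np.asarray(A, dtype=float)
    noise = rng.standard_normal((samples_per_entry,) + A.shape)
    empirical_mean = (A + noise).mean(axis=0)
    return find_psne(empirical_mean)


if __name__ == "__main__":
    A = [[0.5, 0.2],
         [-0.3, -0.8]]
    rng = np.random.default_rng(0)
    print(find_psne(A))
    print(estimate_psne_uniform(A, samples_per_entry=2000, rng=rng))
```

The baseline spends the same number of samples on every entry, whereas the instance-dependent bounds discussed above depend only on the entries in the row and column of the PSNE.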