
In this lecture we use Blackwell’s approachability theorem to formulate both external and internal regret minimizing algorithms. Our study is based primarily on the algorithms presented by Hart and Mas-Colell [6, 7]; see also [3] for a summary.

Throughout the lecture we consider a finite two-player game, where each player $i$ has a finite pure action set $A_i$; let $A = \prod_i A_i$, and let $A_{-i} = \prod_{j \neq i} A_j$. We let $a_i$ denote a pure action for player $i$, and let $s_i \in \Delta(A_i)$ denote a mixed action for player $i$. We will typically view $s_i$ as a vector in $\mathbb{R}^{A_i}$, with $s_i(a_i)$ equal to the probability that player $i$ places on $a_i$. We let $\Pi_i(a)$ denote the payoff to player $i$ when the composite pure action vector is $a$, and by an abuse of notation also let $\Pi_i(s)$ denote the expected payoff to player $i$ when the composite mixed action vector is $s$.

The game is played repeatedly by the players. We let $h^T = (a^0, \ldots, a^{T-1})$ denote the history up to time $T$. The external regret of player $i$ against action $s_i$ after history $h^T$ is:

$$ER_i(h^T; s_i) = \sum_{t=0}^{T-1} \left[ \Pi_i(s_i, a^t_{-i}) - \Pi_i(a^t_i, a^t_{-i}) \right].$$

The internal regret of player $i$ of action $a_i$ against action $a'_i$ after history $h^T$ is:

$$IR_i(h^T; a_i, a'_i) = \sum_{t=0}^{T-1} \mathbb{I}\{a^t_i = a_i\} \left( \Pi_i(a'_i, a^t_{-i}) - \Pi_i(a_i, a^t_{-i}) \right).$$

We let $p^T_i \in \Delta(A_i)$ denote the marginal empirical distribution of player $i$’s play up to time $T$: $p^T_i(a_i) =$
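The two regret definitions can be made concrete with a small computation. The sketch below is illustrative only and is not taken from the lecture: the function names, the matrix layout `payoff[own_action][opponent_action]` for player i's payoffs, and the matching-pennies data are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

# Assumed layout: payoff[a_i][a_minus_i] is player i's payoff in a two-player game.
Payoff = Sequence[Sequence[float]]
# Assumed layout: history[t] = (a_i^t, a_{-i}^t), actions given as integer indices.
History = List[Tuple[int, int]]


def external_regret(payoff: Payoff, history: History,
                    mixed_action: Sequence[float]) -> float:
    """Sum over t of Pi_i(s_i, a_{-i}^t) - Pi_i(a_i^t, a_{-i}^t)."""
    total = 0.0
    for own, opp in history:
        # Expected payoff of the fixed mixed action s_i against the opponent's play.
        expected = sum(p * payoff[a][opp] for a, p in enumerate(mixed_action))
        total += expected - payoff[own][opp]
    return total


def internal_regret(payoff: Payoff, history: History,
                    action: int, alternative: int) -> float:
    """Sum over periods where `action` was played of the gain from `alternative`."""
    total = 0.0
    for own, opp in history:
        if own == action:  # the indicator I{a_i^t = a_i}
            total += payoff[alternative][opp] - payoff[action][opp]
    return total


# Hypothetical data: matching pennies from player i's point of view.
payoff = [[1.0, -1.0],
          [-1.0, 1.0]]
history = [(0, 0), (0, 1), (1, 1), (0, 1)]

er_pure_1 = external_regret(payoff, history, [0.0, 1.0])
ir_0_to_1 = internal_regret(payoff, history, action=0, alternative=1)
ir_1_to_0 = internal_regret(payoff, history, action=1, alternative=0)
print(er_pure_1, ir_0_to_1, ir_1_to_0)
```

In this example the external regret against the pure action 1 equals the internal regret of action 0 against action 1, because the only other term in the sum over played actions (action 1 against itself) is zero. More generally, summing the internal regret over all played actions for a fixed alternative gives the external regret against that alternative.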

Year: 2007

OAI identifier:
oai:CiteSeerX.psu:10.1.1.352.6068

Provided by:
CiteSeerX
