Approachability in unknown games: Online learning meets multi-objective optimization
In the standard setting of approachability there are two players and a target
set. The players repeatedly play a known vector-valued game in which the first
player wants the average vector-valued payoff to converge to the target set,
while the other player tries to exclude the average payoff from this set. We
revisit this setting in the spirit of online learning and do not assume that
the first player knows the game structure: she receives an arbitrary
vector-valued reward at every round. She wishes to approach the smallest ("best") possible
set given the observed average payoffs in hindsight. This extension of the
standard setting has implications even when the original target set is not
approachable and when it is not obvious which expansion of it should be
approached instead. We show that it is impossible, in general, to approach the
best target set in hindsight and propose achievable though ambitious
alternative goals. We further propose a concrete strategy to approach these
goals. Our method does not require projection onto a target set and amounts to
switching between scalar regret minimization algorithms that are performed in
episodes. Applications to global cost minimization and to approachability under
sample path constraints are considered.
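The strategy above switches between scalar regret minimization algorithms run in episodes. As a self-contained illustration of one standard scalar regret minimizer, here is a minimal Python sketch of Hedge (exponential weights); the loss sequence and learning rate are illustrative assumptions, not taken from the paper.

```python
import math

def hedge(losses, eta=0.5):
    """Run Hedge (exponential weights) over a sequence of loss vectors.

    losses: list of per-round loss lists, one entry per expert, in [0, 1].
    Returns (total algorithm loss, external regret vs. the best expert).
    """
    n = len(losses[0])
    weights = [1.0] * n
    alg_loss = 0.0
    for loss in losses:
        total_w = sum(weights)
        probs = [w / total_w for w in weights]          # play the normalized weights
        alg_loss += sum(p * l for p, l in zip(probs, loss))
        # Exponential update: down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, loss)]
    best = min(sum(rnd[i] for rnd in losses) for i in range(n))
    return alg_loss, alg_loss - best

# Illustrative sequence: expert 0 is always slightly better than expert 1.
seq = [[0.1, 0.9]] * 50
total, regret = hedge(seq)   # regret stays far below the worst case of 50 * 0.8
```

The weights concentrate exponentially fast on the better expert, so the regret is bounded while the horizon grows.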
Online Learning: Beyond Regret
We study online learnability of a wide class of problems, extending the
results of (Rakhlin, Sridharan, Tewari, 2010) to general notions of performance
measure well beyond external regret. Our framework simultaneously captures such
well-known notions as internal and general Phi-regret, learning with
non-additive global cost functions, Blackwell's approachability, calibration of
forecasters, adaptive regret, and more. We show that learnability in all these
situations is due to control of the same three quantities: a martingale
convergence term, a term describing the ability to perform well if future is
known, and a generalization of sequential Rademacher complexity, studied in
(Rakhlin, Sridharan, Tewari, 2010). Since we directly study complexity of the
problem instead of focusing on efficient algorithms, we are able to improve and
extend many known results which have been previously derived via an algorithmic
construction.
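Two of the performance measures listed above are easy to state concretely. As an illustrative Python sketch (the action and loss sequences are a made-up example, not from the paper), external regret compares the played losses against the best fixed action in hindsight, while internal regret measures the best gain obtainable from a single action-to-action swap:

```python
def external_regret(actions, losses):
    """Played cumulative loss minus that of the best single action in hindsight."""
    T = len(actions)
    n = len(losses[0])
    played = sum(losses[t][actions[t]] for t in range(T))
    best = min(sum(losses[t][i] for t in range(T)) for i in range(n))
    return played - best

def internal_regret(actions, losses):
    """Largest gain from replacing every play of action i by some action j."""
    T = len(actions)
    n = len(losses[0])
    best_swap = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            gain = sum(losses[t][i] - losses[t][j]
                       for t in range(T) if actions[t] == i)
            best_swap = max(best_swap, gain)
    return best_swap

# A player who always picks the wrong action on an alternating loss sequence.
actions = [0, 1, 0, 1]
losses = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

On this sequence both notions coincide; in general internal regret is the finer measure, and control of it implies control of external regret.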
Applying Metric Regularity to Compute a Condition Measure of a Smoothing Algorithm for Matrix Games
We develop an approach of variational analysis and generalized
differentiation to conditioning issues for two-person zero-sum matrix games.
Our major results establish precise relationships between a certain condition
measure of the smoothing first-order algorithm proposed by Gilpin et al.
[Proceedings of the 23rd AAAI Conference (2008) pp. 75-82] and the exact bound
of metric regularity for an associated set-valued mapping. In this way we
compute the aforementioned condition measure in terms of the initial matrix
game data.
A Stochastic View of Optimal Regret through Minimax Duality
We study the regret of optimal strategies for online convex optimization
games. Using von Neumann's minimax theorem, we show that the optimal regret in
this adversarial setting is closely related to the behavior of the empirical
minimization algorithm in a stochastic process setting: it is equal to the
maximum, over joint distributions of the adversary's action sequence, of the
difference between a sum of minimal expected losses and the minimal empirical
loss. We show that the optimal regret has a natural geometric interpretation,
since it can be viewed as the gap in Jensen's inequality for a concave
functional--the minimizer over the player's actions of expected loss--defined
on a set of probability distributions. We use this expression to obtain upper
and lower bounds on the regret of an optimal strategy for a variety of online
learning problems. Our method provides upper bounds without the need to
construct a learning algorithm; the lower bounds provide explicit optimal
strategies for the adversary.
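The minimax duality at the heart of this argument can be checked numerically on a finite zero-sum matrix game: the row player's best guaranteed payoff (maximin) equals the column player's best guaranteed cap (minimax). The following Python sketch verifies this by brute force over a grid of mixed strategies for 2x2 games; the grid search and the example matrices are illustrative assumptions, not the paper's method.

```python
def maximin(A, grid=1000):
    """Row player's best guaranteed payoff over mixed strategies (2 rows)."""
    best = float("-inf")
    for k in range(grid + 1):
        p = k / grid  # probability of playing row 0
        worst = min(p * A[0][j] + (1 - p) * A[1][j] for j in range(len(A[0])))
        best = max(best, worst)
    return best

def minimax(A, grid=1000):
    """Column player's best guaranteed cap over mixed strategies (2 columns)."""
    best = float("inf")
    for k in range(grid + 1):
        q = k / grid  # probability of playing column 0
        worst = max(q * A[i][0] + (1 - q) * A[i][1] for i in range(len(A)))
        best = min(best, worst)
    return best

pennies = [[1.0, -1.0], [-1.0, 1.0]]   # matching pennies, value 0
skewed = [[2.0, 0.0], [1.0, 3.0]]      # no saddle point, value 3/2
```

By von Neumann's theorem the two quantities agree for every payoff matrix; the grid merely makes that visible without a linear-programming solver.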
Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial
Recent results of Ye and Hansen, Miltersen and Zwick show that policy
iteration for one or two player (perfect information) zero-sum stochastic
games, restricted to instances with a fixed discount rate, is strongly
polynomial. We show that policy iteration for mean-payoff zero-sum stochastic
games is also strongly polynomial when restricted to instances with bounded
first mean return time to a given state. The proof is based on methods of
nonlinear Perron-Frobenius theory, allowing us to reduce the mean-payoff
problem to a discounted problem with state dependent discount rate. Our
analysis also shows that policy iteration remains strongly polynomial for
discounted problems in which the discount rate can be state dependent (and even
negative) at certain states, provided that the spectral radii of the
nonnegative matrices associated to all strategies are bounded from above by a
fixed constant strictly less than 1.
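For contrast with the two-player mean-payoff setting, here is a minimal Python sketch of policy iteration in its simplest instance, a one-player discounted MDP with a fixed discount rate; the two-state example is a made-up illustration, not taken from the paper.

```python
def policy_iteration(P, R, gamma):
    """Policy iteration for a finite discounted MDP (one player).

    P[a][s][t]: probability of moving from state s to t under action a.
    R[a][s]:    immediate reward for playing a in s.  gamma in [0, 1).
    Returns (optimal policy, optimal values).
    """
    n = len(R[0])
    policy = [0] * n
    while True:
        # Policy evaluation: iterate the Bellman operator of the fixed policy.
        V = [0.0] * n
        for _ in range(2000):
            V = [R[policy[s]][s]
                 + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n))
                 for s in range(n)]
        # Policy improvement: act greedily with respect to V.
        new = [max(range(len(R)),
                   key=lambda a: R[a][s]
                   + gamma * sum(P[a][s][t] * V[t] for t in range(n)))
               for s in range(n)]
        if new == policy:
            return policy, V
        policy = new

# Two states, two actions: "stay" (action 0) collects the state's reward,
# "switch" (action 1) moves to the other state for free.
P = [[[1.0, 0.0], [0.0, 1.0]],   # stay
     [[0.0, 1.0], [1.0, 0.0]]]   # switch
R = [[1.0, 2.0],                 # stay rewards
     [0.0, 0.0]]                 # switch rewards
```

Here the optimal policy switches out of state 0 to harvest the larger reward in state 1, giving values V = (0.9 * 20, 2 / (1 - 0.9)) = (18, 20).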
A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games
Given a finite set K, we denote by X = Delta(K) the set of probabilities
on K and by Z = Delta_f(X) the set of Borel probabilities on X with finite
support. Studying a Markov Decision Process with partial information on K
naturally leads to a Markov Decision Process with full information on X. We
introduce a new metric d* on Z such that the transitions become
1-Lipschitz from (X, d*) to (Z, d*). In the first part of the article,
we define and prove several properties of the metric d*. In particular, d*
satisfies a Kantorovich-Rubinstein type duality formula and can be
characterized by using disintegrations. In the second part, we characterize the
limit values in several classes of "compact non expansive" Markov Decision
Processes. In particular we use the metric d* to characterize the limit
value in Partial Observation MDPs with finitely many states and in Repeated
Games with an informed controller with finite sets of states and actions.
Moreover, in each case we prove the existence of a generalized notion of
uniform value, where we consider not only the Cesàro mean when the number of
stages is large enough but any evaluation function with small enough
impatience.
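The Kantorovich-Rubinstein duality invoked above is classical for the Wasserstein-1 distance. As a toy illustration only (the paper's metric, defined on probabilities over probabilities, is more refined than this), the following Python sketch computes the Wasserstein-1 distance between two distributions supported on points of the real line via the standard CDF formula:

```python
def w1_on_line(mu, nu, points):
    """Wasserstein-1 distance between two distributions on sorted real points.

    Uses the one-dimensional identity W1(mu, nu) = integral |F_mu - F_nu|,
    discretized as a sum of |CDF difference| * gap over consecutive points.
    """
    total, F_mu, F_nu = 0.0, 0.0, 0.0
    for i in range(len(points) - 1):
        F_mu += mu[i]
        F_nu += nu[i]
        total += abs(F_mu - F_nu) * (points[i + 1] - points[i])
    return total

# Moving a unit mass from 0 to 2 costs exactly the distance travelled, 2.
d_full = w1_on_line([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 2.0])
# Shifting half the mass one step twice costs 0.5 + 0.5 = 1.
d_half = w1_on_line([0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 1.0, 2.0])
```

By Kantorovich-Rubinstein duality the same quantity equals the supremum of the difference of expectations over 1-Lipschitz test functions, which is the formulation the abstract refers to.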