
    Approachability in unknown games: Online learning meets multi-objective optimization

    In the standard setting of approachability there are two players and a target set. The players repeatedly play a known vector-valued game in which the first player wants the average vector-valued payoff to converge to the target set, while the other player tries to keep it away from that set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the game structure: she receives an arbitrary vector-valued reward at every round. She wishes to approach the smallest ("best") possible set given the observed average payoffs in hindsight. This extension of the standard setting has implications even when the original target set is not approachable and it is not obvious which expansion of it should be approached instead. We show that it is impossible, in general, to approach the best target set in hindsight, and we propose achievable though ambitious alternative goals. We further propose a concrete strategy to approach these goals. Our method does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes. Applications to global cost minimization and to approachability under sample path constraints are considered.
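
    A minimal sketch of the standard approachability loop described above, in Python: the first player tracks her running average vector payoff and checks its distance to the target set. The payoff function, target set, and both strategies are illustrative placeholders here, not the paper's episodic regret-switching construction.

        import numpy as np

        def distance_to_set(point, target_points):
            # Euclidean distance from `point` to a finite sample of the target set
            return min(np.linalg.norm(point - s) for s in target_points)

        def play_repeated_game(strategy, adversary, payoff, target_points, rounds=1000):
            avg = None
            for t in range(1, rounds + 1):
                a = strategy(avg)                    # first player's action (may depend on the running average)
                b = adversary(avg)                   # second player's action
                r = np.asarray(payoff(a, b), float)  # vector-valued payoff of this round
                avg = r if avg is None else avg + (r - avg) / t  # update the running average
            return avg, distance_to_set(avg, target_points)

    Approachability of the target set means that the first player can guarantee that the returned distance vanishes as the number of rounds grows, whatever the adversary does.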

    Online Learning: Beyond Regret

    We study online learnability of a wide class of problems, extending the results of (Rakhlin, Sridharan, Tewari, 2010) to general notions of performance measure well beyond external regret. Our framework simultaneously captures such well-known notions as internal and general Phi-regret, learning with non-additive global cost functions, Blackwell's approachability, calibration of forecasters, adaptive regret, and more. We show that learnability in all these situations is due to control of the same three quantities: a martingale convergence term, a term describing the ability to perform well if the future is known, and a generalization of the sequential Rademacher complexity studied in (Rakhlin, Sridharan, Tewari, 2010). Since we directly study the complexity of the problem instead of focusing on efficient algorithms, we are able to improve and extend many known results which have previously been derived via an algorithmic construction.
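
    For concreteness, the external and Phi-regret notions mentioned above can be written as follows, in generic notation with loss $\ell$, player actions $a_t$ and adversary moves $x_t$ (a standard formulation, not necessarily the paper's exact one):

        $\mathrm{Reg}^{\mathrm{ext}}_T = \sum_{t=1}^{T} \ell(a_t, x_t) - \min_{a} \sum_{t=1}^{T} \ell(a, x_t), \qquad \mathrm{Reg}^{\Phi}_T = \sum_{t=1}^{T} \ell(a_t, x_t) - \min_{\phi \in \Phi} \sum_{t=1}^{T} \ell(\phi(a_t), x_t).$

    External regret corresponds to taking $\Phi$ to be the constant maps, and internal regret to the maps that replace one fixed action by another.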

    Applying Metric Regularity to Compute a Condition Measure of a Smoothing Algorithm for Matrix Games

    We develop an approach based on variational analysis and generalized differentiation to study conditioning issues for two-person zero-sum matrix games. Our major results establish precise relationships between a certain condition measure of the smoothing first-order algorithm proposed by Gilpin et al. [Proceedings of the 23rd AAAI Conference (2008), pp. 75-82] and the exact bound of metric regularity for an associated set-valued mapping. In this way we compute the aforementioned condition measure in terms of the initial matrix game data.
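
    As a reminder of the notion involved (the standard definition, in our notation, not a statement from the paper): a set-valued mapping $F$ between metric spaces is metrically regular at $\bar x$ for $\bar y \in F(\bar x)$ if there exist $\kappa \ge 0$ and neighborhoods $U$ of $\bar x$ and $V$ of $\bar y$ such that

        $\mathrm{dist}\big(x, F^{-1}(y)\big) \le \kappa\, \mathrm{dist}\big(y, F(x)\big)$ for all $x \in U$ and $y \in V$,

    and the exact bound of metric regularity is the infimum of all moduli $\kappa$ for which this estimate holds.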

    A Stochastic View of Optimal Regret through Minimax Duality

    We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional (the minimum, over the player's actions, of the expected loss) defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary.
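
    In generic notation (a sketch of the identity described above, not a verbatim statement from the paper), with adversary sequence $x_1,\dots,x_T$, player actions $a$ and loss $\ell$, the optimal regret can be written as

        $\mathcal{V}_T = \sup_{P} \; \mathbb{E}_{x_1,\dots,x_T \sim P}\Big[ \sum_{t=1}^{T} \inf_{a_t} \mathbb{E}\big[\ell(a_t, x_t) \mid x_1,\dots,x_{t-1}\big] \;-\; \inf_{a} \sum_{t=1}^{T} \ell(a, x_t) \Big],$

    where the supremum ranges over joint distributions $P$ of the adversary's action sequence, i.e. a sum of minimal expected losses minus the minimal empirical loss.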

    Policy iteration for perfect information stochastic mean payoff games with bounded first return times is strongly polynomial

    Recent results of Ye and of Hansen, Miltersen and Zwick show that policy iteration for one- or two-player (perfect information) zero-sum stochastic games, restricted to instances with a fixed discount rate, is strongly polynomial. We show that policy iteration for mean-payoff zero-sum stochastic games is also strongly polynomial when restricted to instances with bounded first mean return time to a given state. The proof is based on methods of nonlinear Perron-Frobenius theory, allowing us to reduce the mean-payoff problem to a discounted problem with a state-dependent discount rate. Our analysis also shows that policy iteration remains strongly polynomial for discounted problems in which the discount rate can be state dependent (and even negative) at certain states, provided that the spectral radii of the nonnegative matrices associated to all strategies are bounded from above by a fixed constant strictly less than 1.
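
    A minimal one-player sketch, in Python, of policy iteration with a state-dependent discount rate, which is the kind of problem the reduction above produces; shapes and names are illustrative, and the paper itself treats the two-player zero-sum case together with the Perron-Frobenius reduction.

        import numpy as np

        def policy_iteration(P, r, gamma, max_iter=100):
            # P[a]  : (S, S) transition matrix of action a
            # r[a]  : (S,)   reward vector of action a
            # gamma : (S,)   state-dependent discount factors (spectral radius condition assumed)
            S, A = r[0].shape[0], len(P)
            pi = np.zeros(S, dtype=int)
            for _ in range(max_iter):
                # Policy evaluation: solve (I - diag(gamma) P_pi) v = r_pi
                P_pi = np.array([P[pi[s]][s] for s in range(S)])
                r_pi = np.array([r[pi[s]][s] for s in range(S)])
                v = np.linalg.solve(np.eye(S) - np.diag(gamma) @ P_pi, r_pi)
                # Policy improvement: greedy one-step lookahead
                q = np.array([r[a] + gamma * (P[a] @ v) for a in range(A)])
                new_pi = q.argmax(axis=0)
                if np.array_equal(new_pi, pi):
                    break
                pi = new_pi
            return pi, v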

    A distance for probability spaces, and long-term values in Markov Decision Processes and Repeated Games

    Given a finite set $K$, we denote by $X=\Delta(K)$ the set of probabilities on $K$ and by $Z=\Delta_f(X)$ the set of Borel probabilities on $X$ with finite support. Studying a Markov Decision Process with partial information on $K$ naturally leads to a Markov Decision Process with full information on $X$. We introduce a new metric $d_*$ on $Z$ such that the transitions become 1-Lipschitz from $(X, \|\cdot\|_1)$ to $(Z, d_*)$. In the first part of the article, we define and prove several properties of the metric $d_*$. In particular, $d_*$ satisfies a Kantorovich-Rubinstein type duality formula and can be characterized using disintegrations. In the second part, we characterize the limit values in several classes of "compact non-expansive" Markov Decision Processes. In particular, we use the metric $d_*$ to characterize the limit value in Partial Observation MDPs with finitely many states and in Repeated Games with an informed controller with finite sets of states and actions. Moreover, in each case we can prove the existence of a generalized notion of uniform value, where we consider not only the Cesàro mean when the number of stages is large enough but any evaluation function $\theta \in \Delta(\mathbb{N}^*)$ whenever the impatience $I(\theta)=\sum_{t\geq 1} |\theta_{t+1}-\theta_t|$ is small enough.
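
    For reference, the "Kantorovich-Rubinstein type duality" mentioned above refers to an analogue of the classical formula for the Wasserstein-1 distance (standard statement, not the paper's specific duality for $d_*$):

        $W_1(\mu, \nu) = \sup_{\mathrm{Lip}(f) \le 1} \int f \, d\mu - \int f \, d\nu,$

    where the supremum is over 1-Lipschitz functions $f$.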