    Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

    We study a general adversarial online learning problem in which we are given a decision set 𝒳 in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from 𝒳 based on the observed sequence of previous rewards. Our goal is to minimize regret. Using results from infinite-dimensional convex analysis, we generalize the method of Dual Averaging to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan consistency.
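
    As a rough illustration of the Dual Averaging scheme discussed above, the sketch below runs entropic Dual Averaging (exponential weights) on a finite discretization of the compact set S; the grid size, step size, and entropic regularizer are illustrative assumptions, not the paper's infinite-dimensional construction.

```python
import numpy as np

def dual_averaging(reward_rounds, eta=0.5):
    """Entropic Dual Averaging (exponential weights) on a finite grid.

    reward_rounds: iterable of reward vectors u_t, one entry per grid point.
    Returns the sequence of mixed actions (probability vectors) played.
    """
    plays = []
    cum_reward = None
    for u_t in reward_rounds:
        u_t = np.asarray(u_t, dtype=float)
        if cum_reward is None:
            cum_reward = np.zeros_like(u_t)
        # Mirror step: argmax_x <cum_reward, x> - (1/eta) * sum_i x_i log x_i
        # over the simplex has the closed-form softmax solution below.
        scores = eta * cum_reward
        scores -= scores.max()                      # numerical stabilization
        x_t = np.exp(scores) / np.exp(scores).sum()
        plays.append(x_t)
        cum_reward += u_t                           # observe reward, update the dual sum
    return plays

# Toy usage: S discretized into 5 points, rewards drawn at random each round.
rng = np.random.default_rng(0)
rewards = [rng.uniform(0.0, 1.0, size=5) for _ in range(100)]
print(dual_averaging(rewards)[-1])
```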

    Learning in Games with Lossy Feedback

    We consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games. We propose a simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD), and show that in variationally stable games, if each agent adopts ROGD, then almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily correlated among agents. We then extend the framework to deal with unknown feedback loss probabilities by using an estimator (constructed from past data) in their place. Finally, we further extend the framework to accommodate both asynchronous loss and stochastic rewards and establish that multi-agent ROGD learning still converges to the set of Nash equilibria in such settings. Together, these results contribute to the broad landscape of multi-agent online learning by significantly relaxing the feedback information that is required to achieve desirable outcomes.
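
    A minimal sketch of the reweighting idea behind ROGD, assuming each agent knows its feedback-arrival probability p and plays on a box-shaped action set; the projection, step size, and toy payoff are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

def rogd(grad_oracle, x0, p, rounds=1000, step=0.05, lo=-1.0, hi=1.0):
    """Reweighted online gradient descent with lossy feedback (sketch).

    grad_oracle(t, x) returns the payoff gradient at x, or None when the
    feedback for round t is lost; p is the probability that feedback arrives.
    Lost rounds are skipped, and received gradients are reweighted by 1/p so
    the update is unbiased in expectation.
    """
    x = np.array(x0, dtype=float)
    for t in range(1, rounds + 1):
        g = grad_oracle(t, x)
        if g is not None:                      # feedback arrived this round
            x = x + step * np.asarray(g) / p   # ascent step, reweighted by 1/p
            x = np.clip(x, lo, hi)             # project back onto the box action set
    return x

# Toy usage: one agent maximizing -||x||^2 with feedback lost 30% of the time.
rng = np.random.default_rng(1)
oracle = lambda t, x: (-2 * x) if rng.random() < 0.7 else None
print(rogd(oracle, x0=[0.9, -0.8], p=0.7))
```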

    Gradient-free Online Learning in Games with Delayed Rewards

    Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded. Comment: 26 pages, 4 figures; to appear in ICML 2020.
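
    The priority-queue mechanism described above might be sketched as follows; the one-point payoff-based gradient estimate, the delay model, and the box projection are assumptions made for illustration rather than the paper's actual estimator.

```python
import heapq
import numpy as np

def delayed_payoff_learning(payoff_fn, delay_fn, x0, rounds=500,
                            step=0.01, probe=0.1, seed=0):
    """Gradient-free learning with delayed, out-of-order rewards (sketch).

    Each round the agent plays a randomly perturbed action and remembers the
    perturbation direction.  The resulting payoff only becomes available
    delay_fn(t) rounds later; pending payoffs sit in a priority queue keyed
    by their arrival time and are processed once that time has passed, using
    a one-point gradient estimate built from the stored direction.
    """
    rng = np.random.default_rng(seed)
    d = len(x0)
    x = np.array(x0, dtype=float)
    pending = []                                   # min-heap of (arrival, round, payoff)
    directions = {}
    for t in range(rounds):
        z = rng.standard_normal(d)
        z /= np.linalg.norm(z)
        directions[t] = z
        payoff = payoff_fn(x + probe * z)          # play the perturbed action
        heapq.heappush(pending, (t + delay_fn(t), t, payoff))
        while pending and pending[0][0] <= t:      # feedback that has arrived by now
            _, t_played, u = heapq.heappop(pending)
            g_hat = (d / probe) * u * directions.pop(t_played)   # one-point estimate
            x = np.clip(x + step * g_hat, -1.0, 1.0)             # ascent + projection
    return x

# Toy usage: concave payoff -||a||^2 with a random delay of 1 to 20 rounds.
rng = np.random.default_rng(1)
print(delayed_payoff_learning(lambda a: -np.sum(a ** 2),
                              delay_fn=lambda t: int(rng.integers(1, 21)),
                              x0=[0.5, 0.5]))
```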

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains, in particular, the scientific program both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.

    ReLOAD: reinforcement learning with optimistic ascent-descent for last-iterate convergence in constrained MDPs

    In recent years, reinforcement learning (RL) has been applied to real-world problems with increasing success. Such applications often require constraints to be placed on the agent’s behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate convergence, i.e., the current policy of the agent may never converge to the optimal solution. In practice, it is often observed that the policy alternates between satisfying the constraints and maximizing the reward, rarely accomplishing both objectives simultaneously. Here, we address this problem by introducing Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD), a principled CRL method with guaranteed last-iterate convergence. We demonstrate its empirical effectiveness on a wide variety of CRL problems, including discrete MDPs and continuous control. In the process, we establish a benchmark of challenging CRL problems.
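
    A toy sketch of the optimistic ascent-descent idea underlying ReLOAD, shown on a bilinear saddle-point problem rather than a constrained MDP; plain descent-ascent orbits on this problem, while the optimistic variant's last iterate converges. The step size and toy objective are assumptions for illustration.

```python
import numpy as np

def optimistic_gda(grad_x, grad_y, x0, y0, steps=2000, eta=0.05):
    """Optimistic gradient descent-ascent (sketch).

    Each player takes a step using the current gradient plus an extrapolation
    term (current minus previous gradient), which is what yields last-iterate
    convergence on problems where plain descent-ascent cycles.
    """
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    gx_prev, gy_prev = grad_x(x, y), grad_y(x, y)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta * (2 * gx - gx_prev)   # descent player (e.g., the policy loss)
        y = y + eta * (2 * gy - gy_prev)   # ascent player (e.g., a Lagrange multiplier)
        gx_prev, gy_prev = gx, gy
    return x, y

# Toy saddle problem min_x max_y x*y: the optimistic iterates spiral into the
# saddle point (0, 0) instead of orbiting around it.
x_star, y_star = optimistic_gda(lambda x, y: y, lambda x, y: x,
                                x0=[1.0], y0=[1.0])
print(x_star, y_star)
```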

    Online and Stochastic Optimization beyond Lipschitz Continuity: A Riemannian Approach

    Motivated by applications to machine learning and imaging science, we study a class of online and stochastic optimization problems with loss functions that are not Lipschitz continuous; in particular, the loss functions encountered by the optimizer could exhibit gradient singularities or be singular themselves. Drawing on tools and techniques from Riemannian geometry, we examine a Riemann-Lipschitz (RL) continuity condition which is tailored to the singularity landscape of the problem's loss functions. In this way, we are able to tackle cases beyond the Lipschitz framework provided by a global norm, and we derive optimal regret bounds and last-iterate convergence results through the use of regularized learning methods (such as online mirror descent). These results are subsequently validated in a class of stochastic Poisson inverse problems that arise in imaging science.
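
    One concrete instance of the regularized learning methods mentioned above is online mirror descent with a negative-entropy regularizer on the simplex; the sketch below uses a loss whose gradient blows up at the boundary. The regularizer, step size, and toy loss are illustrative assumptions and do not reproduce the Riemann-Lipschitz machinery.

```python
import numpy as np

def online_mirror_descent(grad_fn, x0, rounds=300, eta=0.01):
    """Entropic online mirror descent on the probability simplex (sketch).

    grad_fn(t, x) returns the gradient of the round-t loss at the current
    iterate.  The multiplicative update is the mirror step induced by the
    negative-entropy regularizer; it keeps the iterate strictly inside the
    simplex, which is one way of coping with losses that blow up at the
    boundary.
    """
    x = np.array(x0, dtype=float)
    for t in range(rounds):
        g = np.asarray(grad_fn(t, x), dtype=float)
        # Shift by g.min() for numerical stability; the common factor cancels
        # after renormalization.
        x = x * np.exp(-eta * (g - g.min()))   # multiplicative (mirror) step
        x = x / x.sum()                        # renormalize onto the simplex
    return x

# Toy usage: f_t(x) = -sum_i log x_i has a gradient singularity at the boundary;
# the iterate drifts from a skewed start toward the uniform minimizer.
print(online_mirror_descent(lambda t, x: -1.0 / x, x0=[0.7, 0.1, 0.1, 0.1]))
```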

    Aspiration Based Decision Analysis and Support Part I: Theoretical and Methodological Backgrounds

    In the interdisciplinary and intercultural systems analysis that constitutes the main theme of research at IIASA, a basic question is how to analyze and support decisions with the help of mathematical models and logical procedures. This question -- particularly in its multi-criteria and multi-cultural dimensions -- has been investigated in the System and Decision Sciences Program (SDS) since the beginning of IIASA. Researchers working both at IIASA and in a large international network of cooperating institutions have contributed to a deeper understanding of this question. Around 1980, the concept of reference point multiobjective optimization was developed in SDS. This concept set an international trend of research pursued in many countries cooperating with IIASA as well as in many research programs at IIASA, such as energy, agricultural, and environmental research. Since that time, SDS has organized numerous international workshops, summer schools, seminar days, and cooperative research agreements in the field of decision analysis and support. Through this international and interdisciplinary cooperation, the concept of reference point multiobjective optimization has matured and been generalized into a framework of aspiration-based decision analysis and support that can be understood as a synthesis of several known, antithetical approaches to the subject, such as the utility maximization approach, the satisficing approach, and goal- and program-oriented planning approaches. The name quasisatisficing approach can also be used, since the concept of aspirations comes from the satisficing approach. Both authors of the Working Paper contributed actively to this research: Andrzej Wierzbicki originated the concept of reference point multiobjective optimization and the quasisatisficing approach, while Andrzej Lewandowski, who has worked from the beginning on the numerous applications and extensions of this concept, made the main contribution to its generalization into the framework of aspiration-based decision analysis and support systems. This paper constitutes a draft of the first part of a book being prepared by these two authors. Part I, devoted to theoretical foundations and methodological background and written mostly by Andrzej Wierzbicki, will be followed by Part II, devoted to computer implementations and applications of decision support systems based on mathematical programming models, written mostly by Andrzej Lewandowski. Part III, devoted to decision support systems for the case of subjective evaluations of discrete decision alternatives, will be written by both authors.
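
    For readers unfamiliar with reference point optimization, the sketch below evaluates an order-consistent achievement scalarizing function of the kind associated with this line of work; the weights, augmentation coefficient, and toy alternatives are assumptions made purely for illustration.

```python
import numpy as np

def achievement(q, aspiration, weights, rho=1e-3):
    """Order-consistent achievement scalarizing function (sketch).

    q, aspiration, weights: arrays over the objectives (all maximized).
    The min term drives the selected solution toward (or past) the aspiration
    levels, while the small augmented sum term enforces proper Pareto
    optimality of the maximizer.
    """
    d = np.asarray(weights, dtype=float) * (np.asarray(q, dtype=float)
                                            - np.asarray(aspiration, dtype=float))
    return d.min() + rho * d.sum()

# Toy usage: pick the alternative that best answers the aspiration (80, 60)
# among three discrete alternatives scored on two maximized criteria.
alternatives = np.array([[90.0, 40.0], [75.0, 65.0], [60.0, 80.0]])
aspiration = np.array([80.0, 60.0])
weights = np.array([1.0, 1.0])
scores = [achievement(q, aspiration, weights) for q in alternatives]
print(alternatives[int(np.argmax(scores))])   # -> [75. 65.] here
```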