25 research outputs found
Time Average Replicator and Best Reply Dynamics
Using an explicit representation in terms of the logit map, we show, in a unilateral framework, that the time average of the replicator dynamics is a perturbed solution of the best reply dynamics.
Keywords: replicator dynamics; best reply dynamics; logit map; perturbed differential inclusion; internally chain transitive set; attractor
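As a quick numerical illustration of the time-averaging phenomenon (a hypothetical rock-paper-scissors example, not taken from the paper): individual replicator trajectories cycle around the interior equilibrium, while their time average settles near the equilibrium point (1/3, 1/3, 1/3).

```python
import numpy as np

# Rock-paper-scissors payoff matrix; the interior equilibrium is (1/3, 1/3, 1/3).
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def replicator_step(x, dt):
    u = A @ x                        # payoff of each pure strategy
    return x + dt * x * (u - x @ u)  # dx_i/dt = x_i * (u_i - <x, u>)

dt, steps = 0.001, 100_000           # Euler integration up to time 100
x = np.array([0.6, 0.3, 0.1])
avg = np.zeros(3)
for _ in range(steps):
    x = replicator_step(x, dt)
    avg += x
avg /= steps
# the trajectory itself keeps cycling, but avg lands near (1/3, 1/3, 1/3)
```

The step size and horizon here are illustrative; the Euler step preserves the simplex constraint exactly because the replicator drift sums to zero.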
Cycles in adversarial regularized learning
Regularized learning is a fundamental technique in online optimization,
machine learning and many other fields of computer science. A natural question
that arises in these settings is how regularized learning algorithms behave
when matched against each other. We study a natural formulation of this problem
by coupling regularized learning dynamics in zero-sum games. We show that the
system's behavior is Poincaré recurrent, implying that almost every
trajectory revisits any (arbitrarily small) neighborhood of its starting point
infinitely often. This cycling behavior is robust to the agents' choice of
regularization mechanism (each agent could be using a different regularizer),
to positive-affine transformations of the agents' utilities, and it also
persists in the case of networked competition, i.e., for zero-sum polymatrix
games.
Comment: 22 pages, 4 figures
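A minimal sketch of this cycling behavior (a hypothetical matching-pennies setup, not the paper's experiments): both players run continuous-time follow-the-regularized-leader with an entropic regularizer, discretized with a small Euler step. The joint strategy orbits its starting point and repeatedly returns to any neighborhood of it.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies (zero-sum)

def softmax(s):
    z = np.exp(s - s.max())
    return z / z.sum()

dt, steps = 0.005, 60_000                  # small-step Euler discretization
s1, s2 = np.array([0.5, 0.0]), np.array([0.0, 0.3])   # dual "score" variables
start = np.concatenate([softmax(s1), softmax(s2)])
min_dist = np.inf
for t in range(steps):
    x, y = softmax(s1), softmax(s2)
    s1 = s1 + dt * (A @ y)                 # player 1 maximizes x' A y
    s2 = s2 - dt * (A.T @ x)               # player 2 minimizes it
    if t > 2_000:                          # skip an initial stretch, then watch returns
        cur = np.concatenate([softmax(s1), softmax(s2)])
        min_dist = min(min_dist, np.linalg.norm(cur - start))
# min_dist is small: the trajectory revisits a neighborhood of its start
```

All parameter values are illustrative; the explicit Euler scheme slowly inflates the orbit, so the closest returns occur in the earlier periods.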
Adaptive Power Allocation and Control in Time-Varying Multi-Carrier MIMO Networks
In this paper, we examine the fundamental trade-off between radiated power
and achieved throughput in wireless multi-carrier, multiple-input and
multiple-output (MIMO) systems that vary with time in an unpredictable fashion
(e.g. due to changes in the wireless medium or the users' QoS requirements).
Contrary to the static/stationary channel regime, there is no optimal power
allocation profile to target (either static or in the mean), so the system's
users must adapt to changes in the environment "on the fly", without being able
to predict the system's evolution ahead of time. In this dynamic context, we
formulate the users' power/throughput trade-off as an online optimization
problem and we provide a matrix exponential learning algorithm that leads to no
regret - i.e. the proposed transmit policy is asymptotically optimal in
hindsight, irrespective of how the system evolves over time. Furthermore, we
also examine the robustness of the proposed algorithm under imperfect channel
state information (CSI) and we show that it retains its regret minimization
properties under very mild conditions on the measurement noise statistics. As a
result, users are able to track the evolution of their individually optimum
transmit profiles remarkably well, even under rapidly changing network
conditions and high uncertainty. Our theoretical analysis is validated by
extensive numerical simulations corresponding to a realistic network deployment
and providing further insights in the practical implementation aspects of the
proposed algorithm.
Comment: 25 pages, 4 figures
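The matrix exponential learning update can be sketched as follows (a toy single-user version with a synthetic real-valued channel; all parameter values are assumptions, not the paper's): a dual score matrix accumulates throughput gradients, and the transmit covariance is recovered by exponentiating and rescaling to the power budget.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P_max, step = 4, 1.0, 0.5       # antennas, power budget, step size (all assumed)

def trace_normalized_expm(Y):
    """expm(Y) / tr(expm(Y)) for symmetric Y, shifted for numerical stability."""
    w, V = np.linalg.eigh(Y)
    e = np.exp(w - w.max())        # shifting Y by a multiple of I leaves the ratio unchanged
    E = (V * e) @ V.T
    return E / np.trace(E)

Y = np.zeros((M, M))               # dual "score" matrix accumulating gradients
for t in range(200):
    H = rng.normal(size=(M, M)) / np.sqrt(M)   # fresh channel draw each round
    Q = P_max * trace_normalized_expm(Y)       # PSD covariance with tr(Q) = P_max
    # gradient of the achievable rate log det(I + H Q H') with respect to Q
    G = H.T @ np.linalg.inv(np.eye(M) + H @ Q @ H.T) @ H
    Y = Y + step * G
```

By construction every iterate Q is positive semidefinite with trace exactly P_max, so the power constraint never needs an explicit projection.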
Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems
In this paper, we examine cognitive radio systems that evolve dynamically
over time due to changing user and environmental conditions. To combine the
advantages of orthogonal frequency division multiplexing (OFDM) and
multiple-input, multiple-output (MIMO) technologies, we consider a MIMO-OFDM
cognitive radio network where wireless users with multiple antennas communicate
over several non-interfering frequency bands. As the network's primary users
(PUs) come and go in the system, the communication environment changes
constantly (and, in many cases, randomly). Accordingly, the network's
unlicensed, secondary users (SUs) must adapt their transmit profiles "on the
fly" in order to maximize their data rate in a rapidly evolving environment
over which they have no control. In this dynamic setting, static solution
concepts (such as Nash equilibrium) are no longer relevant, so we focus on
dynamic transmit policies that lead to no regret: specifically, we consider
policies that perform at least as well as (and typically outperform) even the
best fixed transmit profile in hindsight. Drawing on the method of matrix
exponential learning and online mirror descent techniques, we derive a
no-regret transmit policy for the system's SUs which relies only on local
channel state information (CSI). Using this method, the system's SUs are able
to track their individually evolving optimum transmit profiles remarkably well,
even under rapidly (and randomly) changing conditions. Importantly, the
proposed augmented exponential learning (AXL) policy leads to no regret even if
the SUs' channel measurements are subject to arbitrarily large observation
errors (the imperfect CSI case), thus ensuring the method's robustness in the
presence of uncertainties.
Comment: 25 pages, 3 figures, to appear in the IEEE Journal on Selected Areas in Communications
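The no-regret criterion used above can be made concrete with a tiny full-information sketch (synthetic payoffs, not the paper's model): an exponential-weights policy over a few candidate transmit profiles is compared, in hindsight, against the single best fixed profile.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, eta = 3, 5_000, 0.05           # profiles, rounds, learning rate (assumed)
payoffs = rng.uniform(size=(T, K))   # hypothetical per-round rates, one per profile

scores = np.zeros(K)
gained = 0.0
for t in range(T):
    z = np.exp(eta * (scores - scores.max()))   # shift for numerical stability
    x = z / z.sum()                  # exponential-weights mixture over profiles
    gained += x @ payoffs[t]
    scores += payoffs[t]             # full-information score update

best_fixed = payoffs.sum(axis=0).max()   # best single profile in hindsight
regret = best_fixed - gained
# regret grows sublinearly in T, so the time-averaged regret vanishes
```

"No regret" means exactly that regret / T goes to zero: in hindsight, the adaptive policy does essentially as well as the best fixed profile.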
Inertial game dynamics and applications to constrained optimization
Aiming to provide a new class of game dynamics with good long-term
rationality properties, we derive a second-order inertial system that builds on
the widely studied "heavy ball with friction" optimization method. By
exploiting a well-known link between the replicator dynamics and the
Shahshahani geometry on the space of mixed strategies, the dynamics are stated
in a Riemannian geometric framework where trajectories are accelerated by the
players' unilateral payoff gradients and they slow down near Nash equilibria.
Surprisingly (and in stark contrast to another second-order variant of the
replicator dynamics), the inertial replicator dynamics are not well-posed; on
the other hand, it is possible to obtain a well-posed system by endowing the
mixed strategy space with a different Hessian-Riemannian (HR) metric structure,
and we characterize those HR geometries that do so. In the single-agent version
of the dynamics (corresponding to constrained optimization over simplex-like
objects), we show that regular maximum points of smooth functions attract all
nearby solution orbits with low initial speed. More generally, we establish an
inertial variant of the so-called "folk theorem" of evolutionary game theory
and we show that strict equilibria are attracting in asymmetric
(multi-population) games - provided of course that the dynamics are well-posed.
A similar asymptotic stability result is obtained for evolutionarily stable
strategies in symmetric (single-population) games.
Comment: 30 pages, 4 figures; significantly revised paper structure and added new material on Euclidean embeddings and evolutionarily stable strategies
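The "heavy ball with friction" method that the dynamics build on can be sketched in its plain unconstrained form (one-dimensional objective and step sizes chosen purely for illustration): gradient descent with an inertial velocity term damped by friction.

```python
import numpy as np

def grad(x):
    return 2.0 * (x - 3.0)       # gradient of f(x) = (x - 3)^2, minimized at x = 3

x, v = 0.0, 0.0                  # position and velocity
alpha, gamma = 0.05, 0.9         # step size and friction (momentum) coefficient
for _ in range(500):
    v = gamma * v - alpha * grad(x)   # inertia accelerates along persistent gradients
    x = x + v                         # friction (gamma < 1) slows the ball near optima
```

The trajectory overshoots and oscillates around the minimizer before the friction term dissipates its kinetic energy, mirroring how the inertial game dynamics slow down near Nash equilibria.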
Penalty-regulated dynamics and robust learning procedures in games
Starting from a heuristic learning scheme for N-person games, we derive a new
class of continuous-time learning dynamics consisting of a replicator-like
drift adjusted by a penalty term that renders the boundary of the game's
strategy space repelling. These penalty-regulated dynamics are equivalent to
players keeping an exponentially discounted aggregate of their on-going payoffs
and then using a smooth best response to pick an action based on these
performance scores. Owing to this inherent duality, the proposed dynamics
satisfy a variant of the folk theorem of evolutionary game theory and they
converge to (arbitrarily precise) approximations of Nash equilibria in
potential games. Motivated by applications to traffic engineering, we exploit
this duality further to design a discrete-time, payoff-based learning algorithm
which retains these convergence properties and only requires players to observe
their in-game payoffs: moreover, the algorithm remains robust in the presence
of stochastic perturbations and observation errors, and it does not require any
synchronization between players.
Comment: 33 pages, 3 figures
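The score-based dual form can be sketched on a toy congestion game (a hypothetical mean-field simplification, not the paper's algorithm): drivers share an exponentially discounted aggregate of each road's payoff and choose roads through a logit (smooth best response) map, and the traffic split settles near the even equilibrium.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 20, 5_000               # drivers and rounds (assumed values)
delta, eta = 0.05, 5.0         # discount rate and logit sharpness (assumed values)

scores = np.zeros(2)           # discounted payoff aggregate, one entry per road
p_hist = []
for t in range(T):
    z = np.exp(eta * (scores - scores.max()))
    p = z / z.sum()                            # smooth (logit) best response
    on_road_1 = rng.uniform(size=N) < p[1]     # each driver picks road 1 w.p. p[1]
    loads = np.array([N - on_road_1.sum(), on_road_1.sum()])
    u = -loads / N                             # payoff of each road: minus congestion
    scores = (1 - delta) * scores + delta * u  # exponentially discounted aggregation
    p_hist.append(p[1])
p_tail = float(np.mean(p_hist[-1000:]))        # settles near the even split 0.5
```

The smooth best response keeps the boundary of the strategy space repelling, as in the penalty-regulated dynamics: no road is ever abandoned with probability exactly one.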