25 research outputs found

    Time Average Replicator and Best Reply Dynamics

    Using an explicit representation in terms of the logit map, we show, in a unilateral framework, that the time average of the replicator dynamics is a perturbed solution of the best reply dynamics.
    Keywords: replicator dynamics; best reply dynamics; logit map; perturbed differential inclusion; internally chain transitive set; attractor
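
    A minimal numerical check of the explicit logit-map representation that underlies this result (a sketch only: the constant payoff vector and interior starting point below are illustrative choices, not taken from the paper). For the single-agent replicator dynamics dx_i/dt = x_i (v_i - <v, x>), the state at time t coincides with the logit (softmax) map applied to log x(0) plus the cumulative payoffs:

        import numpy as np

        def logit(y):
            """Logit (softmax) map: mixed strategy induced by a score vector y."""
            z = np.exp(y - y.max())              # shift for numerical stability
            return z / z.sum()

        def replicator_step(x, v, dt):
            """One Euler step of dx_i/dt = x_i (v_i - <v, x>)."""
            return x + dt * x * (v - v @ x)

        v = np.array([1.0, 0.3, -0.5])           # illustrative payoff vector (3 actions)
        dt, T = 1e-3, 5.0
        x = np.full(3, 1/3)                      # interior initial condition
        score = np.log(x)                        # log x(0) + cumulative payoffs
        for _ in range(int(T / dt)):
            x = replicator_step(x, v, dt)
            score += dt * v

        print("Euler-integrated replicator state:", np.round(x, 4))
        print("Logit-map representation:         ", np.round(logit(score), 4))

    The two printed vectors agree up to discretization error; it is this representation that the abstract exploits before passing to time averages.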

    Cycles in adversarial regularized learning

    Regularized learning is a fundamental technique in online optimization, machine learning, and many other fields of computer science. A natural question that arises in these settings is how regularized learning algorithms behave when matched against each other. We study a natural formulation of this problem by coupling regularized learning dynamics in zero-sum games. We show that the system's behavior is Poincaré recurrent, implying that almost every trajectory revisits any (arbitrarily small) neighborhood of its starting point infinitely often. This cycling behavior is robust to the agents' choice of regularization mechanism (each agent could be using a different regularizer) and to positive-affine transformations of the agents' utilities, and it also persists in the case of networked competition, i.e., for zero-sum polymatrix games.
    Comment: 22 pages, 4 figures
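
    The cycling phenomenon can be illustrated with a small numerical sketch (an illustration only, not the paper's construction): two players run continuous-time regularized learning with the entropic regularizer (exponential weights) in Matching Pennies, and the resulting strategies orbit the mixed equilibrium instead of converging to it. The game, step size, and initial scores below are arbitrary choices.

        import numpy as np

        def softmax(y):
            z = np.exp(y - y.max())
            return z / z.sum()

        # Matching Pennies: player 1 maximizes x1' A x2, player 2 maximizes -x1' A x2.
        A = np.array([[1.0, -1.0],
                      [-1.0, 1.0]])

        dt, steps = 0.01, 5000
        y1, y2 = np.array([0.3, 0.0]), np.array([0.0, 0.2])   # initial score vectors

        traj = []
        for _ in range(steps):
            x1, x2 = softmax(y1), softmax(y2)
            traj.append(x1[0])
            # Euler discretization of continuous-time exponential-weights learning:
            # each score accumulates the payoff of the corresponding pure action.
            y1 += dt * (A @ x2)
            y2 += dt * (-A.T @ x1)

        # Player 1's probability of the first action keeps oscillating around the
        # equilibrium value 1/2 rather than settling down.
        print("range of x1[0] over the second half of the run:",
              round(min(traj[steps // 2:]), 3), "to", round(max(traj[steps // 2:]), 3))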

    Adaptive Power Allocation and Control in Time-Varying Multi-Carrier MIMO Networks

    In this paper, we examine the fundamental trade-off between radiated power and achieved throughput in wireless multi-carrier, multiple-input and multiple-output (MIMO) systems that vary with time in an unpredictable fashion (e.g., due to changes in the wireless medium or the users' QoS requirements). Contrary to the static/stationary channel regime, there is no optimal power allocation profile to target (either static or in the mean), so the system's users must adapt to changes in the environment "on the fly", without being able to predict the system's evolution ahead of time. In this dynamic context, we formulate the users' power/throughput trade-off as an online optimization problem and we provide a matrix exponential learning algorithm that leads to no regret - i.e., the proposed transmit policy is asymptotically optimal in hindsight, irrespective of how the system evolves over time. Furthermore, we also examine the robustness of the proposed algorithm under imperfect channel state information (CSI) and we show that it retains its regret minimization properties under very mild conditions on the measurement noise statistics. As a result, users are able to track the evolution of their individually optimum transmit profiles remarkably well, even under rapidly changing network conditions and high uncertainty. Our theoretical analysis is validated by extensive numerical simulations corresponding to a realistic network deployment and providing further insights into the practical implementation aspects of the proposed algorithm.
    Comment: 25 pages, 4 figures
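
    A minimal sketch of the matrix exponential learning template referred to above (not the paper's exact algorithm or feedback model: the i.i.d. channel draws, step size, and trace normalization below are illustrative assumptions). Each user keeps a Hermitian score matrix, updates it with the gradient of the current rate, and maps it back to a feasible transmit covariance through the matrix exponential.

        import numpy as np
        from scipy.linalg import expm

        rng = np.random.default_rng(0)
        M, P = 4, 1.0                          # antennas per side and power budget (illustrative)

        def rate_gradient(Q, H):
            """Gradient of log det(I + H Q H^H) with respect to the covariance Q."""
            return H.conj().T @ np.linalg.inv(np.eye(M) + H @ Q @ H.conj().T) @ H

        Y = np.zeros((M, M), dtype=complex)    # Hermitian score matrix
        eta = 0.5                              # step size (illustrative)
        for t in range(200):
            # Channel redrawn every slot, standing in for the true time-varying process.
            H = (rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))) / np.sqrt(2 * M)
            # Exponential map onto the covariances with trace P (shifted for stability).
            E = expm(Y - np.linalg.eigvalsh(Y).max() * np.eye(M))
            Q = P * E / np.trace(E).real
            V = rate_gradient(Q, H)            # first-order feedback for this slot
            Y = Y + eta * V                    # matrix exponential learning update

        print("eigenvalues of the final transmit covariance:",
              np.round(np.linalg.eigvalsh(Q), 3))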

    Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems

    In this paper, we examine cognitive radio systems that evolve dynamically over time due to changing user and environmental conditions. To combine the advantages of orthogonal frequency division multiplexing (OFDM) and multiple-input, multiple-output (MIMO) technologies, we consider a MIMO-OFDM cognitive radio network where wireless users with multiple antennas communicate over several non-interfering frequency bands. As the network's primary users (PUs) come and go in the system, the communication environment changes constantly (and, in many cases, randomly). Accordingly, the network's unlicensed, secondary users (SUs) must adapt their transmit profiles "on the fly" in order to maximize their data rate in a rapidly evolving environment over which they have no control. In this dynamic setting, static solution concepts (such as Nash equilibrium) are no longer relevant, so we focus on dynamic transmit policies that lead to no regret: specifically, we consider policies that perform at least as well as (and typically outperform) even the best fixed transmit profile in hindsight. Drawing on the method of matrix exponential learning and online mirror descent techniques, we derive a no-regret transmit policy for the system's SUs which relies only on local channel state information (CSI). Using this method, the system's SUs are able to track their individually evolving optimum transmit profiles remarkably well, even under rapidly (and randomly) changing conditions. Importantly, the proposed augmented exponential learning (AXL) policy leads to no regret even if the SUs' channel measurements are subject to arbitrarily large observation errors (the imperfect CSI case), thus ensuring the method's robustness in the presence of uncertainties.
    Comment: 25 pages, 3 figures; to appear in the IEEE Journal on Selected Areas in Communications
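
    For reference, "no regret" is meant here in the standard online-optimization sense. In generic notation (not necessarily the paper's), write R_t(Q) for the rate achieved in stage t by a fixed feasible transmit profile Q in the feasible set Q_set, and Q_t for the profile actually used; then

        \operatorname{Reg}(T)
          \;=\; \max_{\mathbf{Q} \in \mathcal{Q}} \sum_{t=1}^{T} R_t(\mathbf{Q})
          \;-\; \sum_{t=1}^{T} R_t(\mathbf{Q}_t),

    and a policy leads to no regret when Reg(T)/T vanishes as T grows. The abstract's claim is that the AXL policy achieves this even under arbitrarily large CSI observation errors.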

    Inertial game dynamics and applications to constrained optimization

    Aiming to provide a new class of game dynamics with good long-term rationality properties, we derive a second-order inertial system that builds on the widely studied "heavy ball with friction" optimization method. By exploiting a well-known link between the replicator dynamics and the Shahshahani geometry on the space of mixed strategies, the dynamics are stated in a Riemannian geometric framework where trajectories are accelerated by the players' unilateral payoff gradients and they slow down near Nash equilibria. Surprisingly (and in stark contrast to another second-order variant of the replicator dynamics), the inertial replicator dynamics are not well-posed; on the other hand, it is possible to obtain a well-posed system by endowing the mixed strategy space with a different Hessian-Riemannian (HR) metric structure, and we characterize those HR geometries that do so. In the single-agent version of the dynamics (corresponding to constrained optimization over simplex-like objects), we show that regular maximum points of smooth functions attract all nearby solution orbits with low initial speed. More generally, we establish an inertial variant of the so-called "folk theorem" of evolutionary game theory and we show that strict equilibria are attracting in asymmetric (multi-population) games - provided, of course, that the dynamics are well-posed. A similar asymptotic stability result is obtained for evolutionarily stable strategies in symmetric (single-population) games.
    Comment: 30 pages, 4 figures; significantly revised paper structure and added new material on Euclidean embeddings and evolutionarily stable strategies
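
    For context, the "heavy ball with friction" method that serves as the starting point is, in its standard Euclidean form (generic notation, not the paper's),

        \ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla f(x(t)), \qquad \gamma > 0,

    i.e. a gradient flow augmented with an inertia term and damped by a friction coefficient \gamma. As described above, the inertial game dynamics are a payoff-ascent analogue of this system, posed with respect to a Hessian-Riemannian metric on the simplex of mixed strategies rather than the Euclidean metric.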

    Penalty-regulated dynamics and robust learning procedures in games

    Starting from a heuristic learning scheme for N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game's strategy space repelling. These penalty-regulated dynamics are equivalent to players keeping an exponentially discounted aggregate of their ongoing payoffs and then using a smooth best response to pick an action based on these performance scores. Owing to this inherent duality, the proposed dynamics satisfy a variant of the folk theorem of evolutionary game theory and they converge to (arbitrarily precise) approximations of Nash equilibria in potential games. Motivated by applications to traffic engineering, we exploit this duality further to design a discrete-time, payoff-based learning algorithm which retains these convergence properties and only requires players to observe their in-game payoffs; moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and it does not require any synchronization between players.
    Comment: 33 pages, 3 figures
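
    A minimal sketch of the kind of discrete-time, payoff-based scheme described above (an illustration only, not the paper's exact algorithm): each player keeps an exponentially discounted score per action, fills in the payoffs of unplayed actions with an importance-weighted estimate, and selects actions through a logit smooth best response. The common-interest game, discount rate, penalty level, and estimator below are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(1)

        def smooth_best_response(y, eps):
            """Logit choice map, one standard smooth best response (entropic penalty)."""
            z = np.exp(y / eps - np.max(y / eps))
            return z / z.sum()

        # 2x2 common-interest (potential) game: both players receive U[a1, a2].
        U = np.array([[1.0, 0.0],
                      [0.0, 0.7]])

        y = [np.zeros(2), np.zeros(2)]        # discounted payoff scores of the two players
        lam, eps = 0.2, 0.1                   # discount rate and penalty level (illustrative)

        for n in range(1, 5001):
            step = 1.0 / n                    # vanishing step size
            x = [smooth_best_response(y[i], eps) for i in range(2)]
            a = [rng.choice(2, p=x[i]) for i in range(2)]
            payoff = U[a[0], a[1]]            # the only feedback each player observes
            for i in range(2):
                u_hat = np.zeros(2)
                u_hat[a[i]] = payoff / x[i][a[i]]       # importance-weighted estimate
                y[i] += step * (u_hat - lam * y[i])     # discounted score update

        print("long-run mixed strategies:", np.round(x[0], 3), np.round(x[1], 3))

    In this common-interest example the players coordinate on one of the game's pure equilibria, consistent with the convergence to approximate Nash equilibria in potential games stated in the abstract.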