25 research outputs found
Time Average Replicator and Best Reply Dynamics
Using an explicit representation in terms of the logit map, we show, in a unilateral framework, that the time average of the replicator dynamics is a perturbed solution of the best reply dynamics.
Keywords: replicator dynamics; best reply dynamics; logit map; perturbed differential inclusion; internally chain transitive set; attractor
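As a quick numerical illustration of the time-averaging phenomenon (a hypothetical rock-paper-scissors example, not taken from the paper): individual replicator trajectories cycle around the interior equilibrium, while their time average settles near the equilibrium point (1/3, 1/3, 1/3).

```python
import numpy as np

# Rock-paper-scissors payoff matrix; the interior equilibrium is (1/3, 1/3, 1/3).
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def replicator_step(x, dt):
    u = A @ x                        # payoff of each pure strategy
    return x + dt * x * (u - x @ u)  # dx_i/dt = x_i * (u_i - <x, u>)

dt, steps = 0.001, 100_000           # Euler integration up to time 100
x = np.array([0.6, 0.3, 0.1])
avg = np.zeros(3)
for _ in range(steps):
    x = replicator_step(x, dt)
    avg += x
avg /= steps
# the trajectory itself keeps cycling, but avg lands near (1/3, 1/3, 1/3)
```

The step size and horizon here are illustrative; the Euler step preserves the simplex constraint exactly because the replicator drift sums to zero.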
Cycles in adversarial regularized learning
Regularized learning is a fundamental technique in online optimization,
machine learning and many other fields of computer science. A natural question
that arises in these settings is how regularized learning algorithms behave
when matched against each other. We study a natural formulation of this problem
by coupling regularized learning dynamics in zero-sum games. We show that the
system's behavior is Poincaré recurrent, implying that almost every
trajectory revisits any (arbitrarily small) neighborhood of its starting point
infinitely often. This cycling behavior is robust to the agents' choice of
regularization mechanism (each agent could be using a different regularizer),
to positive-affine transformations of the agents' utilities, and it also
persists in the case of networked competition, i.e., for zero-sum polymatrix
games.
Comment: 22 pages, 4 figures
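A minimal sketch of this cycling behavior (a hypothetical matching-pennies setup, not the paper's experiments): both players run continuous-time follow-the-regularized-leader with an entropic regularizer, discretized with a small Euler step. The joint strategy orbits its starting point and repeatedly returns to any neighborhood of it.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies (zero-sum)

def softmax(s):
    z = np.exp(s - s.max())
    return z / z.sum()

dt, steps = 0.005, 60_000                  # small-step Euler discretization
s1, s2 = np.array([0.5, 0.0]), np.array([0.0, 0.3])   # dual "score" variables
start = np.concatenate([softmax(s1), softmax(s2)])
min_dist = np.inf
for t in range(steps):
    x, y = softmax(s1), softmax(s2)
    s1 = s1 + dt * (A @ y)                 # player 1 maximizes x' A y
    s2 = s2 - dt * (A.T @ x)               # player 2 minimizes it
    if t > 2_000:                          # skip an initial stretch, then watch returns
        cur = np.concatenate([softmax(s1), softmax(s2)])
        min_dist = min(min_dist, np.linalg.norm(cur - start))
# min_dist is small: the trajectory revisits a neighborhood of its start
```

All parameter values are illustrative; the explicit Euler scheme slowly inflates the orbit, so the closest returns occur in the earlier periods.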
Adaptive Power Allocation and Control in Time-Varying Multi-Carrier MIMO Networks
In this paper, we examine the fundamental trade-off between radiated power
and achieved throughput in wireless multi-carrier, multiple-input and
multiple-output (MIMO) systems that vary with time in an unpredictable fashion
(e.g. due to changes in the wireless medium or the users' QoS requirements).
Contrary to the static/stationary channel regime, there is no optimal power
allocation profile to target (either static or in the mean), so the system's
users must adapt to changes in the environment "on the fly", without being able
to predict the system's evolution ahead of time. In this dynamic context, we
formulate the users' power/throughput trade-off as an online optimization
problem and we provide a matrix exponential learning algorithm that leads to no
regret - i.e. the proposed transmit policy is asymptotically optimal in
hindsight, irrespective of how the system evolves over time. Furthermore, we
also examine the robustness of the proposed algorithm under imperfect channel
state information (CSI) and we show that it retains its regret minimization
properties under very mild conditions on the measurement noise statistics. As a
result, users are able to track the evolution of their individually optimum
transmit profiles remarkably well, even under rapidly changing network
conditions and high uncertainty. Our theoretical analysis is validated by
extensive numerical simulations corresponding to a realistic network deployment
and providing further insights in the practical implementation aspects of the
proposed algorithm.
Comment: 25 pages, 4 figures
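The matrix exponential learning update can be sketched as follows (a toy single-user version with a synthetic real-valued channel; all parameter values are assumptions, not the paper's): a dual score matrix accumulates throughput gradients, and the transmit covariance is recovered by exponentiating and rescaling to the power budget.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P_max, step = 4, 1.0, 0.5       # antennas, power budget, step size (all assumed)

def trace_normalized_expm(Y):
    """expm(Y) / tr(expm(Y)) for symmetric Y, shifted for numerical stability."""
    w, V = np.linalg.eigh(Y)
    e = np.exp(w - w.max())        # shifting Y by a multiple of I leaves the ratio unchanged
    E = (V * e) @ V.T
    return E / np.trace(E)

Y = np.zeros((M, M))               # dual "score" matrix accumulating gradients
for t in range(200):
    H = rng.normal(size=(M, M)) / np.sqrt(M)   # fresh channel draw each round
    Q = P_max * trace_normalized_expm(Y)       # PSD covariance with tr(Q) = P_max
    # gradient of the achievable rate log det(I + H Q H') with respect to Q
    G = H.T @ np.linalg.inv(np.eye(M) + H @ Q @ H.T) @ H
    Y = Y + step * G
```

By construction every iterate Q is positive semidefinite with trace exactly P_max, so the power constraint never needs an explicit projection.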
Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems
In this paper, we examine cognitive radio systems that evolve dynamically
over time due to changing user and environmental conditions. To combine the
advantages of orthogonal frequency division multiplexing (OFDM) and
multiple-input, multiple-output (MIMO) technologies, we consider a MIMO-OFDM
cognitive radio network where wireless users with multiple antennas communicate
over several non-interfering frequency bands. As the network's primary users
(PUs) come and go in the system, the communication environment changes
constantly (and, in many cases, randomly). Accordingly, the network's
unlicensed, secondary users (SUs) must adapt their transmit profiles "on the
fly" in order to maximize their data rate in a rapidly evolving environment
over which they have no control. In this dynamic setting, static solution
concepts (such as Nash equilibrium) are no longer relevant, so we focus on
dynamic transmit policies that lead to no regret: specifically, we consider
policies that perform at least as well as (and typically outperform) even the
best fixed transmit profile in hindsight. Drawing on the method of matrix
exponential learning and online mirror descent techniques, we derive a
no-regret transmit policy for the system's SUs which relies only on local
channel state information (CSI). Using this method, the system's SUs are able
to track their individually evolving optimum transmit profiles remarkably well,
even under rapidly (and randomly) changing conditions. Importantly, the
proposed augmented exponential learning (AXL) policy leads to no regret even if
the SUs' channel measurements are subject to arbitrarily large observation
errors (the imperfect CSI case), thus ensuring the method's robustness in the
presence of uncertainties.
Comment: 25 pages, 3 figures, to appear in the IEEE Journal on Selected Areas in Communications
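The no-regret criterion used above can be made concrete with a tiny full-information sketch (synthetic payoffs, not the paper's model): an exponential-weights policy over a few candidate transmit profiles is compared, in hindsight, against the single best fixed profile.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, eta = 3, 5_000, 0.05           # profiles, rounds, learning rate (assumed)
payoffs = rng.uniform(size=(T, K))   # hypothetical per-round rates, one per profile

scores = np.zeros(K)
gained = 0.0
for t in range(T):
    z = np.exp(eta * (scores - scores.max()))   # shift for numerical stability
    x = z / z.sum()                  # exponential-weights mixture over profiles
    gained += x @ payoffs[t]
    scores += payoffs[t]             # full-information score update

best_fixed = payoffs.sum(axis=0).max()   # best single profile in hindsight
regret = best_fixed - gained
# regret grows sublinearly in T, so the time-averaged regret vanishes
```

"No regret" means exactly that regret / T goes to zero: in hindsight, the adaptive policy does essentially as well as the best fixed profile.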
Inertial game dynamics and applications to constrained optimization
Aiming to provide a new class of game dynamics with good long-term
rationality properties, we derive a second-order inertial system that builds on
the widely studied "heavy ball with friction" optimization method. By
exploiting a well-known link between the replicator dynamics and the
Shahshahani geometry on the space of mixed strategies, the dynamics are stated
in a Riemannian geometric framework where trajectories are accelerated by the
players' unilateral payoff gradients and they slow down near Nash equilibria.
Surprisingly (and in stark contrast to another second-order variant of the
replicator dynamics), the inertial replicator dynamics are not well-posed; on
the other hand, it is possible to obtain a well-posed system by endowing the
mixed strategy space with a different Hessian-Riemannian (HR) metric structure,
and we characterize those HR geometries that do so. In the single-agent version
of the dynamics (corresponding to constrained optimization over simplex-like
objects), we show that regular maximum points of smooth functions attract all
nearby solution orbits with low initial speed. More generally, we establish an
inertial variant of the so-called "folk theorem" of evolutionary game theory
and we show that strict equilibria are attracting in asymmetric
(multi-population) games - provided of course that the dynamics are well-posed.
A similar asymptotic stability result is obtained for evolutionarily stable
strategies in symmetric (single-population) games.
Comment: 30 pages, 4 figures; significantly revised paper structure and added new material on Euclidean embeddings and evolutionarily stable strategies
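The "heavy ball with friction" method that the dynamics build on can be sketched in its plain unconstrained form (one-dimensional objective and step sizes chosen purely for illustration): gradient descent with an inertial velocity term damped by friction.

```python
import numpy as np

def grad(x):
    return 2.0 * (x - 3.0)       # gradient of f(x) = (x - 3)^2, minimized at x = 3

x, v = 0.0, 0.0                  # position and velocity
alpha, gamma = 0.05, 0.9         # step size and friction (momentum) coefficient
for _ in range(500):
    v = gamma * v - alpha * grad(x)   # inertia accelerates along persistent gradients
    x = x + v                         # friction (gamma < 1) slows the ball near optima
```

The trajectory overshoots and oscillates around the minimizer before the friction term dissipates its kinetic energy, mirroring how the inertial game dynamics slow down near Nash equilibria.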
Penalty-regulated dynamics and robust learning procedures in games
Starting from a heuristic learning scheme for N-person games, we derive a new
class of continuous-time learning dynamics consisting of a replicator-like
drift adjusted by a penalty term that renders the boundary of the game's
strategy space repelling. These penalty-regulated dynamics are equivalent to
players keeping an exponentially discounted aggregate of their on-going payoffs
and then using a smooth best response to pick an action based on these
performance scores. Owing to this inherent duality, the proposed dynamics
satisfy a variant of the folk theorem of evolutionary game theory and they
converge to (arbitrarily precise) approximations of Nash equilibria in
potential games. Motivated by applications to traffic engineering, we exploit
this duality further to design a discrete-time, payoff-based learning algorithm
which retains these convergence properties and only requires players to observe
their in-game payoffs: moreover, the algorithm remains robust in the presence
of stochastic perturbations and observation errors, and it does not require any
synchronization between players.
Comment: 33 pages, 3 figures
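The score-based dual form can be sketched on a toy congestion game (a hypothetical mean-field simplification, not the paper's algorithm): drivers share an exponentially discounted aggregate of each road's payoff and choose roads through a logit (smooth best response) map, and the traffic split settles near the even equilibrium.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 20, 5_000               # drivers and rounds (assumed values)
delta, eta = 0.05, 5.0         # discount rate and logit sharpness (assumed values)

scores = np.zeros(2)           # discounted payoff aggregate, one entry per road
p_hist = []
for t in range(T):
    z = np.exp(eta * (scores - scores.max()))
    p = z / z.sum()                            # smooth (logit) best response
    on_road_1 = rng.uniform(size=N) < p[1]     # each driver picks road 1 w.p. p[1]
    loads = np.array([N - on_road_1.sum(), on_road_1.sum()])
    u = -loads / N                             # payoff of each road: minus congestion
    scores = (1 - delta) * scores + delta * u  # exponentially discounted aggregation
    p_hist.append(p[1])
p_tail = float(np.mean(p_hist[-1000:]))        # settles near the even split 0.5
```

The smooth best response keeps the boundary of the strategy space repelling, as in the penalty-regulated dynamics: no road is ever abandoned with probability exactly one.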