29,451 research outputs found
Joint strategy fictitious play with inertia for potential games
We consider multi-player repeated games involving a large number of players with large strategy spaces and enmeshed utility structures. In these "large-scale" games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This disqualifies some of the well known decision making models such as "Fictitious Play" (FP), in which each player must monitor the individual actions of every other player and must optimize over a high dimensional probability space. We will show that Joint Strategy Fictitious Play (JSFP), a close variant of FP, alleviates both the informational and computational burden of FP. Furthermore, we introduce JSFP with inertia, i.e., a probabilistic reluctance to change strategies, and establish the convergence to a pure Nash equilibrium in all generalized ordinal potential games in both cases of averaged or exponentially discounted historical data. We illustrate JSFP with inertia on the specific class of congestion games, a subset of generalized ordinal potential games. In particular, we illustrate the main results on a distributed traffic routing problem and derive tolling procedures that can lead to optimized total traffic congestion.
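The abstract above describes JSFP with inertia only in words. As a rough illustration, here is a minimal Python sketch of the averaged variant on a toy routing (congestion) game. It is an assumption-laden reading of the abstract, not the authors' implementation: the function names, the linear route costs, the tie-breaking rule, and the single `inertia` probability are all hypothetical choices made for this example.

```python
import random

def congestion_utility(action, others, cost):
    """Utility of picking route `action` given the other players' routes."""
    load = 1 + sum(1 for r in others if r == action)
    return -cost[action](load)

def is_pure_nash(profile, cost):
    """True if no player can strictly gain by unilaterally changing route."""
    for i, a in enumerate(profile):
        others = profile[:i] + profile[i + 1:]
        current = congestion_utility(a, others, cost)
        if any(congestion_utility(b, others, cost) > current
               for b in range(len(cost))):
            return False
    return True

def jsfp_with_inertia(n_players, cost, steps, inertia=0.5, seed=0):
    """Sketch of averaged JSFP with inertia (assumed form, see lead-in)."""
    rng = random.Random(seed)
    n_routes = len(cost)
    profile = [rng.randrange(n_routes) for _ in range(n_players)]
    # avg[i][a]: running average of the utility player i would have earned
    # by always playing a against the others' actual joint actions.
    avg = [[0.0] * n_routes for _ in range(n_players)]
    for t in range(1, steps + 1):
        for i in range(n_players):
            others = profile[:i] + profile[i + 1:]
            for a in range(n_routes):
                u = congestion_utility(a, others, cost)
                avg[i][a] += (u - avg[i][a]) / t
        new_profile = []
        for i in range(n_players):
            best = max(avg[i])
            # Repeat the previous action if it is already a maximizer,
            # or, with probability `inertia`, even when it is not.
            if avg[i][profile[i]] == best or rng.random() < inertia:
                new_profile.append(profile[i])
            else:
                maximizers = [a for a in range(n_routes) if avg[i][a] == best]
                new_profile.append(rng.choice(maximizers))
        profile = new_profile
    return profile

# Hypothetical example: four drivers, two identical routes, cost = load.
routes = [lambda load: load, lambda load: load]
final = jsfp_with_inertia(4, routes, steps=200)
print(final, is_pure_nash(final, routes))
```

Each player keeps one number per own action rather than a belief over every other player's strategy, which is the reduction in informational and computational burden the abstract refers to. An exponentially discounted variant would replace the `1 / t` step with a fixed step size.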
Learning Equilibria with Partial Information in Decentralized Wireless Networks
In this article, a survey of several important equilibrium concepts for
decentralized networks is presented. The term decentralized is used here to
refer to scenarios where decisions (e.g., choosing a power allocation policy)
are taken autonomously by devices interacting with each other (e.g., through
mutual interference). The iterative long-term interaction is characterized by
stable points of the wireless network called equilibria. The interest in these
equilibria stems from the relevance of network stability and the fact that they
can be achieved by letting radio devices repeatedly interact over time. To
achieve these equilibria, several learning techniques, namely, the best
response dynamics, fictitious play, smoothed fictitious play, reinforcement
learning algorithms, and regret matching, are discussed in terms of information
requirements and convergence properties. Most of the notions introduced here,
for both equilibria and learning schemes, are illustrated by a simple case
study, namely, an interference channel with two transmitter-receiver pairs.
Comment: 16 pages, 5 figures, 1 table. To appear in IEEE Communication
Magazine, special Issue on Game Theor
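Of the learning schemes listed in the survey abstract above, regret matching is compact enough to sketch. The Python below shows the unconditional variant (play each action with probability proportional to its positive cumulative regret) in self-play on a two-player game. It is an illustrative sketch only: the function names and the 2x2 payoff table are hypothetical and are not taken from the paper's interference-channel case study.

```python
import random

def strategy_from_regrets(cum_regret):
    """Mixed strategy proportional to positive regrets (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]

def update_regrets(cum_regret, payoffs, played):
    """Add this round's regret; payoffs[a] is what action a would have earned."""
    return [r + payoffs[a] - payoffs[played]
            for a, r in enumerate(cum_regret)]

def self_play(payoff, rounds, seed=0):
    """payoff[p][a0][a1] is player p's payoff; returns joint-action frequencies."""
    rng = random.Random(seed)
    sizes = [len(payoff[0]), len(payoff[0][0])]
    regrets = [[0.0] * sizes[0], [0.0] * sizes[1]]
    counts = {}
    for _ in range(rounds):
        a0, a1 = (
            rng.choices(range(sizes[p]),
                        weights=strategy_from_regrets(regrets[p]))[0]
            for p in (0, 1)
        )
        # Each player only needs the payoffs of its own alternative actions
        # against the opponent's realized action.
        hyp0 = [payoff[0][a][a1] for a in range(sizes[0])]
        hyp1 = [payoff[1][a0][a] for a in range(sizes[1])]
        regrets[0] = update_regrets(regrets[0], hyp0, a0)
        regrets[1] = update_regrets(regrets[1], hyp1, a1)
        counts[(a0, a1)] = counts.get((a0, a1), 0) + 1
    return {joint: n / rounds for joint, n in counts.items()}

# Hypothetical 2x2 game (action 0 = low power, action 1 = high power).
game = [
    [[3.0, 1.0], [4.0, 2.0]],  # player 0's payoffs
    [[3.0, 4.0], [1.0, 2.0]],  # player 1's payoffs
]
print(self_play(game, rounds=1000))
```

The information requirement is visible in `update_regrets`: a player must know, after each round, what each of its own actions would have earned, but not the opponent's payoff function.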
Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems
In this paper, we examine cognitive radio systems that evolve dynamically
over time due to changing user and environmental conditions. To combine the
advantages of orthogonal frequency division multiplexing (OFDM) and
multiple-input, multiple-output (MIMO) technologies, we consider a MIMO-OFDM
cognitive radio network where wireless users with multiple antennas communicate
over several non-interfering frequency bands. As the network's primary users
(PUs) come and go in the system, the communication environment changes
constantly (and, in many cases, randomly). Accordingly, the network's
unlicensed, secondary users (SUs) must adapt their transmit profiles "on the
fly" in order to maximize their data rate in a rapidly evolving environment
over which they have no control. In this dynamic setting, static solution
concepts (such as Nash equilibrium) are no longer relevant, so we focus on
dynamic transmit policies that lead to no regret: specifically, we consider
policies that perform at least as well as (and typically outperform) even the
best fixed transmit profile in hindsight. Drawing on the method of matrix
exponential learning and online mirror descent techniques, we derive a
no-regret transmit policy for the system's SUs which relies only on local
channel state information (CSI). Using this method, the system's SUs are able
to track their individually evolving optimum transmit profiles remarkably well,
even under rapidly (and randomly) changing conditions. Importantly, the
proposed augmented exponential learning (AXL) policy leads to no regret even if
the SUs' channel measurements are subject to arbitrarily large observation
errors (the imperfect CSI case), thus ensuring the method's robustness in the
presence of uncertainties.
Comment: 25 pages, 3 figures, to appear in the IEEE Journal on Selected Areas
in Communication
xDelia final report: emotion-centred financial decision making and learning
xDelia is a 3-year pan-European project building on the knowledge, skills, and competences of seven partner organisations from a variety of research disciplines and from business. The principal objective of xDelia is to develop technology-enhanced learning approaches that help improve the financial decision making of investors who trade frequently using an electronic trading platform. We focus on emotions, and how they affect maladaptive decision biases and trading performance. Our earlier field work with traders has shown that the development of emotion regulation skills is a key facet of trader expertise. For that reason we consider expert traders our benchmark for adaptive behaviour rather than normative rationality. Our goal is to provide investors with the tools and techniques to develop greater self-awareness of internal states, increase their ability to reflect critically on emotion-informed choices, develop emotion management skills, and support the transfer of these skills to the real-world practice setting of financial trading.
This report provides a comprehensive overview of what xDelia is about and what we have achieved over the life of the project. In the sections that follow, we explain the decision problems investors are faced with in a fast paced environment and the limitations of traditional approaches to reduce cognitive errors; introduce an alternative, technology-enhanced learning approach of diagnosis and feedback, skill development, and transfer; describe the learning intervention comprising twelve autonomous learning elements that we have developed; and present evidence from thirty-five studies we have conducted on learning effects and stakeholder acceptance.
Approachability in unknown games: Online learning meets multi-objective optimization
In the standard setting of approachability there are two players and a target
set. The players repeatedly play a known vector-valued game where the first
player wants to have the average vector-valued payoff converge to the target
set, while the other player tries to exclude it from this set. We revisit this
setting in the spirit of online learning and do not assume that the first
player knows the game structure: she receives an arbitrary vector-valued reward
at every round. She wishes to approach the smallest ("best") possible
set given the observed average payoffs in hindsight. This extension of the
standard setting has implications even when the original target set is not
approachable and when it is not obvious which expansion of it should be
approached instead. We show that it is impossible, in general, to approach the
best target set in hindsight and propose achievable though ambitious
alternative goals. We further propose a concrete strategy to approach these
goals. Our method does not require projection onto a target set and amounts to
switching between scalar regret minimization algorithms that are performed in
episodes. Applications to global cost minimization and to approachability under
sample path constraints are considered.
Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting
We consider the distributed channel selection problem in the context of
device-to-device (D2D) communication as an underlay to a cellular network.
Underlaid D2D users communicate directly by utilizing the cellular spectrum but
their decisions are not governed by any centralized controller. Selfish D2D
users that compete for access to the resources construct a distributed system,
where the transmission performance depends on channel availability and quality.
This information, however, is difficult to acquire. Moreover, the adverse
effects of D2D users on cellular transmissions should be minimized. In order to
overcome these limitations, we propose a network-assisted distributed channel
selection approach in which D2D users are only allowed to use vacant cellular
channels. This scenario is modeled as a multi-player multi-armed bandit game
with side information, for which a distributed algorithmic solution is
proposed. The solution is a combination of no-regret learning and calibrated
forecasting, and can be applied to a broad class of multi-player stochastic
learning problems, in addition to the formulated channel selection problem.
Analytically, it is established that this approach not only yields vanishing
regret (in comparison to the global optimal solution), but also guarantees that
the empirical joint frequencies of the game converge to the set of correlated
equilibria.
Comment: 31 pages (one column), 9 figures