Joint strategy fictitious play with inertia for potential games

Author: Arslan G.
Marden J. R.
Shamma J. S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2009

We consider multi-player repeated games involving a large number of players with large strategy spaces and enmeshed utility structures. In these “large-scale” games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This disqualifies some well-known decision-making models such as “Fictitious Play” (FP), in which each player must monitor the individual actions of every other player and must optimize over a high-dimensional probability space. We will show that Joint Strategy Fictitious Play (JSFP), a close variant of FP, alleviates both the informational and computational burden of FP. Furthermore, we introduce JSFP with inertia, i.e., a probabilistic reluctance to change strategies, and establish convergence to a pure Nash equilibrium in all generalized ordinal potential games, using either averaged or exponentially discounted historical data. We illustrate JSFP with inertia on the specific class of congestion games, a subset of generalized ordinal potential games. In particular, we illustrate the main results on a distributed traffic routing problem and derive tolling procedures that can lead to optimized total traffic congestion.

Caltech Authors
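To make the JSFP-with-inertia rule more concrete, here is a minimal Python sketch on a toy routing (congestion) game. It is not the authors' code. Assumptions: linear route costs with hypothetical `weights`, averaged (not discounted) historical data, simultaneous updates, an `inertia` probability of 0.5, and a small tolerance for ties. Each player tracks the average utility it would have received by always playing each route against the others' past joint actions. It keeps its current route if that route is a best reply; otherwise it stays put with probability `inertia` and switches to a best reply with the remaining probability. All function names are hypothetical.

```python
import random
from typing import List

def route_costs(actions: List[int], weights: List[float]) -> List[float]:
    """Linear congestion cost of each route: weights[r] * (number of players on r)."""
    load = [0] * len(weights)
    for a in actions:
        load[a] += 1
    return [w * n for w, n in zip(weights, load)]

def utility(i: int, route: int, actions: List[int], weights: List[float]) -> float:
    """Utility (negative cost) of player i on `route` while the others keep `actions`."""
    alt = list(actions)
    alt[i] = route
    return -route_costs(alt, weights)[route]

def is_pure_nash(actions: List[int], weights: List[float]) -> bool:
    """True if no player can strictly gain by unilaterally switching routes."""
    for i, a in enumerate(actions):
        current = utility(i, a, actions, weights)
        if any(utility(i, r, actions, weights) > current for r in range(len(weights))):
            return False
    return True

def jsfp_with_inertia(n_players: int, weights: List[float], inertia: float = 0.5,
                      steps: int = 500, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n_routes = len(weights)
    actions = [rng.randrange(n_routes) for _ in range(n_players)]
    # avg[i][r]: average utility player i would have received by always playing r
    # against the others' past joint actions (the averaged JSFP statistic).
    avg = [[0.0] * n_routes for _ in range(n_players)]
    for t in range(1, steps + 1):
        for i in range(n_players):
            for r in range(n_routes):
                avg[i][r] += (utility(i, r, actions, weights) - avg[i][r]) / t
        new_actions = []
        for i in range(n_players):
            top = max(avg[i])
            best = [r for r in range(n_routes) if top - avg[i][r] < 1e-9]
            if actions[i] in best or rng.random() < inertia:
                new_actions.append(actions[i])        # best reply already, or inertia
            else:
                new_actions.append(rng.choice(best))  # switch to a best reply
        actions = new_actions
    return actions
```

For example, `jsfp_with_inertia(4, [1.0, 2.0])` runs four players over two routes, and `is_pure_nash` can be used to check whether the returned profile is a pure Nash equilibrium. In this toy instance the only equilibrium loads are three players on route 0 and one on route 1.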

Learning Equilibria with Partial Information in Decentralized Wireless Networks

Author: Debbah Mérouane
Lasaulce Samson
Perlaza Samir M.
Rose Luca
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/06/2011

In this article, a survey of several important equilibrium concepts for decentralized networks is presented. The term decentralized is used here to refer to scenarios where decisions (e.g., choosing a power allocation policy) are taken autonomously by devices interacting with each other (e.g., through mutual interference). The iterative long-term interaction is characterized by stable points of the wireless network called equilibria. The interest in these equilibria stems from the relevance of network stability and the fact that they can be achieved by letting radio devices repeatedly interact over time. To achieve these equilibria, several learning techniques, namely, the best response dynamics, fictitious play, smoothed fictitious play, reinforcement learning algorithms, and regret matching, are discussed in terms of information requirements and convergence properties. Most of the notions introduced here, for both equilibria and learning schemes, are illustrated by a simple case study, namely, an interference channel with two transmitter-receiver pairs. Comment: 16 pages, 5 figures, 1 table. To appear in IEEE Communication Magazine, special Issue on Game Theor

arXiv.org e-Print Archive
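Regret matching is one of the learning schemes the survey above discusses. The sketch below shows the simpler unconditional variant, in which each player mixes over actions in proportion to its positive cumulative regrets. It plays a hypothetical two-transmitter, two-channel toy game in which a transmitter earns 1 only when it avoids a collision. This is an assumption for illustration, not the article's interference-channel case study, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

def regret_matching_strategy(cum_regret: List[float]) -> List[float]:
    """Mixed strategy proportional to positive cumulative regrets (uniform if none)."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    if total <= 0.0:
        return [1.0 / len(pos)] * len(pos)
    return [p / total for p in pos]

def payoff(a_self: int, a_other: int) -> float:
    # Hypothetical toy payoff: rate 1 on a collision-free channel, 0 otherwise.
    return 1.0 if a_self != a_other else 0.0

def run_regret_matching(steps: int = 1000, seed: int = 1) -> Tuple[float, List[List[float]]]:
    rng = random.Random(seed)
    n_actions = 2
    regrets = [[0.0] * n_actions for _ in range(2)]
    collisions = 0
    for _ in range(steps):
        strategies = [regret_matching_strategy(r) for r in regrets]
        acts = [rng.choices(range(n_actions), weights=s)[0] for s in strategies]
        for i in range(2):
            other = acts[1 - i]
            realized = payoff(acts[i], other)
            for a in range(n_actions):
                # Regret for not having played a, given the other's realized action.
                regrets[i][a] += payoff(a, other) - realized
        collisions += acts[0] == acts[1]
    return collisions / steps, regrets
```

Each player only needs its own payoff for every alternative action against the realized play. This matches the kind of information requirement the survey compares across schemes.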

Transmit without regrets: Online optimization in MIMO-OFDM cognitive radio systems

Author: Belmega E. Veronica
Mertikopoulos Panayotis
Publication date: 01/01/2014

In this paper, we examine cognitive radio systems that evolve dynamically over time due to changing user and environmental conditions. To combine the advantages of orthogonal frequency division multiplexing (OFDM) and multiple-input, multiple-output (MIMO) technologies, we consider a MIMO-OFDM cognitive radio network where wireless users with multiple antennas communicate over several non-interfering frequency bands. As the network's primary users (PUs) come and go in the system, the communication environment changes constantly (and, in many cases, randomly). Accordingly, the network's unlicensed, secondary users (SUs) must adapt their transmit profiles "on the fly" in order to maximize their data rate in a rapidly evolving environment over which they have no control. In this dynamic setting, static solution concepts (such as Nash equilibrium) are no longer relevant, so we focus on dynamic transmit policies that lead to no regret: specifically, we consider policies that perform at least as well as (and typically outperform) even the best fixed transmit profile in hindsight. Drawing on the method of matrix exponential learning and online mirror descent techniques, we derive a no-regret transmit policy for the system's SUs which relies only on local channel state information (CSI). Using this method, the system's SUs are able to track their individually evolving optimum transmit profiles remarkably well, even under rapidly (and randomly) changing conditions. Importantly, the proposed augmented exponential learning (AXL) policy leads to no regret even if the SUs' channel measurements are subject to arbitrarily large observation errors (the imperfect CSI case), thus ensuring the method's robustness in the presence of uncertainties. Comment: 25 pages, 3 figures, to appear in the IEEE Journal on Selected Areas in Communication

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
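The no-regret idea in the abstract above can be illustrated with a deliberately simplified, scalar analogue. The sketch below uses exponential weights (Hedge) over a finite set of hypothetical transmit profiles with full-information rewards in [0, 1]. It is not the paper's matrix exponential learning or its AXL policy, which operate on transmit covariance matrices. It measures regret against the best fixed profile in hindsight, the benchmark the abstract describes. The function name and the reward sequence are hypothetical.

```python
import math
from typing import List

def hedge(reward_rounds: List[List[float]], eta: float) -> float:
    """Exponential weights over K fixed choices; returns regret vs. best fixed choice.

    reward_rounds[t][k] is the reward (assumed in [0, 1]) of choice k at round t.
    """
    k = len(reward_rounds[0])
    scores = [0.0] * k          # cumulative reward of each fixed choice
    expected_total = 0.0        # expected reward collected by the learner
    for rewards in reward_rounds:
        m = max(scores)         # subtract the max before exp() for numerical stability
        w = [math.exp(eta * (s - m)) for s in scores]
        z = sum(w)
        probs = [x / z for x in w]
        expected_total += sum(p * r for p, r in zip(probs, rewards))
        scores = [s + r for s, r in zip(scores, rewards)]
    return max(scores) - expected_total
```

With rewards in [0, 1], the standard Hedge guarantee bounds this regret by ln(K)/eta + eta·T/8 for any reward sequence. Choosing eta = sqrt(8 ln(K)/T) keeps it of order sqrt(T), so the average regret per round vanishes.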

xDelia final report: emotion-centred financial decision making and learning

Author: Adam Marc
Astor Philipp
Cederholm Henrik
Clough Gill
Conole Gráinne
Davies Gareth
Eriksson Jeanette
Fenton-O'Creevy Mark
Gaved Mark
Heuer Stephan
Jerčić Petar
Lindley Craig
Peffer Gilbert
Scanlon Eileen
Schaaff Kristina
Smidts Ale
Todd Lins Jeffrey
van Overveld Mark
Publication venue: Open University, CIMNE
Publication date: 01/01/2012

xDelia is a 3-year pan-European project building on the knowledge, skills, and competences of seven partner organisations from a variety of research disciplines and from business. The principal objective of xDelia is to develop technology-enhanced learning approaches that help improve the financial decision making of investors who trade frequently using an electronic trading platform. We focus on emotions, and how they affect maladaptive decision biases and trading performance. Our earlier field work with traders has shown that the development of emotion regulation skills is a key facet of trader expertise. For that reason we consider expert traders our benchmark for adaptive behaviour rather than normative rationality. Our goal is to provide investors with the tools and techniques to develop greater self-awareness of internal states, increase their ability to reflect critically on emotion-informed choices, develop emotion management skills, and support the transfer of these skills to the real-world practice setting of financial trading. This report provides a comprehensive overview of what xDelia is about and what we have achieved over the life of the project. In the sections that follow, we explain the decision problems investors are faced with in a fast-paced environment and the limitations of traditional approaches to reducing cognitive errors; introduce an alternative, technology-enhanced learning approach of diagnosis and feedback, skill development, and transfer; describe the learning intervention comprising twelve autonomous learning elements that we have developed; and present evidence from thirty-five studies we have conducted on learning effects and stakeholder acceptance.

Open Research Online (The Open University)

If multi-agent learning is the answer, what is the question?

Author: Rob Powers
Trond Grenager
Yoav Shoham

Research Papers in Economics

Approachability in unknown games: Online learning meets multi-objective optimization

Author: Mannor Shie
Perchet Vianney
Stoltz Gilles
Publication date: 16/06/2016

In the standard setting of approachability there are two players and a target set. The players repeatedly play a known vector-valued game, where the first player wants the average vector-valued payoff to converge to the target set while the other player tries to exclude it from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the game structure: she receives an arbitrary vector-valued reward at every round. She wishes to approach the smallest ("best") possible set given the observed average payoffs in hindsight. This extension of the standard setting has implications even when the original target set is not approachable and it is not obvious which expansion of it should be approached instead. We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable, though ambitious, alternative goals. We further propose a concrete strategy to approach these goals. Our method does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes. Applications to global cost minimization and to approachability under sample path constraints are considered.

arXiv.org e-Print Archive

HAL-Polytechnique

Channel Selection for Network-assisted D2D Communication via No-Regret Bandit Learning with Calibrated Forecasting

Author: Maghsudi Setareh
Stanczak Slawomir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/04/2014

We consider the distributed channel selection problem in the context of device-to-device (D2D) communication as an underlay to a cellular network. Underlaid D2D users communicate directly by utilizing the cellular spectrum, but their decisions are not governed by any centralized controller. Selfish D2D users that compete for access to the resources form a distributed system, where the transmission performance depends on channel availability and quality. This information, however, is difficult to acquire. Moreover, the adverse effects of D2D users on cellular transmissions should be minimized. In order to overcome these limitations, we propose a network-assisted distributed channel selection approach in which D2D users are only allowed to use vacant cellular channels. This scenario is modeled as a multi-player multi-armed bandit game with side information, for which a distributed algorithmic solution is proposed. The solution is a combination of no-regret learning and calibrated forecasting, and can be applied to a broad class of multi-player stochastic learning problems, in addition to the formulated channel selection problem. Analytically, it is established that this approach not only yields vanishing regret (in comparison to the globally optimal solution), but also guarantees that the empirical joint frequencies of the game converge to the set of correlated equilibria. Comment: 31 pages (one column), 9 figures

arXiv.org e-Print Archive

Fraunhofer-ePrints
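As a rough illustration of no-regret learning under bandit feedback, where a D2D user observes only the outcome on the channel it actually used, the sketch below implements the standard EXP3 algorithm for a single user. This is a swapped-in textbook technique and not the paper's combination of no-regret learning with calibrated forecasting. The per-channel vacancy probabilities in `VACANCY`, the exploration rate `gamma`, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

VACANCY = [0.2, 0.5, 0.8]  # hypothetical probability that each channel is vacant

def vacant_channel(arm: int, t: int, rng: random.Random) -> float:
    """Hypothetical reward: 1.0 if the chosen channel happens to be vacant this slot."""
    return 1.0 if rng.random() < VACANCY[arm] else 0.0

def exp3(reward_fn: Callable[[int, int, random.Random], float], n_arms: int,
         steps: int, gamma: float = 0.1, seed: int = 0) -> Tuple[List[int], List[float]]:
    """EXP3: exponential weights with uniform exploration and importance-weighted rewards."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    picks = [0] * n_arms
    probs = [1.0 / n_arms] * n_arms
    for t in range(steps):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t, rng)       # only the chosen channel is observed
        estimate = reward / probs[arm]        # unbiased estimate of that channel's reward
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow; probs unchanged
        picks[arm] += 1
    return picks, probs
```

The `gamma / n_arms` floor keeps every channel explored with some probability. Without it, the importance-weighted estimate `reward / probs[arm]` could blow up when a channel's probability becomes very small.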