Robustness Properties in Fictitious-Play-Type Algorithms
Fictitious play (FP) is a canonical game-theoretic learning algorithm which has been deployed extensively in decentralized control scenarios. However, standard treatments of FP, and of many other game-theoretic models, assume rather idealistic conditions which rarely hold in realistic control scenarios. This paper considers a broad class of best-response learning algorithms, which we refer to as FP-type algorithms. In such an algorithm, given some (possibly limited) information about the history of actions, each individual forecasts the future play and chooses a (myopic) best action given their forecast. We provide a unified analysis of the behavior of FP-type algorithms under an important class of perturbations, thus demonstrating robustness to deviations from the idealistic operating conditions that have been previously assumed. This robustness result is then used to derive convergence results for two control-relevant relaxations of standard game-theoretic applications: distributed (network-based) implementation without full observability and asynchronous deployment (including in continuous time). In each case the results follow as a direct consequence of the main robustness result.
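The forecast-then-best-respond loop described in this abstract can be illustrated with a minimal sketch of classical two-player FP. This is not the paper's FP-type framework or its perturbation analysis; the function names, the payoff layout (`payoff[own_action][opponent_action]`), and the uniform initial counts are assumptions made for illustration.

```python
def best_response(payoff, opponent_counts):
    """Return the action maximizing expected payoff against the opponent's
    empirical action frequencies (the player's forecast of future play)."""
    total = sum(opponent_counts)
    expected = [
        sum(payoff[a][b] * opponent_counts[b] for b in range(len(opponent_counts))) / total
        for a in range(len(payoff))
    ]
    # Ties are broken in favor of the lowest action index.
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff_row, payoff_col, steps):
    """Run classical two-player FP and return both empirical frequencies.

    Both payoff tables are indexed as payoff[own_action][opponent_action].
    """
    # Assumption: one pseudo-observation per action as a uniform starting forecast.
    row_counts = [1] * len(payoff_row)
    col_counts = [1] * len(payoff_col)
    for _ in range(steps):
        a = best_response(payoff_row, col_counts)  # row player forecasts column play
        b = best_response(payoff_col, row_counts)  # column player forecasts row play
        row_counts[a] += 1
        col_counts[b] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Hypothetical 2x2 coordination game: both players prefer to match on action 0.
coordination = [[2, 0], [0, 1]]
row_freq, col_freq = fictitious_play(coordination, coordination, steps=100)
```

In this coordination game both players select action 0 at every step, so the empirical frequencies concentrate on the (0, 0) equilibrium.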
On Robustness Properties in Empirical Centroid Fictitious Play
Empirical Centroid Fictitious Play (ECFP) is a generalization of the
well-known Fictitious Play (FP) algorithm designed for implementation in
large-scale games. In ECFP, the set of players is subdivided into equivalence
classes with players in the same class possessing similar properties. Players
choose a next-stage action by tracking and responding to aggregate statistics
related to each equivalence class. This setup alleviates the difficult task of
tracking and responding to the statistical behavior of every individual player,
as is the case in traditional FP. Aside from ECFP, many useful modifications
have been proposed to classical FP, e.g., rules allowing for network-based
implementation, increased computational efficiency, and stronger forms of
learning. Such modifications tend to be of great practical value; however,
their effectiveness relies heavily on two fundamental properties of FP:
robustness to alterations in the empirical distribution step size process, and
robustness to best-response perturbations. The main contribution of the paper
is to show that similar robustness properties also hold for the ECFP algorithm.
This result serves as a first step in enabling practical modifications to ECFP,
similar to those already developed for FP.
Comment: Submitted for publication. Initial Submission: Mar. 201
Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: the Past, Present, and Future
Even though Google Research Football (GRF) was initially benchmarked and
studied as a single-agent environment in its original paper, recent years have
witnessed an increasing focus on its multi-agent nature by researchers
utilizing it as a testbed for Multi-Agent Reinforcement Learning (MARL).
However, the absence of standardized environment settings and unified
evaluation metrics for multi-agent scenarios hampers the consistent
understanding of various studies. Furthermore, the challenging 5-vs-5 and
11-vs-11 full-game scenarios have received limited thorough examination due to
their substantial training complexities. To address these gaps, this paper
extends the original environment by not only standardizing the environment
settings and benchmarking cooperative learning algorithms across different
scenarios, including the most challenging full-game scenarios, but also by
discussing approaches to enhance football AI from diverse perspectives and
introducing related research tools. Specifically, we provide a distributed and
asynchronous population-based self-play framework with diverse pre-trained
policies for faster training, two football-specific analytical tools for deeper
investigation, and an online leaderboard for broader evaluation. The overall
expectation of this work is to advance the study of Multi-Agent Reinforcement
Learning on Google Research Football environment, with the ultimate goal of
benefiting real-world sports beyond virtual games.
Empirical Centroid Fictitious Play: An Approach For Distributed Learning In Multi-Agent Games
The paper is concerned with distributed learning in large-scale games. The
well-known fictitious play (FP) algorithm is addressed, which, despite
theoretical convergence results, might be impractical to implement in
large-scale settings due to intense computation and communication requirements.
An adaptation of the FP algorithm, designated as the empirical centroid
fictitious play (ECFP), is presented. In ECFP, players respond to the centroid
of all players' actions rather than track and respond to the individual actions
of every player. Convergence of the ECFP algorithm in terms of average
empirical frequency (a notion made precise in the paper) to a subset of the
Nash equilibria is proven under the assumption that the game is a potential
game with permutation invariant potential function. A more general formulation
of ECFP is then given (which subsumes FP as a special case) and convergence
results are given for the class of potential games. Furthermore, a distributed
formulation of the ECFP algorithm is presented, in which players, endowed with
a (possibly sparse) preassigned communication graph, engage in local,
non-strategic information exchange to eventually agree on a common equilibrium.
Convergence results are proven for the distributed ECFP algorithm.
Comment: Submitted to the IEEE Transactions on Signal Processin
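The centroid idea in this abstract can be sketched as follows: instead of tracking every opponent, each player best-responds as if all others played the average (centroid) of the players' empirical distributions. This is a simplified, centralized illustration and not the paper's distributed algorithm. The utility (a player choosing action `a` earns `weights[a]` per other player who also chooses `a`), the function name, and the 1/(t+1) step size are assumptions chosen so that the game is a potential game with identical, permutation-invariant payoffs.

```python
def ecfp_coordination(weights, initial_actions, steps):
    """Simplified ECFP sketch for a hypothetical weighted coordination game.

    Returns the centroid (average) of the players' empirical distributions.
    """
    n_players = len(initial_actions)
    n_actions = len(weights)
    # Each empirical distribution starts as a point mass on the initial action.
    empirical = [
        [1.0 if a == start else 0.0 for a in range(n_actions)]
        for start in initial_actions
    ]

    def centroid():
        return [sum(q[a] for q in empirical) / n_players for a in range(n_actions)]

    for t in range(1, steps + 1):
        q_bar = centroid()
        # Expected utility when each of the other players draws from the centroid.
        expected = [(n_players - 1) * q_bar[a] * weights[a] for a in range(n_actions)]
        # Payoffs are identical across players, so all share one best response.
        best = max(range(n_actions), key=expected.__getitem__)
        for q in empirical:
            for a in range(n_actions):
                target = 1.0 if a == best else 0.0
                q[a] += (target - q[a]) / (t + 1)  # running-average update
    return centroid()


# Four players, two start on each action; action 1 carries the higher weight.
final_centroid = ecfp_coordination([1.0, 3.0], [0, 0, 1, 1], steps=50)
```

Here every player only needs the single centroid vector, which is what makes the scheme attractive when the number of players is large.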
The Crypto-democracy and the Trustworthy
In the current architecture of the Internet, there is a strong asymmetry in
terms of power between the entities that gather and process personal data
(e.g., major Internet companies, telecom operators, cloud providers, ...) and
the individuals from whom this personal data is issued. In particular,
individuals have no choice but to blindly trust that these entities will
respect their privacy and protect their personal data. In this position paper,
we address this issue by proposing a utopian crypto-democracy model based on
existing scientific achievements from the field of cryptography. More
precisely, our main objective is to show that cryptographic primitives,
including in particular secure multiparty computation, offer a practical
solution to protect privacy while minimizing the trust assumptions. In the
crypto-democracy envisioned, individuals do not have to trust a single physical
entity with their personal data but rather their data is distributed among
several institutions. Together these institutions form a virtual entity called
the Trustworthy that is responsible for the storage of this data but which can
also compute on it (provided first that all the institutions agree on this).
Finally, we also propose a realistic proof-of-concept of the Trustworthy, in
which the roles of institutions are played by universities. This
proof-of-concept would have an important impact in demonstrating the
possibilities offered by the crypto-democracy paradigm.
Comment: DPM 201
Distributed stochastic optimization via matrix exponential learning
In this paper, we investigate a distributed learning scheme for a broad class
of stochastic optimization problems and games that arise in signal processing
and wireless communications. The proposed algorithm relies on the method of
matrix exponential learning (MXL) and only requires locally computable gradient
observations that are possibly imperfect and/or obsolete. To analyze it, we
introduce the notion of a stable Nash equilibrium and we show that the
algorithm is globally convergent to such equilibria - or locally convergent
when an equilibrium is only locally stable. We also derive an explicit linear
bound for the algorithm's convergence speed, which remains valid under
measurement errors and uncertainty of arbitrarily high variance. To validate
our theoretical analysis, we test the algorithm in realistic
multi-carrier/multiple-antenna wireless scenarios where several users seek to
maximize their energy efficiency. Our results show that learning allows users
to attain a net increase between 100% and 500% in energy efficiency, even under
very high uncertainty.
Comment: 31 pages, 3 figure