Approachability in Stackelberg Stochastic Games with Vector Costs
The notion of approachability was introduced by Blackwell [1] in the context
of vector-valued repeated games. Blackwell's famous approachability theorem
prescribes a strategy for approachability, i.e., for `steering' the average
cost of a given agent towards a given target set, irrespective of the
strategies of the other agents. In this paper, motivated by multi-objective
optimization and decision-making problems in dynamically changing environments, we
address the approachability problem in Stackelberg stochastic games with
vector-valued cost functions. We make two main contributions. Firstly, we give a
simple and computationally tractable strategy for approachability in
Stackelberg stochastic games along the lines of Blackwell's. Secondly, we give
a reinforcement learning algorithm for learning the approachable strategy when
the transition kernel is unknown. As a by-product, we also recover Blackwell's
necessary and sufficient condition for approachability of convex sets in this
setup, and thus a complete characterization. We also give sufficient conditions
for non-convex sets.
Comment: 18 pages. Submitted to Dynamic Games and Applications
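The steering idea behind Blackwell's strategy can be sketched in a toy setting. The sketch below is not the paper's algorithm: the 2x2 game, the negative-orthant target set, the random adversary, and the grid-search solver for the scalarised game are all illustrative assumptions. At each round, the learner projects the average cost onto the target set, takes the residual as a steering direction, and plays a minimax mixture against that scalarised one-shot game.

```python
import numpy as np

# Toy 2x2 repeated game with vector costs in R^2 (illustrative values);
# the target set is the negative orthant D = {x : x <= 0}.
R = np.array([[[1.0, -1.0], [-1.0, 1.0]],
              [[-1.0, 1.0], [1.0, -1.0]]])

def project_neg_orthant(x):
    # Euclidean projection onto D.
    return np.minimum(x, 0.0)

def blackwell_action(xbar, rng):
    # Steering direction: from the projection of the average cost back to it.
    lam = xbar - project_neg_orthant(xbar)
    if np.allclose(lam, 0.0):
        return int(rng.integers(2))          # already in D: any action works
    # Solve min_p max_b <lam, r(p, b)> by a coarse grid search over mixtures p.
    grid = np.linspace(0.0, 1.0, 1001)
    worst = [max((p * R[0, b] + (1 - p) * R[1, b]) @ lam for b in range(2))
             for p in grid]
    p = grid[int(np.argmin(worst))]
    return 0 if rng.random() < p else 1

rng = np.random.default_rng(0)
xbar = np.zeros(2)
for t in range(1, 2001):
    a = blackwell_action(xbar, rng)
    b = int(rng.integers(2))                 # arbitrary (here random) adversary
    xbar += (R[a, b] - xbar) / t             # running average cost
dist = float(np.linalg.norm(xbar - project_neg_orthant(xbar)))
```

In this toy game the scalarised value is nonpositive for every steering direction, so Blackwell's condition holds and `dist` shrinks toward zero as rounds accumulate.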
Adaptation, coordination, and local interactions via distributed approachability
This paper investigates the relation between cooperation, competition, and local interactions in large distributed multi-agent
systems. The main contribution is the game-theoretic problem formulation and solution approach based on the new framework
of distributed approachability, and the study of the convergence properties of the resulting game model. Approachability
theory is the theory of two-player repeated games with vector payoffs; distributed approachability is presented here for
the first time as an extension to the case where a team of agents cooperates against a team of adversaries under a local
information and interaction structure. The game model turns into a nonlinear differential inclusion which, after a proper design
of the control and disturbance policies, features a consensus term and an exogenous adversarial input. Local interactions enter
the model through a graph topology and the corresponding graph-Laplacian matrix. Given the above model, we turn the
original questions on cooperation, competition, and local interactions into convergence properties of the differential inclusion.
In particular, we prove convergence and exponential convergence conditions around zero under general Markovian strategies.
We illustrate our results in the case of decentralized organizations with multiple decision-makers.
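The graph-Laplacian consensus term mentioned above can be illustrated in isolation. This is a hedged sketch, not the paper's model: the path-graph topology, the Euler step size, and the omission of the adversarial input are simplifying assumptions. Four agents repeatedly average toward their neighbours and reach agreement.

```python
import numpy as np

# Path graph on 4 agents; the graph Laplacian is L = D - A.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([4.0, -2.0, 1.0, -3.0])    # agents' initial states (mean 0)
dt = 0.05                               # Euler step; stable since dt * lambda_max(L) < 2
for _ in range(2000):
    x = x + dt * (-L @ x)               # Euler step of the consensus dynamics x' = -Lx
spread = float(x.max() - x.min())
```

Because the all-ones vector is in the left null space of `L`, the mean of the states is preserved, and the spread decays exponentially at a rate set by the second-smallest Laplacian eigenvalue; the paper's differential inclusion adds the adversarial input on top of this term.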
Protocol Invariance and the Timing of Decisions in Dynamic Games
We characterize a class of dynamic stochastic games that we call separable dynamic games with noisy transitions and establish that these widely used models are protocol invariant provided that periods are sufficiently short. Protocol invariance means that the set of Markov perfect equilibria is nearly the same irrespective of the order in which players are assumed to move within a period. Protocol invariance can facilitate applied work and renders the implications and predictions of a model more robust. Our class of dynamic stochastic games includes investment games, R&D races, models of industry dynamics, dynamic public construction games, asynchronously repeated games, and many other models from the extant literature.
Adaptive reinforcement learning for heterogeneous network selection
Next generation 5G mobile wireless networks will consist of multiple technologies for devices
to access the network at the edge. One of the keys to 5G is therefore the ability for a
device to intelligently select its Radio Access Technology (RAT). Current fully distributed
algorithms for RAT selection, although guaranteed to converge to equilibrium states,
are often slow, require long exploration times, and may converge to undesirable equilibria.
In this dissertation, we propose three novel reinforcement learning (RL) frameworks
to improve the efficiency of existing distributed RAT selection algorithms in a heterogeneous
environment, where users may potentially apply a number of different RAT selection
procedures. Although our research focuses on solutions for RAT selection in
current and future mobile wireless networks, the proposed solutions in this dissertation
are general and applicable to any large-scale distributed multi-agent system.
In the first framework, called RL with Non-positive Regret, we propose a novel adaptive
RL procedure for multi-agent non-cooperative repeated games. The main contribution is the use of both
positive and negative regrets in RL to improve the convergence speed and fairness of
the well-known regret-based RL procedure. Significant performance improvements
over related algorithms in the literature are demonstrated.
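The regret-based procedure referred to above is in the spirit of Hart and Mas-Colell's regret matching, which plays each action with probability proportional to its positive cumulative regret. The following minimal sketch is illustrative only: the matching-pennies game, the uniformly random opponent, and the horizon are assumptions, and the thesis's positive/negative-regret variant is not reproduced.

```python
import numpy as np

# Regret matching for the row player of matching pennies against a uniformly
# random opponent (game, opponent, and horizon are illustrative).
payoff = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])   # row player's payoff u(a, b)

rng = np.random.default_rng(1)
regret = np.zeros(2)               # cumulative regret per action
T = 5000
for t in range(1, T + 1):
    pos = np.maximum(regret, 0.0)
    # Play proportionally to positive regret; uniform if no positive regret.
    p = pos / pos.sum() if pos.sum() > 0 else np.full(2, 0.5)
    a = rng.choice(2, p=p)
    b = int(rng.integers(2))
    u = payoff[a, b]
    regret += payoff[:, b] - u     # regret vs. having always played each action
max_avg_regret = float(np.maximum(regret, 0.0).max() / T)
```

Regret matching guarantees that the maximum average regret vanishes at rate O(1/sqrt(T)) regardless of the opponent, which is the no-regret property the dissertation's procedures build on.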
In the second framework, called RL with Network-Assisted Feedback (RLNF), our core
contribution is to develop a network feedback model that uses network-assisted information
to improve the performance of distributed RL for RAT selection. RLNF guarantees
a no-regret payoff in the long run for any user adopting it, regardless of what other users
might do, and so can work in an environment where not all users use the same learning
strategy. This is an important implementation advantage, as RLNF can be implemented
within current mobile network standards.
In the third framework, we propose a novel adaptive RL-based mechanism for RAT selection
that can effectively handle user mobility. The key contribution is to leverage forgetting
methods to react rapidly to changes in radio conditions as users move.
We show that our solution improves the performance of wireless networks and converges
much faster under user mobility than non-adaptive solutions.
Another objective of the research is to study the impact of various network models on the
performance of different RAT selection approaches. We propose a unified benchmark to
compare the performance of different algorithms under the same computational environment.
The comparative studies reveal that, among the important network parameters
that influence the performance of RAT selection algorithms, the number of base stations
that a user can connect to has the most significant impact. This finding provides
guidelines for the proper design of RAT selection algorithms for future 5G. Our evaluation
benchmark can serve as a reference for researchers, network developers, and engineers.
Overall, the thesis provides different reinforcement learning frameworks to improve the
efficiency of current fully distributed algorithms for heterogeneous RAT selection. We
prove the convergence of the proposed reinforcement learning procedures using the differential
inclusion (DI) technique. The theoretical analyses demonstrate that the use of
DI not only provides an effective method for studying the convergence properties of adaptive
procedures in game-theoretic learning, but also yields a much more concise and extensible
proof compared to the classical approaches.
Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
Truthful Equilibria in Dynamic Bayesian Games
This paper characterizes an equilibrium payoff subset for Markovian games with private information as discounting vanishes. Monitoring is imperfect, transitions may depend on actions, types may be correlated, and values may be interdependent. The focus is on equilibria in which players report truthfully. The characterization generalizes that for repeated games, reducing the analysis to static Bayesian games with transfers. With correlated types, results from mechanism design apply, yielding a folk theorem. With independent private values, the restriction to truthful equilibria is without loss, except for the punishment level; if players withhold their information during punishment-like phases, a "folk" theorem also obtains.
Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games
We introduce a new approach for computing optimal equilibria and mechanisms via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation allows us to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria, not only in empirical averages, but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.
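The reduction above rests on learning dynamics whose time-averaged play converges to minimax strategies of a zero-sum game. A minimal sketch with multiplicative-weights self-play on a 2x2 matrix game follows; the matrix, step size, and horizon are illustrative assumptions, and the paper's extensive-form setting and last-iterate guarantees are far more general.

```python
import numpy as np

# Multiplicative-weights self-play in a 2x2 zero-sum game; the time-averaged
# strategies approach the minimax equilibrium x* = y* = (0.4, 0.6), value 0.2.
M = np.array([[2.0, -1.0],
              [-1.0, 1.0]])            # row player maximizes x^T M y
eta = 0.01
x = np.full(2, 0.5)
y = np.full(2, 0.5)
x_sum = np.zeros(2)
y_sum = np.zeros(2)
T = 20000
for _ in range(T):
    x_sum += x
    y_sum += y
    x = x * np.exp(eta * (M @ y))      # row player: exponentiated gradient ascent
    x /= x.sum()
    y = y * np.exp(-eta * (M.T @ x))   # column player: descent on its loss
    y /= y.sum()
xbar, ybar = x_sum / T, y_sum / T
value = float(xbar @ M @ ybar)
```

The standard no-regret argument bounds the duality gap of the averaged pair by the sum of the two players' average regrets, which is why averages (and, with the paper's techniques, even iterates) home in on the minimax solution.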
Game Theoretic Models for Power Control in Wireless Networks
In recent years, mobile communication technology has evolved rapidly due to
growing user requirements, such as access to Internet services via wireless
smart mobile devices and demands for better quality in the offered services.
Nowadays, these devices use the fourth-generation network (4G, or Long Term
Evolution, LTE), which replaces the third-generation (3G) networks and offers
users improved services at higher speeds. Wireless smart mobile devices such as
smartphones, tablets, and netbooks are in high demand in the market, so
considerable effort goes into reducing their energy consumption so that users
do not need to recharge their devices at frequent intervals. Game theory
provides valuable mathematical tools that can be used to solve the various
problems faced by wireless networks and can be applied at multiple layers of
the network.
In this thesis, we study the power control problem at the physical layer of
wireless networks. Specifically, we study game-theoretic models for power
control in wireless communication networks (CDMA and LTE). Our study focuses on
non-cooperative games, and we assume that the network users (transmitters and
receivers) are selfish and rational. We then introduce regret learning
algorithms and their connection with game theory. Finally, we investigate
various regret learning techniques by applying them to the power control
problem in next-generation wireless networks.
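A classical baseline for the kind of distributed power control studied here is the Foschini-Miljanic target-SINR iteration, in which each transmitter's best response is to rescale its power toward a common SINR target. The sketch below is illustrative only: the gain matrix, noise level, and target are assumptions chosen so the target is feasible, not the thesis's model.

```python
import numpy as np

# Foschini-Miljanic distributed power control: each transmitter rescales its
# power toward a common target SINR (illustrative parameters).
G = np.array([[1.0, 0.1, 0.2],
              [0.2, 1.0, 0.1],
              [0.1, 0.3, 1.0]])    # G[i, j]: gain from transmitter j to receiver i
noise = 0.01
gamma_target = 2.0                 # feasible: spectral radius of the scaled cross-gains < 1

def sinr(p):
    # Signal over interference-plus-noise at each receiver.
    interference = G @ p - np.diag(G) * p + noise
    return np.diag(G) * p / interference

p = np.ones(3)
for _ in range(200):
    p = p * gamma_target / sinr(p)   # each user's best response to the others
```

When the target is feasible, this fully distributed iteration converges linearly to the unique power vector at which every user meets the target SINR exactly, which is why it serves as the standard reference point for game-theoretic power control schemes.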
Multi-Agent Systems
A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Multi-agent systems can be used to solve problems which are difficult or impossible for an individual agent or monolithic system to solve. Agent systems are open and extensible systems that allow for the deployment of autonomous and proactive software components. Multi-agent systems have been adopted and used in several application domains.