1,240 research outputs found
Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features
In this paper, we consider Markov chain and linear quadratic models for deep
structured teams with discounted and time-average cost functions under two
non-classical information structures, namely, deep state sharing and no
sharing. In deep structured teams, agents are coupled in dynamics and cost
functions through deep state, where deep state refers to a set of orthogonal
linear regressions of the states. In this article, we consider a homogeneous
linear regression for Markov chain models (i.e., empirical distribution of
states) and a few orthonormal linear regressions for linear quadratic models
(i.e., weighted average of states). Some planning algorithms are developed for
the case when the model is known, and some reinforcement learning algorithms
are proposed for the case when the model is not known completely. The
convergence of two model-free (reinforcement learning) algorithms, one for
Markov chain models and one for linear quadratic models, is established. The
results are then applied to a smart grid.
Comment: This version corrects some typographical errors
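The abstract above defines the "deep state" for Markov chain models as the empirical distribution of the agents' states. A minimal sketch of that construction, assuming a finite state space {0, ..., S-1} (the function name and interface are illustrative, not from the paper):

```python
from collections import Counter

def deep_state(states, num_states):
    """Empirical distribution of agent states over {0, ..., num_states-1}.

    `states` is a list with one entry per agent; the returned vector is the
    fraction of agents currently in each state (the "deep state" coupling
    term for the Markov chain model, as described in the abstract).
    """
    counts = Counter(states)
    n = len(states)
    return [counts[s] / n for s in range(num_states)]
```

For example, four agents in states [0, 1, 1, 2] over a three-state space yield the deep state [0.25, 0.5, 0.25].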
Weighted Age of Information based Scheduling for Large Population Games on Networks
In this paper, we consider a discrete-time multi-agent system involving
cost-coupled networked rational agents solving a consensus problem and a
central Base Station (BS), scheduling agent communications over a network. Due
to a hard bandwidth constraint on the number of transmissions through the
network, only a limited number of agents can concurrently access their state
information through the network. Under standard assumptions on the information
structure of the agents and the BS, we first show that the control actions of
the agents are free of any dual effect, allowing for separation between
estimation and control problems at each agent. Next, we propose a weighted age
of information (WAoI) metric for the scheduling problem of the BS, where the
weights depend on the estimation error of the agents. The BS aims to find the
optimum scheduling policy that minimizes the WAoI, subject to the hard
bandwidth constraint. Since this problem is NP hard, we first relax the hard
constraint to a soft update rate constraint, and then compute an optimal policy
for the relaxed problem by reformulating it into a Markov Decision Process
(MDP). This then inspires a sub-optimal policy for the bandwidth constrained
problem, which is shown to approach the optimal policy as the number of
agents tends to infinity. Next, we solve the consensus problem using the mean-field game
framework wherein we first design decentralized control policies for the
limiting case of the N-agent system (as N tends to infinity). By explicitly
constructing the mean-field system, we prove the existence and uniqueness of
the mean-field equilibrium. Consequently, we show that the obtained equilibrium
policies constitute an ε-Nash equilibrium for the finite-agent system.
Finally, we validate the performance of both the scheduling and the control
policies through numerical simulations.
Comment: This work has been submitted to the IEEE for possible publication
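The abstract above proposes a weighted age-of-information (WAoI) metric, where each agent's age is weighted by its estimation error, and the BS schedules under a hard bandwidth constraint. The paper derives its policy via an MDP relaxation; the sketch below is only an illustrative max-weighted-age heuristic with assumed variable names, showing how ages evolve under scheduling (a scheduled agent's age resets, the rest grow by one each slot):

```python
def waoi_schedule(ages, weights, bandwidth):
    """Pick the `bandwidth` agents with the largest weighted age.

    This greedy rule is an illustration of the WAoI idea, not the
    paper's MDP-derived policy.
    """
    order = sorted(range(len(ages)), key=lambda i: weights[i] * ages[i],
                   reverse=True)
    return set(order[:bandwidth])

def step(ages, weights, bandwidth):
    """One slot: schedule, reset scheduled ages, age the rest, score WAoI."""
    scheduled = waoi_schedule(ages, weights, bandwidth)
    new_ages = [1 if i in scheduled else a + 1 for i, a in enumerate(ages)]
    waoi = sum(w * a for w, a in zip(weights, new_ages))
    return new_ages, waoi
```

With unit weights, ages [3, 1, 2], and bandwidth 1, the oldest agent is scheduled and its age resets to 1 while the others increment.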
Rank Centrality: Ranking from Pair-wise Comparisons
The question of aggregating pair-wise comparisons to obtain a global ranking
over a collection of objects has been of interest for a very long time: be it
ranking of online gamers (e.g. MSR's TrueSkill system) and chess players,
aggregating social opinions, or deciding which product to sell based on
transactions. In most settings, in addition to obtaining a ranking, finding
`scores' for each object (e.g. player's rating) is of interest for
understanding the intensity of the preferences.
In this paper, we propose Rank Centrality, an iterative rank aggregation
algorithm for discovering scores for objects (or items) from pair-wise
comparisons. The algorithm has a natural random walk interpretation over the
graph of objects with an edge present between a pair of objects if they are
compared; the score, which we call Rank Centrality, of an object turns out to
be its stationary probability under this random walk. To study the efficacy of
the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model
(equivalent to the Multinomial Logit (MNL) for pair-wise comparisons) in which
each object has an associated score which determines the probabilistic outcomes
of pair-wise comparisons between objects. In terms of the pair-wise marginal
probabilities, which are the main subject of this paper, the MNL model and the
BTL model are identical. We bound the finite sample error rates between the
scores assumed by the BTL model and those estimated by our algorithm. In
particular, the number of samples required to learn the score well with high
probability depends on the structure of the comparison graph. When the
Laplacian of the comparison graph has a strictly positive spectral gap, e.g.
each item is compared to a subset of randomly chosen items, this leads to
dependence on the number of samples that is nearly order-optimal.
Comment: 45 pages, 3 figures
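The random-walk interpretation described above can be sketched directly: move from object i to object j with probability proportional to the fraction of comparisons j won against i, then take the walk's stationary distribution as the scores. A minimal pure-Python sketch under these assumptions (helper names are ours; the paper's estimator may differ in details such as normalization):

```python
def rank_centrality(wins, iters=1000):
    """Rank Centrality sketch. wins[i][j] = number of times j beat i.

    Builds a random walk on the comparison graph: from node i, jump to j
    with probability (fraction of times j beat i) / d_max, where d_max is
    the maximum comparison degree, and stay at i with the leftover mass.
    Returns an approximation of the stationary distribution (the scores),
    computed by power iteration.
    """
    n = len(wins)
    d_max = max(
        sum(1 for j in range(n) if j != i and wins[i][j] + wins[j][i] > 0)
        for i in range(n)
    )
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                total = wins[i][j] + wins[j][i]
                if total > 0:
                    P[i][j] = wins[i][j] / total / d_max
        P[i][i] = 1.0 - sum(P[i])  # lazy self-loop keeps rows stochastic
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

Intuitively, the walk drifts toward objects that win often, so stronger objects accumulate more stationary mass; on a complete comparison graph with three items where item 2 beats both others in most comparisons, item 2 receives the largest score.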