1,240 research outputs found
Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features
In this paper, we consider Markov chain and linear quadratic models for deep
structured teams with discounted and time-average cost functions under two
non-classical information structures, namely, deep state sharing and no
sharing. In deep structured teams, agents are coupled in dynamics and cost
functions through deep state, where deep state refers to a set of orthogonal
linear regressions of the states. In this article, we consider a homogeneous
linear regression for Markov chain models (i.e., empirical distribution of
states) and a few orthonormal linear regressions for linear quadratic models
(i.e., weighted average of states). Some planning algorithms are developed for
the case when the model is known, and some reinforcement learning algorithms
are proposed for the case when the model is not known completely. The
convergence of two model-free (reinforcement learning) algorithms, one for
Markov chain models and one for linear quadratic models, is established. The
results are then applied to a smart grid.
Comment: This version corrects some typographical errors
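The abstract above defines the "deep state" for Markov chain models as the empirical distribution of the agents' states. A minimal sketch of that construction, assuming a finite state space {0, ..., S-1} (the function name and interface are illustrative, not from the paper):

```python
from collections import Counter

def deep_state(states, num_states):
    """Empirical distribution of agent states over {0, ..., num_states-1}.

    `states` is a list with one entry per agent; the returned vector is the
    fraction of agents currently in each state (the "deep state" coupling
    term for the Markov chain model, as described in the abstract).
    """
    counts = Counter(states)
    n = len(states)
    return [counts[s] / n for s in range(num_states)]
```

For example, four agents in states [0, 1, 1, 2] over a three-state space yield the deep state [0.25, 0.5, 0.25].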
Weighted Age of Information based Scheduling for Large Population Games on Networks
In this paper, we consider a discrete-time multi-agent system involving
cost-coupled networked rational agents solving a consensus problem and a
central Base Station (BS), scheduling agent communications over a network. Due
to a hard bandwidth constraint on the number of transmissions through the
network, only a limited number of agents can concurrently access their state
information through the network. Under standard assumptions on the information
structure of the agents and the BS, we first show that the control actions of
the agents are free of any dual effect, allowing for separation between
estimation and control problems at each agent. Next, we propose a weighted age
of information (WAoI) metric for the scheduling problem of the BS, where the
weights depend on the estimation error of the agents. The BS aims to find the
optimum scheduling policy that minimizes the WAoI, subject to the hard
bandwidth constraint. Since this problem is NP hard, we first relax the hard
constraint to a soft update rate constraint, and then compute an optimal policy
for the relaxed problem by reformulating it into a Markov Decision Process
(MDP). This then inspires a sub-optimal policy for the bandwidth constrained
problem, which is shown to approach the optimal policy as the number of
agents tends to infinity. Next, we solve the consensus problem using the mean-field game
framework wherein we first design decentralized control policies for the
limiting case of the N-agent system (as N tends to infinity). By explicitly
constructing the mean-field system, we prove the existence and uniqueness of
the mean-field equilibrium. Consequently, we show that the obtained equilibrium
policies constitute an ε-Nash equilibrium for the finite-agent system.
Finally, we validate the performance of both the scheduling and the control
policies through numerical simulations.
Comment: This work has been submitted to the IEEE for possible publication
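The abstract above proposes a weighted age-of-information (WAoI) metric, where each agent's age is weighted by its estimation error, and the BS schedules under a hard bandwidth constraint. The paper derives its policy via an MDP relaxation; the sketch below is only an illustrative max-weighted-age heuristic with assumed variable names, showing how ages evolve under scheduling (a scheduled agent's age resets, the rest grow by one each slot):

```python
def waoi_schedule(ages, weights, bandwidth):
    """Pick the `bandwidth` agents with the largest weighted age.

    This greedy rule is an illustration of the WAoI idea, not the
    paper's MDP-derived policy.
    """
    order = sorted(range(len(ages)), key=lambda i: weights[i] * ages[i],
                   reverse=True)
    return set(order[:bandwidth])

def step(ages, weights, bandwidth):
    """One slot: schedule, reset scheduled ages, age the rest, score WAoI."""
    scheduled = waoi_schedule(ages, weights, bandwidth)
    new_ages = [1 if i in scheduled else a + 1 for i, a in enumerate(ages)]
    waoi = sum(w * a for w, a in zip(weights, new_ages))
    return new_ages, waoi
```

With unit weights, ages [3, 1, 2], and bandwidth 1, the oldest agent is scheduled and its age resets to 1 while the others increment.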
Rank Centrality: Ranking from Pair-wise Comparisons
The question of aggregating pair-wise comparisons to obtain a global ranking
over a collection of objects has been of interest for a very long time: be it
ranking of online gamers (e.g. MSR's TrueSkill system) and chess players,
aggregating social opinions, or deciding which product to sell based on
transactions. In most settings, in addition to obtaining a ranking, finding
`scores' for each object (e.g. player's rating) is of interest for
understanding the intensity of the preferences.
In this paper, we propose Rank Centrality, an iterative rank aggregation
algorithm for discovering scores for objects (or items) from pair-wise
comparisons. The algorithm has a natural random walk interpretation over the
graph of objects with an edge present between a pair of objects if they are
compared; the score, which we call Rank Centrality, of an object turns out to
be its stationary probability under this random walk. To study the efficacy of
the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model
(equivalent to the Multinomial Logit (MNL) for pair-wise comparisons) in which
each object has an associated score which determines the probabilistic outcomes
of pair-wise comparisons between objects. In terms of the pair-wise marginal
probabilities, which are the main subject of this paper, the MNL model and the
BTL model are identical. We bound the finite sample error rates between the
scores assumed by the BTL model and those estimated by our algorithm. In
particular, the number of samples required to learn the score well with high
probability depends on the structure of the comparison graph. When the
Laplacian of the comparison graph has a strictly positive spectral gap, e.g.
each item is compared to a subset of randomly chosen items, this leads to
dependence on the number of samples that is nearly order-optimal.
Comment: 45 pages, 3 figures
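The random-walk interpretation described above can be sketched directly: move from object i to object j with probability proportional to the fraction of comparisons j won against i, then take the walk's stationary distribution as the scores. A minimal pure-Python sketch under these assumptions (helper names are ours; the paper's estimator may differ in details such as normalization):

```python
def rank_centrality(wins, iters=1000):
    """Rank Centrality sketch. wins[i][j] = number of times j beat i.

    Builds a random walk on the comparison graph: from node i, jump to j
    with probability (fraction of times j beat i) / d_max, where d_max is
    the maximum comparison degree, and stay at i with the leftover mass.
    Returns an approximation of the stationary distribution (the scores),
    computed by power iteration.
    """
    n = len(wins)
    d_max = max(
        sum(1 for j in range(n) if j != i and wins[i][j] + wins[j][i] > 0)
        for i in range(n)
    )
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                total = wins[i][j] + wins[j][i]
                if total > 0:
                    P[i][j] = wins[i][j] / total / d_max
        P[i][i] = 1.0 - sum(P[i])  # lazy self-loop keeps rows stochastic
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

Intuitively, the walk drifts toward objects that win often, so stronger objects accumulate more stationary mass; on a complete comparison graph with three items where item 2 beats both others in most comparisons, item 2 receives the largest score.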