
    Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

    In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures, namely, deep state sharing and no sharing. In deep structured teams, agents are coupled in dynamics and cost functions through deep state, where deep state refers to a set of orthogonal linear regressions of the states. In this article, we consider a homogeneous linear regression for Markov chain models (i.e., empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted average of states). Some planning algorithms are developed for the case when the model is known, and some reinforcement learning algorithms are proposed for the case when the model is not known completely. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid. Comment: This version corrects some typographical errors.
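
    As a rough illustration of the two deep-state examples named in the abstract, the sketch below computes an empirical distribution of finite-valued states and a weighted average of real-valued states. The function names, and the choice to normalize the weighted average by the sum of the weights, are assumptions made for this example and may differ from the paper's definitions.

```python
import numpy as np

def empirical_distribution(states, num_states):
    """Fraction of agents in each state (states are integers 0..num_states-1)."""
    counts = np.bincount(states, minlength=num_states)
    return counts / len(states)

def weighted_average(states, weights):
    """Weighted average of agent states, one row per agent.

    Assumption: normalized by the sum of the weights.
    """
    return np.average(np.asarray(states, dtype=float), axis=0, weights=weights)

# Four agents with finite-valued states, two agents with scalar real states.
dist = empirical_distribution([0, 1, 1, 2], num_states=3)
avg = weighted_average([[1.0], [3.0]], weights=[1.0, 3.0])
```

    In both cases the per-agent states are reduced to a low-dimensional aggregate, which is the role the abstract assigns to the deep state.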

    Weighted Age of Information based Scheduling for Large Population Games on Networks

    In this paper, we consider a discrete-time multi-agent system involving $N$ cost-coupled networked rational agents solving a consensus problem and a central Base Station (BS), scheduling agent communications over a network. Due to a hard bandwidth constraint on the number of transmissions through the network, at most $R_d < N$ agents can concurrently access their state information through the network. Under standard assumptions on the information structure of the agents and the BS, we first show that the control actions of the agents are free of any dual effect, allowing for separation between estimation and control problems at each agent. Next, we propose a weighted age of information (WAoI) metric for the scheduling problem of the BS, where the weights depend on the estimation error of the agents. The BS aims to find the optimum scheduling policy that minimizes the WAoI, subject to the hard bandwidth constraint. Since this problem is NP-hard, we first relax the hard constraint to a soft update rate constraint, and then compute an optimal policy for the relaxed problem by reformulating it into a Markov Decision Process (MDP). This then inspires a sub-optimal policy for the bandwidth constrained problem, which is shown to approach the optimal policy as $N \rightarrow \infty$. Next, we solve the consensus problem using the mean-field game framework wherein we first design decentralized control policies for a limiting case of the $N$-agent system (as $N \rightarrow \infty$). By explicitly constructing the mean-field system, we prove the existence and uniqueness of the mean-field equilibrium. Consequently, we show that the obtained equilibrium policies constitute an $\epsilon$-Nash equilibrium for the finite agent system. Finally, we validate the performance of both the scheduling and the control policies through numerical simulations. Comment: This work has been submitted to IEEE for possible publication.

    Rank Centrality: Ranking from Pair-wise Comparisons

    The question of aggregating pair-wise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR's TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining a ranking, finding 'scores' for each object (e.g. player's rating) is of interest for understanding the intensity of the preferences. In this paper, we propose Rank Centrality, an iterative rank aggregation algorithm for discovering scores for objects (or items) from pair-wise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with an edge present between a pair of objects if they are compared; the score, which we call Rank Centrality, of an object turns out to be its stationary probability under this random walk. To study the efficacy of the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model (equivalent to the Multinomial Logit (MNL) for pair-wise comparisons) in which each object has an associated score which determines the probabilistic outcomes of pair-wise comparisons between objects. In terms of the pair-wise marginal probabilities, which is the main subject of this paper, the MNL model and the BTL model are identical. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. In particular, the number of samples required to learn the score well with high probability depends on the structure of the comparison graph. When the Laplacian of the comparison graph has a strictly positive spectral gap, e.g. each item is compared to a subset of randomly chosen items, this leads to dependence on the number of samples that is nearly order-optimal. Comment: 45 pages, 3 figures.
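
    The random walk interpretation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not spell out the transition probabilities, so the sketch assumes the walk moves from item i to item j with probability proportional to the fraction of i-versus-j comparisons that j won, scaled by the maximum degree of the comparison graph, and that the comparison graph is connected. The function name `rank_centrality` and the `wins` matrix layout are choices made for this example.

```python
import numpy as np

def rank_centrality(wins, max_iter=10_000, tol=1e-12):
    """Scores from pair-wise comparisons via a random walk (illustrative sketch).

    wins[i, j] is the number of times item i beat item j.
    Assumes the comparison graph is connected.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    totals = wins + wins.T                 # comparisons held for each pair
    compared = totals > 0
    np.fill_diagonal(compared, False)      # ignore self-comparisons

    # a[i, j]: fraction of i-vs-j comparisons won by j (0 if never compared).
    a = np.zeros((n, n))
    a[compared] = wins.T[compared] / totals[compared]

    # Assumed normalization: divide by the maximum degree so rows sum to <= 1,
    # then put the remaining probability mass on the self-loop.
    d_max = compared.sum(axis=1).max()
    P = a / d_max
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))

    # Power iteration for the stationary distribution pi = pi P.
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = pi @ P
        converged = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if converged:
            break
    return pi / pi.sum()

# Noise-free BTL comparisons with (hypothetical) true scores w.
w = np.array([1.0, 2.0, 3.0])
wins = 100 * w[:, None] / (w[:, None] + w[None, :])
scores = rank_centrality(wins)
```

    With noise-free BTL win fractions, the walk satisfies detailed balance with respect to the true scores, so the stationary distribution recovers `w` up to normalization; with finite samples the estimate is only approximate.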

    Nonlinear Markov Games on a Finite State Space (Mean-field and Binary Interactions)
