Information theory in multi-agent learning systems

Abstract

This work presents the development of a multi-agent learning model based on concepts such as information theory and game theory. First, the agents obtain a description of their environment by means of tools such as maximum entropy or Gaussian regressions. Then, by minimizing mutual information, they capture the information that is necessary and sufficient to understand their environment while avoiding redundancy or high levels of distortion. In this way, they define rationality boundaries. Besides improving the learning process, these boundaries allow the agents to follow relevant signals from the environment, or to ignore them when their utility is low. Finally, within a potential games scheme, a distortion-based potential function is defined and minimized to reach a Nash equilibrium. The proposed model is implemented in mobile sensor networks and in the secondary voltage control of a microgrid, showing excellent results in terms of distributed control.

In this work, we propose a multi-agent learning framework based on the mutual information between the agents and their environment. Initially, each agent applies Gaussian process regression (GPR) to information from its neighborhood to infer the behavior of the environment. Then, the mutual information between an agent and the environment is minimized by means of the rate distortion function (RDF). In this way, a boundary between misunderstanding and redundancy of the environment information is obtained, which the agents use as a decision rule. The RDF is conveniently computed with the Blahut-Arimoto algorithm. The elements of this computation that matter most for our model are the Lagrange multiplier s and the conditional distribution describing the similarity between the agent and the environment.
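The textbook Blahut-Arimoto iteration for a discrete source can illustrate the rate-distortion step. The sketch below is not the thesis implementation. It assumes a finite set of environment states with known probabilities and a given distortion matrix. The function name `blahut_arimoto` and its parameters are hypothetical. The sketch shows where the two highlighted quantities enter:

- The Lagrange multiplier `s` fixes the trade-off between rate and distortion.
- The conditional distribution `q(x_hat | x)` takes a Boltzmann form.

```python
import numpy as np


def blahut_arimoto(p_x, distortion, s, tol=1e-9, max_iter=10_000):
    """One point of the rate-distortion function R(D) via Blahut-Arimoto.

    Hypothetical interface, for illustration only:
    p_x        -- probability of each environment state x, shape (n,)
    distortion -- distortion matrix d(x, x_hat), shape (n, m)
    s          -- Lagrange multiplier (s >= 0); larger s favours lower distortion
    Returns (rate in bits, expected distortion, conditional q(x_hat | x)).
    """
    m = distortion.shape[1]
    q_xhat = np.full(m, 1.0 / m)  # start from a uniform reproduction marginal
    for _ in range(max_iter):
        # Boltzmann-form conditional: q(x_hat | x) ∝ q(x_hat) * exp(-s * d(x, x_hat))
        weights = q_xhat[None, :] * np.exp(-s * distortion)
        cond = weights / weights.sum(axis=1, keepdims=True)
        # Update the reproduction marginal: q(x_hat) = sum_x p(x) q(x_hat | x)
        new_q = p_x @ cond
        converged = np.max(np.abs(new_q - q_xhat)) < tol
        q_xhat = new_q
        if converged:
            break
    marginal = p_x @ cond
    # Mutual information I(X; X_hat) in bits; entries with cond == 0 contribute 0
    ratio = np.divide(cond, marginal, out=np.ones_like(cond), where=cond > 0)
    rate = float(np.sum(p_x[:, None] * cond * np.log2(ratio)))
    expected_distortion = float(np.sum(p_x[:, None] * cond * distortion))
    return rate, expected_distortion, cond


if __name__ == "__main__":
    p = np.array([0.5, 0.5])                # uniform binary "environment"
    d = np.array([[0.0, 1.0], [1.0, 0.0]])  # Hamming distortion
    for s in (0.5, 2.0, 5.0):
        r, dist, _ = blahut_arimoto(p, d, s)
        print(f"s={s}: R={r:.3f} bits, D={dist:.3f}")
```

For a uniform binary source with Hamming distortion, this iteration reproduces the closed form R(D) = 1 - H_b(D), with D = e^(-s) / (1 + e^(-s)). Larger values of `s` therefore give lower distortion at a higher rate.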
The parameter s governs the level of rationality that the agents assume in the decision-making process. In turn, the conditional probability distribution has the form of a Boltzmann distribution, so it induces logit dynamics, which the agents use as their action-selection rule. Finally, we include a distributed optimization setting through the potential games approach, in which convergence to a Nash equilibrium is reached through a distortion-based potential function. The framework is implemented mainly in mobile sensor networks, but it also shows applicability to other multi-agent contexts, such as smart grids.

COLCIENCIAS, FunCyT. Doctorate.
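A small simulation can illustrate the logit action-selection rule and the distortion-based potential game described in the abstract. The sketch below rests on hypothetical assumptions of my own:

- The task is a toy one-dimensional coverage problem.
- Every agent's cost equals one shared distortion potential. This makes it an identical-interest game, the simplest exact potential game.
- The names `coverage_potential` and `log_linear_learning` are illustrative only.

The thesis's actual potential, utilities and sensor model are not reproduced here. In this scheme `s` acts as the inverse temperature of the Boltzmann choice. Near 0 the agents choose almost at random; as `s` grows they approach best responses. For asynchronous logit updates in an exact potential game, the stationary distribution is proportional to exp(-s·Φ), which concentrates on potential minimizers (Nash equilibria) as `s` increases.

```python
import numpy as np


def coverage_potential(positions, targets):
    """Hypothetical distortion-based potential: mean squared distance from each
    target point to its nearest agent (lower means better coverage)."""
    sq_dist = (targets[:, None] - positions[None, :]) ** 2
    return float(sq_dist.min(axis=1).mean())


def log_linear_learning(targets, grid, n_agents, s, steps, seed=0):
    """Asynchronous logit dynamics in an identical-interest (exact potential) game.

    At each step one randomly chosen agent revises its position on `grid`,
    picking position a with probability proportional to
    exp(-s * potential(a, positions of the other agents)).
    """
    rng = np.random.default_rng(seed)
    positions = rng.choice(grid, size=n_agents)
    for _ in range(steps):
        i = rng.integers(n_agents)  # the revising agent
        costs = np.empty(len(grid))
        for k, g in enumerate(grid):
            trial = positions.copy()
            trial[i] = g
            costs[k] = coverage_potential(trial, targets)
        # Boltzmann/logit choice; subtracting the minimum cost avoids overflow
        probs = np.exp(-s * (costs - costs.min()))
        probs /= probs.sum()
        positions[i] = grid[rng.choice(len(grid), p=probs)]
    return positions


if __name__ == "__main__":
    targets = np.array([0.0, 0.0, 10.0, 10.0])  # two clusters of signal sources
    grid = np.linspace(0.0, 10.0, 11)            # admissible sensor positions
    for s in (0.1, 50.0):
        final = log_linear_learning(targets, grid, n_agents=2, s=s, steps=200)
        print(s, final, coverage_potential(final, targets))
```

With a large `s`, the two agents settle on the two clusters, where the potential is zero. With a small `s`, they keep wandering, which mirrors the bounded-rationality role the abstract assigns to `s`.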
