
    Modeling Mutual Influence in Multi-Agent Reinforcement Learning

    In multi-agent systems (MAS), agents rarely act in isolation; they tend to achieve their goals through interactions with other agents. To achieve their ultimate goals, individual agents should actively evaluate how other agents' behaviors affect them before deciding which actions to take. These impacts are reciprocal, and it is of great interest to model the mutual influence that agents exert on one another when they observe or act in the environment. In this thesis, assuming that agents are aware of each other's existence and of their potential impact on themselves, I develop novel multi-agent reinforcement learning (MARL) methods that measure the mutual influence between agents in order to shape learning. The first part of this thesis outlines a framework of recursive reasoning in deep multi-agent reinforcement learning. I hypothesize that it is beneficial for each agent to consider how other agents react to its behavior. I start from Probabilistic Recursive Reasoning (PR2), which uses level-1 reasoning and adopts variational Bayes methods to approximate the opponents' conditional policies. Each agent forms its individual Q-value by marginalizing the opponents' conditional policies out of the joint Q-value, and then finds the best response to improve its policy. I further extend PR2 to Generalized Recursive Reasoning (GR2), which supports different hierarchical levels of rationality. GR2 enables agents to possess various levels of thinking ability, thereby allowing higher-level agents to best respond to less sophisticated learners. The first part of the thesis shows that reducing the joint Q-value to an individual Q-value via explicit recursive reasoning benefits learning. In the second part of the thesis, conversely, I measure the mutual influence by approximating the joint Q-value from the individual Q-values.
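    The marginalization step described for PR2 can be illustrated in tabular form. This is a minimal sketch, assuming a single state, two actions per agent, and a hand-specified joint Q-table and opponent conditional policy rho(a_-i | a_i). In PR2 the conditional policy is approximated with variational Bayes and the Q-functions are learned; this sketch does not attempt either. All numbers and function names are hypothetical.

```python
# Level-1 recursive reasoning in the spirit of PR2 (tabular, single-state sketch).
# Hypothetical setup: a known joint Q-table Q_i(a_i, a_-i) and a known opponent
# conditional policy rho(a_-i | a_i); PR2 itself learns both.

def individual_q(joint_q, rho, actions_self, actions_opp):
    """Marginalize the opponent's conditional policy out of the joint Q-value:
    Q_i(a_i) = sum over a_-i of rho(a_-i | a_i) * Q_i(a_i, a_-i)."""
    return {
        a_i: sum(rho[a_i][a_o] * joint_q[(a_i, a_o)] for a_o in actions_opp)
        for a_i in actions_self
    }


def best_response(q_individual):
    """Greedy best response with respect to the marginalized Q-values."""
    return max(q_individual, key=q_individual.get)


if __name__ == "__main__":
    acts = ["A", "B"]
    # Joint Q-values for agent i (hypothetical numbers).
    joint_q = {("A", "A"): 4.0, ("A", "B"): 0.0,
               ("B", "A"): 3.0, ("B", "B"): 2.0}
    # How the opponent is expected to react to each of agent i's actions.
    rho = {"A": {"A": 0.25, "B": 0.75},
           "B": {"A": 0.5, "B": 0.5}}
    q_i = individual_q(joint_q, rho, acts, acts)
    print(q_i, best_response(q_i))  # {'A': 1.0, 'B': 2.5} B
```

    Although (A, A) carries the largest joint Q-value, the opponent's anticipated reaction to A makes B the better response; this is the kind of effect that reasoning about how others react to one's own behavior is meant to capture.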
I establish Q-DPP, an extension of the Determinantal Point Process (DPP) with partition constraints, and apply it to multi-agent learning as a function approximator for the centralized value function. An attractive property of Q-DPP is that, when it reaches the optimum, it offers a natural factorization of the centralized value function that represents both quality (maximizing reward) and diversity (different behaviors). In the third part of the thesis, I move beyond action-level mutual influence and build a policy-space meta-game to analyze the relationships between agents' adaptive policies. I present a Multi-Agent Trust Region Learning (MATRL) algorithm that augments single-agent trust region policy optimization with a weak stable fixed point approximated by the policy-space meta-game. The algorithm aims to find a game-theoretic mechanism for adjusting the policy optimization steps so that the learning of all agents is driven toward the stable point.
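    The quality and diversity decomposition that Q-DPP relies on can be illustrated with a plain DPP over (agent, action) items. The sketch below is not Q-DPP itself: the quality values and feature vectors are hypothetical hand-picked numbers rather than learned ones, and the best subset is found by brute-force enumeration under the partition constraint (exactly one action per agent). It shows why the pair of highest-quality actions can score zero when the two behaviors are identical.

```python
import itertools


def det(m):
    """Determinant of a small square matrix via Gaussian elimination."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular: the selected items are linearly dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_score(subset, quality, features):
    """Unnormalized DPP probability det(L_Y) with L_ij = q_i * <f_i, f_j> * q_j.
    With unit-norm features this equals prod(q_i^2) * det(S_Y), so both high
    quality and mutually dissimilar items raise the score."""
    kernel = [[quality[i] * sum(x * y for x, y in zip(features[i], features[j])) * quality[j]
               for j in subset] for i in subset]
    return det(kernel)


def best_joint_action(partitions, quality, features):
    """Partition constraint: a valid subset takes exactly one item per agent."""
    return max(itertools.product(*partitions),
               key=lambda subset: dpp_score(subset, quality, features))


if __name__ == "__main__":
    # Hypothetical items "1a", "1b" (agent 1) and "2a", "2b" (agent 2).
    quality = {"1a": 2.0, "1b": 1.0, "2a": 2.0, "2b": 1.5}
    features = {"1a": (1.0, 0.0), "1b": (0.0, 1.0),
                "2a": (1.0, 0.0), "2b": (0.0, 1.0)}
    partitions = [["1a", "1b"], ["2a", "2b"]]
    print(best_joint_action(partitions, quality, features))  # ('1a', '2b')
```

    Here ("1a", "2a") has the highest combined quality but identical features, so its kernel is singular and its score is 0; ("1a", "2b") wins with a score of 4 × 2.25 = 9.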

    The influence of topology and information diffusion on networked game dynamics

    This thesis studies the influence of topology and information diffusion on the strategic interactions of agents in a population. It shows that there is a reciprocal relationship between the topology, the information diffusion, and the strategic interactions of a population of players. To evaluate the influence of topology and information flow on networked game dynamics, strategic games are simulated on populations of players distributed in a non-homogeneous spatial arrangement. The first component of this research is a study of the evolution of coordination among strategic players, in which the topology, or structure, of the population is shown to be critical in determining coordination among the players. Next, the effect of network topology on the evolutionary stability of strategies is studied in detail. The results show that network topology plays a key role in determining the evolutionary stability of a particular strategy in a population of players. Then, the effect of network topology on the optimum placement of strategies is studied. Using genetic optimisation, it is shown that the placement of strategies in a spatially distributed population of players is crucial in maximising the collective payoff of the population. Exploring further the effect of network topology and information diffusion on networked games, the non-optimal or bounded rationality of players is modelled using the topology and the directed information flow of the network. Based on this topologically distributed bounded rationality model, it is shown that scale-free and small-world networks emerge in randomly connected populations of sub-optimal players. Thus, the topological and information-theoretic interpretations of bounded rationality suggest that the topology, information diffusion, and strategic interactions of socio-economic structures are cyclically interdependent.
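    The claim that topology shapes coordination can be made concrete with a small simulation. This is an illustration under stated assumptions, not the thesis's model: a pure coordination game, asynchronous best-response updates, and two hypothetical six-node networks (a ring and a star) that start from the same strategy profile. All function names are hypothetical.

```python
def neighbourhood_payoff(node, strategy, graph, strategies, matrix):
    """Summed payoff to `node` if it plays `strategy` against all its neighbours."""
    return sum(matrix[strategy][strategies[nb]] for nb in graph[node])


def best_response_dynamics(graph, strategies, matrix, max_sweeps=100):
    """Asynchronous best-response dynamics for a two-strategy game on a network.

    graph: dict node -> list of neighbours; strategies: dict node -> 0 or 1;
    matrix[s][t]: payoff to a player using s against a neighbour using t.
    Nodes are revised one at a time in sorted order and switch only when the
    other strategy is strictly better, so ties keep the current strategy.
    Returns the final strategy profile and the number of sweeps used."""
    strategies = dict(strategies)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for node in sorted(graph):
            current = strategies[node]
            other = 1 - current
            if (neighbourhood_payoff(node, other, graph, strategies, matrix)
                    > neighbourhood_payoff(node, current, graph, strategies, matrix)):
                strategies[node] = other
                changed = True
        if not changed:
            return strategies, sweep
    return strategies, max_sweeps


if __name__ == "__main__":
    coordination = [[1, 0], [0, 1]]  # pure coordination: matching a neighbour pays 1
    start = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    star = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}
    print(best_response_dynamics(ring, start, coordination))
    # ({0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}, 1)
    print(best_response_dynamics(star, start, coordination))
    # ({0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}, 2)
```

    In the ring, each block of players already coordinates locally, so the mixed profile is stable; in the star, the hub switches to its local majority and pulls the whole population to one convention. Because a coordination game is a potential game, sequential strict-improvement updates like these cannot cycle.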

    Quantum coherence and correlations in photonic qubits and photoactive hybrid organometallic Perovskite systems

    The last two decades have witnessed tremendous advances and developments in quantum information science and technology, driven mainly by the use of quantum-physical resources such as coherence and entanglement. The formalization of the concept of universal quantum computation by D. Deutsch in 1985 has matured into commercial initiatives that aim to accelerate the physical implementation of a practical quantum computer. So far, this quantum technology has been pursued mainly through information processors based on superconducting quantum bits (qubits). Other developments use the quantum states of photons in conjunction with other quantum registers based on electrons, atoms, molecules, and artificial systems, among others. In every case, however, multi-qubit technologies are still under intense research and development, in which the temperature and the size of the quantum register are essential issues. Nevertheless, every possible physical implementation of quantum information processing devices shares the fundamental properties of quantum systems: interference, coherence, and entanglement. In this thesis, we study quantum coherence and entanglement in qubits (polarization-encoded) and in quantum materials (operating at room temperature) in order to analyze the role of quantum correlations and decoherence for information processing purposes. The present research is divided into two main parts. The first begins with an analysis of the influence of a birefringent medium on the entanglement of a photonic qubit state.
We employ a polarization-maintaining fiber (PMF) as a decoherence environment to test the theoretical model in which the symmetry of the coupling between the environment and the qubit determines the death and revival of information correlations. This finding provides a tool for maintaining entanglement independently of the fiber length by exploiting the symmetry properties of the physical system. As a complement, to show that entanglement is not the only crucial factor in information schemes, we use the prisoner's dilemma game (in a two-parameter strategy space, extended to three parameters) to demonstrate that the quantum advantages in this protocol are due to quantum superposition rather than to the entanglement of the physical system. We also present an experimental setup with photons to verify these findings with photonic qubits. The second part of the thesis examines a new nanomaterial that could serve as a bridge for the interaction of photons and electrons, toward a physical representation of photonic qubits conditioned by external registers (such as electrons or ions). The first stage in this direction considers the implementation of single-photon emitters. Before that, however, it is necessary to understand the photophysical capabilities of perovskites as the selected structure. In addition, the novelty of this nanostructure allows us to answer, within this research, some open questions concerning its characterization. MA-halide (methylammonium halide) perovskites are structurally formed by small domains ranging from nanometer to micrometer sizes, which exhibit strong photoluminescence intermittency (blinking).
We attribute this response to non-radiative Auger recombination of additional photogenerated electrons in a trap-filling process caused by defects or ion movement. We verified this by applying a charge-quenching layer of PCBM (phenyl-C61-butyric acid methyl ester), which considerably reduced the blinking. We also provide a novel technique for real-time observation of the effect of ion movement in this material. Moreover, this behavior indicates suitable conditions for the existence of few emitters, which are useful in quantum protocols, and it motivates us to analyze and explain this phenomenon. We then prepared two chemical compositions of inorganic perovskites with high structural order to explore the conditions necessary to produce a quantum emitter with this material. We imaged the samples by electron microscopy, and both systems were characterized spectrally at successive dilutions until emission intermittency was observed. The non-classical behavior of the emission was verified with a Hanbury Brown and Twiss interferometer by measuring the second-order degree of coherence (correlation function). Last but not least, the supplementary material describes the laboratories set up during this doctoral research and the prototype of a signal-counting device for analyzing temporal coincidences in correlated photonic events.
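    The antibunching test mentioned above reduces to a simple ratio. This is a minimal sketch, assuming a continuous-wave measurement with two detectors behind a 50:50 beam splitter and ignoring background, dead time, and timing jitter: the measured coincidences are normalized by the accidental coincidences expected for uncorrelated light. The function name and numbers are hypothetical, not data from the thesis; g2(0) < 0.5 is the usual threshold for attributing emission to a single emitter.

```python
def g2_zero(coincidences, counts_1, counts_2, window_s, duration_s):
    """Estimate g2(0) from Hanbury Brown and Twiss coincidence counts
    (idealized continuous-wave case).

    For uncorrelated light the expected number of accidental coincidences is
        N_acc = (N1 / T) * (N2 / T) * tau * T = N1 * N2 * tau / T,
    where N1, N2 are the singles counts on the two detectors, tau is the
    coincidence window and T the integration time; g2(0) ~= N_c / N_acc.
    Background, dead time and timing jitter are ignored."""
    accidental = counts_1 * counts_2 * window_s / duration_s
    return coincidences / accidental


if __name__ == "__main__":
    # Hypothetical numbers: 1e6 singles per detector in 1 s, 1 ns window,
    # which gives about 1000 expected accidental coincidences.
    print(g2_zero(1000, 1e6, 1e6, 1e-9, 1.0))  # about 1.0: no antibunching
    print(g2_zero(200, 1e6, 1e6, 1e-9, 1.0))   # about 0.2: below 0.5, single-emitter regime
```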

    Generalized asset integrity games

    Generalized assets represent a class of multi-scale adaptive state-transition systems with domain-oblivious performance criteria. The governance of such assets must proceed without exact specifications, objectives, or constraints, and decision making must scale rapidly in the presence of uncertainty, complexity, and intelligent adversaries. This thesis formulates an architecture for generalized asset planning. Assets are modelled as dynamical graph structures that admit topological performance indicators such as dependability, resilience, and efficiency. These metrics are used to construct robust model configurations. A normalized compression distance (NCD) is computed between a given active (live) asset model and a reference configuration to produce an integrity score, which represents the proximity to ideal conditions; the utility derived from the asset increases monotonically with this score. The present work considers the interaction between an asset manager and an intelligent adversary, who act within a stochastic environment to control the integrity state of the asset. A generalized asset integrity game engine (GAIGE) is developed, which implements anytime algorithms to solve a stochastically perturbed two-player zero-sum game. The resulting planning strategies seek to stabilize deviations from minimax trajectories of the integrity score. Results demonstrate the performance and scalability of GAIGE. This approach represents a first step towards domain-oblivious architectures for complex asset governance and anytime planning.
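    The NCD itself is a standard formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The sketch below uses zlib as the compressor; the edge-list serialization of an asset graph and the mapping integrity = 1 - NCD are hypothetical choices for illustration, not necessarily those used by GAIGE.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length of the zlib-compressed byte string (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def serialize(edges):
    """Hypothetical canonical encoding of an asset graph as a sorted edge list."""
    return "\n".join(f"{u}-{v}" for u, v in sorted(edges)).encode()


def integrity_score(live: bytes, reference: bytes) -> float:
    """Hypothetical mapping: 1 - NCD clipped to [0, 1]; higher means closer to the reference."""
    return min(1.0, max(0.0, 1.0 - ncd(live, reference)))


if __name__ == "__main__":
    n = 200
    reference = serialize([(i, (i + 1) % n) for i in range(n)])  # ring topology
    degraded = serialize([(0, i) for i in range(1, n)])            # star topology
    print(integrity_score(reference, reference), integrity_score(degraded, reference))
```

    A live model identical to the reference scores close to 1, while a structurally different graph scores markedly lower. NCD depends on the compressor, so scores from different compressors are not directly comparable.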

    Yearbook 2021 (Institute of Technical Physics and Materials Science)


    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. In particular, it contains the scientific program both in survey form and in full detail, as well as information on the social program, the venue, special meetings, and more.

    Reinforcement Learning

    Brains rule the world, and brain-like computation is increasingly used in computers and electronic devices. Brain-like computation is about processing and interpreting data, or directly proposing and performing actions. Learning is a very important aspect of it. This book is on reinforcement learning, which involves performing actions to achieve a goal. The first 11 chapters describe and extend the scope of reinforcement learning. The remaining 11 chapters show that it is already widely used in numerous fields. Reinforcement learning can tackle control tasks that are too complex for traditional, hand-designed, non-learning controllers. As learning computers can deal with technical complexities, the task left to human operators is to specify goals at increasingly higher levels. This book shows that reinforcement learning is a very dynamic area in terms of both theory and applications, and it should stimulate and encourage new research in this field.