11 research outputs found

    Opponent awareness at all levels of the multiagent reinforcement learning stack

    Multiagent Reinforcement Learning (MARL) has experienced numerous high-profile successes in recent years in generating superhuman gameplaying agents for a wide variety of videogames. Despite these successes, game developers have not adopted MARL techniques as a tool for developing their games, often citing the high computational cost of training agents and the difficulty of understanding and evaluating MARL methods as the two main obstacles. This thesis attempts to close this gap by introducing an informative modular abstraction under which any Reinforcement Learning (RL) training pipeline can be studied. This abstraction is the MARL stack, which explicitly expresses any MARL pipeline as an environment where agents equipped with learning algorithms train via simulated experience as orchestrated by a training scheme. Within the context of 2-player zero-sum games, different approaches to granting opponent awareness at all levels of the proposed MARL stack are explored in a broad study of the field. At the level of training schemes, a grouping generalization over many modern MARL training schemes is introduced under a unified framework. Empirical results demonstrate that the choice of which sequence of opponents a learning agent faces during training greatly affects learning dynamics. At the agent level, the introduction of opponent modelling in state-of-the-art algorithms is explored as a way of generating targeted best responses towards opponents encountered during training, improving upon the sample efficiency of these methods. At the environment level, the use of MARL as a game design tool is explored by using MARL-trained agents as metagame evaluators inside an automated process of game balancing.

    Algorithms for Adaptive Game-playing Agents


    MAPiS 2019 - First MAP-i Seminar: proceedings

    This book contains a selection of Informatics papers accepted for presentation and discussion at “MAPiS 2019 - First MAP-i Seminar”, held in Aveiro, Portugal, on January 31, 2019. MAPiS is the first conference organized by the MAP-i first-year students, in the context of the Seminar course. The MAP-i Doctoral Programme in Computer Science is a joint doctoral programme of the University of Minho, the University of Aveiro and the University of Porto. The programme aims to train highly qualified professionals, fostering their capacity and knowledge in the research area. The aim of the Seminar course was to introduce concepts that are complementary to scientific and technological education, but fundamental both to completing a PhD successfully and to pursuing a career in scientific research. The students came into contact with the typical procedures and difficulties of organizing and participating in such a complex event. They were in charge of the organization and management of all aspects of the event, such as the accommodation of participants and the review of the papers. The works presented at the Conference and the papers submitted were also developed by these students, fostering their enthusiasm for research in Informatics. (...)

    Leveraging deep reinforcement learning in the smart grid environment

    While modern statistical learning is achieving impressive results, as computers start exceeding human baselines in some applications like computer vision, or even beating professional human players at strategy games without any prior knowledge, reliable deployed applications are still in their infancy compared to the opportunities they could bring. In this perspective, with a keen focus on sequential decision theory and recent statistical learning research, we demonstrate efficient application of such methods on instances involving the energy grid and the optimization of its actors, from energy storage and electric cars to smart buildings and thermal controls. We conclude by introducing a new hybrid approach combining the modern performance of deep learning and reinforcement learning with the proven application framework of operations research, with the aim of facilitating the seamless integration of new statistical learning-oriented methodologies in concrete applications.

    Learning and planning in videogames via task decomposition

    Artificial intelligence (AI) methods have come a long way in tabletop games, with computer programs having now surpassed human experts in the challenging games of chess, Go and heads-up no-limit Texas hold'em. However, a significant simplifying factor in these games is that individual decisions have a relatively large impact on the state of the game. The real world, however, is granular. Human beings are continually presented with new information and are faced with making a multitude of tiny decisions every second. Viewed in these terms, feedback is often sparse, meaning that it only arrives after one has made a great number of decisions. Moreover, in many real-world problems there is a continuous range of actions to choose from, and attaining meaningful feedback from the environment often requires a strong degree of action coordination. Videogames, in which players must likewise contend with granular time scales and continuous action spaces, are in this sense a better proxy for real-world problems, and have thus become regarded by many as the new frontier in games AI. Seemingly, the way in which human players approach granular decision-making in videogames is by decomposing complex tasks into high-level subproblems, thereby allowing them to focus on the "big picture". For example, in Super Mario World, human players seem to look ahead in extended steps, such as climbing a vine or jumping over a pit, rather than planning one frame at a time. Currently though, this type of reasoning does not come easily to machines, leaving many open research problems related to task decomposition. This thesis focuses on three such problems in particular: (1) The challenge of learning subgoals autonomously, so as to lessen the issue of sparse feedback. (2) The challenge of combining discrete planning techniques with extended actions whose durations and effects on the environment are uncertain. (3) The questions of when and why it is beneficial to reason over high-level continuous control variables, such as the velocity of a player-controlled ship, rather than over the most low-level actions available. We address these problems via new algorithms and novel experimental design, demonstrating empirically that our algorithms are more efficient than strong baselines that do not leverage task decomposition, and yielding insight into the types of environment where task decomposition is likely to be beneficial.

    From specialists to generalists: inductive biases of deep learning for higher level cognition

    Current neural networks achieve state-of-the-art results across a range of challenging problem domains. Given enough data and computation, current neural networks can achieve human-level results on almost any task. In this sense, we have been able to train specialists that can perform a particular task really well, whether it is the game of Go, playing Atari games, Rubik's cube manipulation, image captioning or drawing images given captions. The next challenge for AI is to devise methods to train generalists that, when exposed to multiple tasks during training, can quickly adapt to new unknown tasks. Without any assumptions about the data-generating distribution, it may not be possible to achieve better generalization and adaptation to new (unknown) tasks. A fascinating possibility is that human and animal intelligence could be explained by a few principles (rather than an encyclopedia). If that was the case, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human intelligence. In addition, we know that real brains incorporate some detailed task-specific a priori knowledge which could not fit in a short list of simple principles. So we think of that short list rather as explaining the ability of brains to learn and adapt efficiently to new environments, which is a great part of what we need for AI. If that simplicity-of-principles hypothesis was correct, it would suggest that studying the kind of inductive biases (another way to think about principles of design and priors, in the case of learning systems) that humans and animals exploit could help both clarify these principles and provide inspiration for AI research. Deep learning already exploits several key inductive biases, and my work considers a larger list, focusing on those which concern mostly higher-level cognitive processing. My work focuses on designing such models by incorporating in them strong but general assumptions (inductive biases) that enable high-level reasoning about the structure of the world. This research program is both ambitious and practical, yielding concrete algorithms as well as a cohesive vision for long-term research towards generalization in a complex and changing world.

    Fundamental Approaches to Software Engineering

    This open access book constitutes the proceedings of the 23rd International Conference on Fundamental Approaches to Software Engineering, FASE 2020, which took place in Dublin, Ireland, in April 2020, and was held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The 23 full papers, 1 tool paper and 6 testing competition papers presented in this volume were carefully reviewed and selected from 81 submissions. The papers cover topics such as requirements engineering, software architectures, specification, software quality, validation, verification of functional and non-functional properties, model-driven development and model transformation, software processes, security and software evolution.

    Neural replay in representation, learning and planning

    Spontaneous neural activity is rarely the subject of investigation in cognitive neuroscience. This may be due to a dominant metaphor of cognition as an information processing unit, in which internally generated thoughts are often considered noise. Adopting a reinforcement learning (RL) framework, I consider cognition in terms of an agent trying to attain its internal goals. This framework motivated me to address in my thesis the role of spontaneous neural activity in human cognition. First, I developed a general method, called temporal delayed linear modelling (TDLM), to enable me to analyse this spontaneous activity. TDLM can be thought of as a domain-general sequence detection method. It combines nonlinear classification and linear temporal modelling. This enables testing for statistical regularities in sequences of neural representations of a decoded state space. Although developed for use with human non-invasive neuroimaging data, the method can be extended to analyse rodent electrophysiological recordings. Next, I applied TDLM to study spontaneous neural activity during rest in humans. As in rodents, I found that spontaneously generated neural events tended to occur in structured sequences. These sequences are accelerated in time compared to those related to actual experience (30-50 ms state-to-state time lag). These sequences, termed replay, reverse their direction after reward receipt. Notably, this human replay is not a recapitulation of prior experience, but follows sequences implied by learnt abstract structural knowledge, suggesting a factorized representation of structure and sensory information. Finally, I test the role of neural replay in model-based learning and planning in humans. Following reward receipt, I found significant backward replay of non-local experience with a 160 ms lag. This replay prioritises and facilitates the learning of action values. In a separate sequential planning task, I show that these neural sequences go forward in direction, depicting the trajectory subjects are about to take. The research presented in this thesis reveals a rich role of spontaneous neural activity in supporting internal computations that underpin planning and inference in human cognition.

    AI: Limits and Prospects of Artificial Intelligence

    The emergence of artificial intelligence has triggered enthusiasm and promise of boundless opportunities as much as uncertainty about its limits. The contributions to this volume explore the limits of AI, describe the necessary conditions for its functionality, reveal its attendant technical and social problems, and present some existing and potential solutions. At the same time, the contributors highlight the societal and attending economic hopes and fears, utopias and dystopias that are associated with the current and future development of artificial intelligence