
    Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models

    This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information-gathering task. Our approach introduces a ‘Centralised Policy Actor-Critic’ (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy. In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model, in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to the more reliable integration of past evidence.
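
    To make the manager/worker division of labour concrete, below is a minimal, illustrative sketch of the FMH reward split on a toy corridor task, not the thesis implementation: a manager chooses a subgoal from a fixed set and is rewarded by the external task, while the worker is rewarded only for satisfying the assigned subgoal. The environment, learner classes, and hyperparameters are all assumptions made for illustration.

```python
# Illustrative FMH-style reward split on a toy 1-D corridor (assumed setup,
# not the thesis code): the manager picks subgoals and sees task reward;
# the worker only sees subgoal-achievement reward.
import random

SUBGOALS = [0, 2, 5]      # subgoal positions the manager can assign (assumed)
TASK_GOAL = 5             # the position the environment actually rewards
CORRIDOR = 6              # 1-D corridor of six cells

class Worker:
    """Tabular Q-learner rewarded for approaching the manager's subgoal."""
    def __init__(self, eps=0.1, alpha=0.5, gamma=0.95):
        self.q, self.eps, self.alpha, self.gamma = {}, eps, alpha, gamma

    def act(self, pos, goal):
        if random.random() < self.eps:
            return random.choice((-1, 1))
        return max((-1, 1), key=lambda a: self.q.get((pos, goal, a), 0.0))

    def learn(self, pos, goal, a, r, pos2):
        key = (pos, goal, a)
        best = max(self.q.get((pos2, goal, b), 0.0) for b in (-1, 1))
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (r + self.gamma * best - old)

class Manager:
    """Bandit-style learner over which subgoal to hand to the worker."""
    def __init__(self, eps=0.2, alpha=0.2):
        self.value, self.eps, self.alpha = {g: 0.0 for g in SUBGOALS}, eps, alpha

    def assign(self):
        if random.random() < self.eps:
            return random.choice(SUBGOALS)
        return max(SUBGOALS, key=self.value.get)

    def learn(self, goal, task_return):
        self.value[goal] += self.alpha * (task_return - self.value[goal])

manager, worker = Manager(), Worker()
for _ in range(2000):
    pos, goal, task_return = 0, manager.assign(), 0.0
    for _ in range(10):
        a = worker.act(pos, goal)
        pos2 = max(0, min(CORRIDOR - 1, pos + a))
        # The FMH split: the worker is rewarded for satisfying the manager's
        # subgoal; only the manager is rewarded by the external task.
        worker.learn(pos, goal, a, -abs(pos2 - goal), pos2)
        task_return += 1.0 if pos2 == TASK_GOAL else 0.0
        pos = pos2
    manager.learn(goal, task_return)

print(manager.value)   # the subgoal at the task goal should score highest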

    Bayesian Maps: probabilistic and hierarchical models for mobile robot navigation

    What is a map? What is its utility? What is a location, a behaviour? What are navigation, localization and prediction for a mobile robot facing a given task? These questions have neither a unique nor a straightforward answer to this day, and are still at the core of numerous research domains. Robotics, for instance, aims at answering them in order to create successful sensori-motor artefacts. Cognitive sciences use these questions as intermediate goals on the road to understanding living beings, their skills, and furthermore, their intelligence. Our study lies between these two domains. We first study classical probabilistic approaches (Markov localization, POMDPs, HMMs, etc.), then some biomimetic approaches (Berthoz, Franz, Kuipers). We analyze their respective advantages and drawbacks in light of a general formalism for robot programming based on Bayesian inference (BRP). We propose a new probabilistic formalism for modelling the interaction between a robot and its environment: the Bayesian map. In this framework, defining a map amounts to specifying a particular probability distribution. Some of the questions above then amount to solving inference problems. We define operators for putting maps together, so that "hierarchies of maps" and incremental development play a central role in our formalism, as in biomimetic approaches. By using the Bayesian formalism, we also benefit both from a unified means of dealing with uncertainties and from clear and rigorous mathematical foundations. Our formalism is illustrated by experiments implemented on a Koala mobile robot.
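
    As a rough illustration of the idea that "defining a map is specifying a probability distribution", the sketch below runs a discrete Bayes filter over a four-cell world: the map is nothing more than a sensor model P(observation | location) and a motion model P(next location | location, action), and localization becomes an inference problem over that distribution. The world, models, and numbers are assumptions for illustration, not the paper's Koala experiments.

```python
# Localization as inference over a probabilistic "map" (assumed toy example).
LOCATIONS = [0, 1, 2, 3]

# Sensor model P(observation | location): cells 0 and 2 look like a "door".
P_OBS = {0: {"door": 0.8, "wall": 0.2},
         1: {"door": 0.1, "wall": 0.9},
         2: {"door": 0.8, "wall": 0.2},
         3: {"door": 0.1, "wall": 0.9}}

def motion_model(loc, action):
    """P(next location | location, action) for a noisy 'move right' action."""
    probs = {l: 0.0 for l in LOCATIONS}
    target = min(loc + action, LOCATIONS[-1])
    probs[target] += 0.8       # intended move succeeds
    probs[loc] += 0.2          # or the robot stays put
    return probs

def bayes_filter(belief, action, observation):
    """One prediction + correction step of the Bayes filter."""
    # Prediction: push the belief through the motion model.
    predicted = {l: 0.0 for l in LOCATIONS}
    for loc, p in belief.items():
        for nxt, pt in motion_model(loc, action).items():
            predicted[nxt] += p * pt
    # Correction: weight by the sensor model and renormalise.
    posterior = {l: predicted[l] * P_OBS[l][observation] for l in LOCATIONS}
    z = sum(posterior.values())
    return {l: p / z for l, p in posterior.items()}

belief = {l: 1.0 / len(LOCATIONS) for l in LOCATIONS}   # uniform prior
for action, obs in [(1, "wall"), (1, "door")]:
    belief = bayes_filter(belief, action, obs)
print(belief)   # mass concentrates on cell 2 after seeing wall, then door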

    Towards Continual Reinforcement Learning: A Review and Perspectives

    In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL holds the promise of developing better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics.
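
    As a toy illustration of the non-stationarity that separates continual RL from the standard stationary setting, the sketch below (our own assumption, not a benchmark from the survey) runs a two-armed bandit whose reward probabilities flip every 1,000 steps; a constant step size lets the learner keep tracking the change, where a plain sample average would effectively freeze.

```python
# Non-stationary two-armed bandit with a tracking learner (assumed toy example).
import random

def reward_probs(t):
    """The environment itself changes: which arm is better flips over time."""
    return (0.8, 0.2) if (t // 1000) % 2 == 0 else (0.2, 0.8)

q = [0.0, 0.0]          # running value estimates for the two arms
alpha, eps = 0.1, 0.1   # constant step size forgets old experience gradually

for t in range(5000):
    arm = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda a: q[a])
    r = 1.0 if random.random() < reward_probs(t)[arm] else 0.0
    q[arm] += alpha * (r - q[arm])    # exponential recency weighting
    if t % 1000 == 999:
        better = reward_probs(t).index(max(reward_probs(t)))
        print(f"t={t+1:5d}  q=({q[0]:.2f}, {q[1]:.2f})  better arm is {better}")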

    Network Security Using Self Organized Multi-Agent Swarms

    Computer network cyber-security is a very serious concern in many commercial, industrial, and military environments. This paper proposes a new computer network security approach defined by self-organized multi-agent swarms (SOMAS), which provides a novel computer network security management framework based upon desired overall system behaviors. The SOMAS structure evolves based upon the partially observable Markov decision process (POMDP) formal model and the more complex Interactive-POMDP and Decentralized-POMDP models. Example swarm-specific and network-based behaviors are formalized and simulated. This paper illustrates, through various statistical testing techniques, the significance of the proposed SOMAS architecture.

    Self Organized Multi Agent Swarms (SOMAS) for Network Security Control

    Computer network security is a very serious concern in many commercial, industrial, and military environments. This paper proposes a new computer network security approach defined by self-organized agent swarms (SOMAS), which provides a novel computer network security management framework based upon desired overall system behaviors. The SOMAS structure evolves based upon the partially observable Markov decision process (POMDP) formal model and the more complex Interactive-POMDP and Decentralized-POMDP models, which are augmented with a new F(*-POMDP) model. Example swarm-specific and network-based behaviors are formalized and simulated. This paper illustrates, through various statistical testing techniques, the significance of the proposed SOMAS architecture and the effectiveness of self-organization and entangled hierarchies.
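
    For readers unfamiliar with the formal model these two papers build on, the sketch below spells out the POMDP tuple (states, actions, observations, T, Z, R) and the belief update any agent acting under partial observability must perform. The two-state "clean/compromised host" example is purely illustrative and is not the papers' network model.

```python
# POMDP tuple and belief update (assumed toy host-security example).
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class POMDP:
    states: List[str]
    actions: List[str]
    observations: List[str]
    T: Dict[Tuple[str, str, str], float]   # T[s, a, s'] = P(s' | s, a)
    Z: Dict[Tuple[str, str, str], float]   # Z[a, s', o] = P(o | a, s')
    R: Dict[Tuple[str, str], float]        # R[s, a]

    def update_belief(self, belief, action, obs):
        """b'(s') is proportional to Z(a, s', o) * sum_s T(s, a, s') * b(s)."""
        new_b = {}
        for s2 in self.states:
            pred = sum(self.T[(s, action, s2)] * belief[s] for s in self.states)
            new_b[s2] = self.Z[(action, s2, obs)] * pred
        norm = sum(new_b.values())
        return {s: p / norm for s, p in new_b.items()}

# A defender agent never observes the true host state, only noisy alerts,
# so it acts on a belief instead of a state.
pomdp = POMDP(
    states=["clean", "compromised"],
    actions=["monitor", "reimage"],
    observations=["alert", "quiet"],
    T={("clean", "monitor", "clean"): 0.95, ("clean", "monitor", "compromised"): 0.05,
       ("compromised", "monitor", "clean"): 0.0, ("compromised", "monitor", "compromised"): 1.0,
       ("clean", "reimage", "clean"): 1.0, ("clean", "reimage", "compromised"): 0.0,
       ("compromised", "reimage", "clean"): 1.0, ("compromised", "reimage", "compromised"): 0.0},
    Z={("monitor", "clean", "alert"): 0.1, ("monitor", "clean", "quiet"): 0.9,
       ("monitor", "compromised", "alert"): 0.7, ("monitor", "compromised", "quiet"): 0.3,
       ("reimage", "clean", "alert"): 0.1, ("reimage", "clean", "quiet"): 0.9,
       ("reimage", "compromised", "alert"): 0.7, ("reimage", "compromised", "quiet"): 0.3},
    R={("clean", "monitor"): 0.0, ("compromised", "monitor"): -1.0,
       ("clean", "reimage"): -0.5, ("compromised", "reimage"): -0.5},
)

belief = {"clean": 0.9, "compromised": 0.1}
belief = pomdp.update_belief(belief, "monitor", "alert")
print(belief)   # an alert shifts belief toward "compromised"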

    Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments

    Reinforcement Learning (RL) is based on the Markov Decision Process (MDP) framework, but not all problems of interest can be modeled with MDPs because some of them have non-Markovian temporal dependencies. To handle these, one of the solutions proposed in the literature is Hierarchical Reinforcement Learning (HRL). HRL takes inspiration from hierarchical planning in the artificial intelligence literature and is an emerging sub-discipline of RL, in which RL methods are augmented with some kind of prior knowledge about the high-level structure of behavior in order to decompose the underlying problem into simpler sub-problems. The high-level goal of our thesis is to investigate the advantages that an HRL approach may have over a plain RL approach. Thus, we study problems of interest (rarely tackled by means of RL) such as Sentiment Analysis, Rogue and Car Controller, showing how the ability of RL algorithms to solve them in a partially observable environment is affected by using (or not) generic hierarchical architectures based on RL algorithms of the Actor-Critic family. Remarkably, we claim that our work on Sentiment Analysis in particular is very innovative for RL, resulting in state-of-the-art performance; as far as the author knows, Reinforcement Learning is only rarely applied to the domain of computational linguistics and sentiment analysis. Furthermore, our work on the famous video game Rogue is probably the first example of a Deep RL architecture able to explore Rogue dungeons and fight against its monsters, achieving a success rate of more than 75% on the first game level. Finally, our work on Car Controller allowed us to make some interesting observations about the nature of some components of the policy gradient equation.
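
    To illustrate what a generic hierarchical architecture based on the Actor-Critic family can look like in a partially observable setting, here is a minimal PyTorch sketch, our own illustration rather than the thesis code: a recurrent encoder summarises the observation history, a meta-actor picks which sub-policy (option) runs, the chosen sub-actor emits primitive actions, and a shared critic estimates value. Layer sizes and the GRU encoder are assumptions.

```python
# Minimal two-level Actor-Critic hierarchy for partial observability (assumed sketch).
import torch
import torch.nn as nn

class HierarchicalActorCritic(nn.Module):
    def __init__(self, obs_dim, n_options, n_actions, hidden=64):
        super().__init__()
        # GRU encoder: turns the observation stream into a belief-like state.
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.meta_actor = nn.Linear(hidden, n_options)          # which option to run
        self.sub_actors = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_options)]
        )
        self.critic = nn.Linear(hidden, 1)                      # shared value head

    def forward(self, obs_seq):
        h, _ = self.encoder(obs_seq)          # (batch, time, hidden)
        state = h[:, -1]                      # last hidden state as the agent's state
        # High level: sample which option to run.
        option = torch.distributions.Categorical(logits=self.meta_actor(state)).sample()
        # Low level: the chosen option's actor proposes primitive-action logits.
        action_logits = torch.stack(
            [self.sub_actors[o](state[i]) for i, o in enumerate(option.tolist())]
        )
        value = self.critic(state).squeeze(-1)
        return option, action_logits, value

model = HierarchicalActorCritic(obs_dim=8, n_options=3, n_actions=4)
obs = torch.randn(2, 5, 8)                    # batch of 2 sequences, 5 timesteps
option, action_logits, value = model(obs)
print(option.shape, action_logits.shape, value.shape)   # [2], [2, 4], [2]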

    Scaling Reinforcement Learning through Feudal Multi-Agent Hierarchy

    Militaries conduct wargames for training, planning, and research purposes. Artificial intelligence (AI) can improve military wargaming by reducing costs, speeding up the decision-making process, and offering new insights. Previous researchers explored using reinforcement learning (RL) for wargaming based on the successful use of RL for other human competitive games. While previous research has demonstrated that an RL agent can generate combat behavior, those experiments have been limited to small-scale wargames. This thesis investigates the feasibility and acceptability of scaling hierarchical reinforcement learning (HRL) to support integrating AI into large military wargames. It also investigates potential complications that arise when replacing the opposing force with an intelligent agent, by exploring the ways in which an intelligent agent can cause a wargame to fail. The resources required to train a feudal multi-agent hierarchy (FMH) and a standard RL agent, and their effectiveness, are compared in increasingly complicated wargames. While FMH fails to demonstrate the performance required for large wargames, it offers insight for future HRL research. Finally, the Department of Defense verification, validation, and accreditation process is proposed as a method to ensure that any future AI applications applied to wargames are suitable.