
    Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder

    In this paper, we present a hierarchical path-planning framework called SG-RL (subgoal graphs-reinforcement learning) for planning rational paths for agents maneuvering in continuous and uncertain environments. By "rational", we mean that the planner (1) is efficient enough to eliminate first-move lags and (2) produces collision-free, smooth paths that satisfy the agents' kinematic constraints. SG-RL works on two levels. At the first level, SG-RL uses a geometric path-planning method, namely Simple Subgoal Graphs (SSG), to efficiently find optimal abstract paths, also called subgoal sequences. At the second level, SG-RL uses an RL method, namely Least-Squares Policy Iteration (LSPI), to learn near-optimal motion-planning policies that generate kinematically feasible, collision-free trajectories between adjacent subgoals. The first advantage of the proposed method is that SSG mitigates the sparse-reward and local-minimum problems that limit RL agents, so LSPI can generate paths in complex environments. The second advantage is that when the environment changes slightly (e.g., unexpected obstacles appear), SG-RL does not need to reconstruct subgoal graphs or replan subgoal sequences with SSG, because LSPI's generalization ability lets it handle such changes. Simulation experiments in representative scenarios demonstrate that, compared with existing methods, SG-RL works well on large-scale maps, with relatively low action-switching frequencies and shorter path lengths, and that it copes with small changes in the environment. We further demonstrate that the design of the reward function and the choice of training environments are important factors in learning feasible policies.
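
    The two-level structure described above can be summarized in a short sketch. The snippet below is a hypothetical outline, not the paper's implementation: a plain breadth-first search stands in for SSG's abstract search (SSG finds optimal abstract paths; BFS is only a placeholder), and `policy`, `step`, and `reached` stand in for the trained LSPI policy and the environment interface.

```python
# Hypothetical sketch of SG-RL's two-level loop; all interfaces invented.
from collections import deque

def abstract_path(graph, start, goal):
    """Level 1 stand-in for SSG: breadth-first search over the subgoal graph."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:                       # reconstruct the subgoal sequence
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None                                # goal unreachable

def follow_subgoals(subgoals, policy, step, reached, state):
    """Level 2 stand-in for LSPI: a learned policy drives the agent
    between adjacent subgoals, handling kinematics and obstacles."""
    for sg in subgoals[1:]:
        while not reached(state, sg):
            state = step(state, policy(state, sg))  # near-optimal motion policy
    return state
```

    Because the low-level policy, not the graph, absorbs small environmental changes, only the subgoal-following loop has to react when an unexpected obstacle appears; the subgoal sequence itself is left untouched.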

    Coordinating decentralized learning and conflict resolution across agent boundaries

    It is crucial for embedded systems to adapt to the dynamics of open environments. This adaptation process is especially challenging in multiagent systems because of scalability issues, partial access to information, and complex interactions among agents. Learning good policies is hard when agents must plan and coordinate in uncertain, dynamic environments, especially when they have large state spaces. It is also critical for agents operating in a multiagent system (MAS) to resolve conflicts among the learned policies of different agents, since such conflicts can be detrimental to overall performance. The focus of this research is to use a reinforcement-learning-based local optimization algorithm within each agent to learn multiagent policies in a decentralized fashion. These policies allow each agent to adapt to changes in environmental conditions while reorganizing the underlying multiagent network when needed. The research takes an adaptive approach to resolving conflicts that can arise between locally optimal agent policies. First, an algorithm that uses heuristic rules to locally resolve simple conflicts is presented. When the environment is more dynamic and uncertain, a mediator-based mechanism is employed to resolve more complicated conflicts and to selectively expand the agents' state spaces during learning. For scenarios where mediator-based mechanisms with partially global views are ineffective, a more rigorous approach to global conflict resolution that synthesizes multiagent reinforcement learning (MARL) and distributed constraint optimization (DCOP) is developed. These mechanisms are evaluated in the context of a multiagent tornado-tracking application called NetRads. Empirical results show that they significantly improve the performance of the tornado-tracking network across a variety of weather scenarios. The major contributions of this work are: a state-of-the-art decentralized learning approach that supports agent interactions and reorganizes the underlying network when needed; the use of abstract classes of scenarios, states, and actions to manage exploration of the search space efficiently; novel conflict-resolution algorithms of increasing complexity that use heuristic rules, sophisticated automated negotiation mechanisms, and distributed constraint optimization methods, respectively; and a rigorous study of the interplay between two popular theories used to solve multiagent problems, namely decentralized Markov decision processes and distributed constraint optimization.
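
    The simplest of the three conflict-resolution mechanisms, heuristic local resolution, can be illustrated with a small sketch. Everything below is invented for illustration (the agent representation, the utility function, and the tie-breaking rule); the actual heuristics used for NetRads are more involved.

```python
# Hypothetical sketch of heuristic local conflict resolution between two
# neighbouring agents whose locally optimal choices collide (e.g., two
# radars claiming the same scanning sector); not the NetRads heuristics.
def resolve_conflict(agent_a, agent_b, utility):
    """Keep the choice of the agent with the higher local utility; the
    other agent falls back to its next-best conflict-free alternative."""
    if utility(agent_a, agent_a["choice"]) >= utility(agent_b, agent_b["choice"]):
        winner, loser = agent_a, agent_b
    else:
        winner, loser = agent_b, agent_a
    loser["choice"] = next(
        (alt for alt in loser["alternatives"] if alt != winner["choice"]),
        loser["choice"],   # no alternative found: keep it and escalate
    )
    return winner["choice"], loser["choice"]
```

    When no conflict-free alternative exists, a purely local rule like this fails, which is what motivates the escalation to the mediator-based mechanism and, ultimately, to the MARL-plus-DCOP synthesis described above.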

    Rational macroeconomic learning in linear expectational models

    The partial-information rational expectations solution to a general linear multivariate expectational macro-model is found when agents are uncertain about the true values of the model's parameters. Necessary and sufficient conditions for convergence to the full-information rational expectations solution are given, and the core of an algorithm for the Bayesian updating of beliefs is provided. In the course of this, a new class of full-information rational expectations equilibria is described and some of its desirable properties are proven.
    Keywords: rational expectations; partial information; Bayesian learning; generalized Schur decomposition; sunspots; indeterminacy; feasible rational expectations equilibria
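
    The Bayesian belief-updating step has a familiar conjugate form in the simplest case. The snippet below is only a one-parameter illustration of that flavour, assuming a Gaussian prior and Gaussian observation noise; the paper's algorithm updates beliefs over the full parameter set of the linear model.

```python
# Illustrative conjugate update for a belief about one unknown scalar
# parameter; assumes a Gaussian prior and Gaussian observation noise.
def normal_update(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a Gaussian mean with known noise."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)   # precisions add
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var
```

    Repeated updates of this kind shrink the posterior variance as data accumulate, which is the intuition behind convergence toward the full-information solution under the paper's conditions.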

    Universal Reinforcement Learning Algorithms: Survey and Experiments

    Many state-of-the-art reinforcement learning (RL) algorithms assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with experiments that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.
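
    The common core of these Bayesian URL agents is a mixture over a class of environment models, with posterior weights updated by predictive success. The sketch below is a generic illustration of that update, not code from the paper's reference implementation; the finite model class and the predictor signature are invented.

```python
# Generic Bayesian mixture update over a finite model class; the real
# AIXI mixture is over all lower-semicomputable environments.
def mixture_update(weights, models, history, percept):
    """w_i <- w_i * P_i(percept | history), then renormalise.
    Each model is a callable returning its predictive probability."""
    posterior = [w * m(history, percept) for w, m in zip(weights, models)]
    total = sum(posterior)
    return [w / total for w in posterior]
```

    Models that keep predicting percepts well come to dominate the mixture, so the agent's action selection is increasingly driven by the best available explanation of its history.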

    Investigating social interaction strategies for bootstrapping lexicon development

    This paper investigates how different modes of social interaction influence the bootstrapping and evolution of lexicons. This is done by comparing three language-game models that differ in the type of social interactions they use. The simulations show that the language games that use either joint attention or corrective feedback as a source of contextual input are better at bootstrapping a lexicon than the game without such directed interactions. The simulation of the latter game, however, does show that it is possible to develop a lexicon without directed input when the lexicon is transmitted from generation to generation.
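
    One round of a feedback-driven game can be sketched in a few lines. The representation below is invented for illustration (agents as plain object-to-word dictionaries, random word coinage, immediate adoption on failure); the paper's three game models are richer than this.

```python
# Hypothetical single round of a language game with corrective feedback;
# agents are modelled as dicts mapping objects to words.
import random

def play_round(speaker, hearer, objects):
    topic = random.choice(objects)
    # Speaker names the topic, coining a new word if it has none yet.
    word = speaker.setdefault(topic, f"w{random.randrange(10**6)}")
    # Hearer guesses which object the word refers to.
    guess = next((obj for obj, w in hearer.items() if w == word), None)
    if guess != topic:
        hearer[topic] = word   # corrective feedback: adopt the speaker's word
    return guess == topic      # communicative success
```

    Removing the corrective-feedback step corresponds to the undirected game, which in the paper's simulations only develops a lexicon when it is transmitted across generations.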

    Intelligent Association Exploration and Exploitation of Fuzzy Agents in Ambient Intelligent Environments

    This paper presents a novel fuzzy intelligent architecture that aims to find relevant and important associations between the embedded-agent-based services that form Ambient Intelligent Environments (AIEs). The embedded agents are used in two ways: first, they monitor the inhabitants of the AIE, learning their behaviours in an online, non-intrusive, and life-long fashion with the aim of pre-emptively setting the environment to the user's preferred state. Second, they evaluate the relevance and significance of the associations between the various services with the aim of eliminating redundant associations, so as to minimize the agents' computational latency within the AIE. The embedded agents employ fuzzy logic because of its robustness to the uncertainty, noise, and imprecision encountered in AIEs. We describe unique real-world experiments conducted in the Essex intelligent Dormitory (iDorm) to evaluate and validate the significance of the proposed architecture and methods.
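
    The kind of fuzzy evaluation the agents perform can be illustrated with a toy rule. The membership functions, variable ranges, and the rule itself below are all invented; they are not from the iDorm agents' learned rule base.

```python
# Toy fuzzy rule evaluation; membership functions and thresholds invented.
def triangular(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def rule_strength(light_lux, hour):
    """Firing strength of: IF light IS low AND time IS evening THEN raise-lamp."""
    low_light = triangular(light_lux, -1, 0, 300)   # 'low' light level
    evening = triangular(hour, 17, 20, 23)          # 'evening' hours
    return min(low_light, evening)                  # min as the AND t-norm
```

    An association whose rules consistently fire with near-zero strength contributes nothing to the agent's decisions, which is roughly the kind of redundancy the architecture prunes to reduce computational latency.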