94 research outputs found

    Subgoal Identifications in Reinforcement Learning: A Survey

    Get PDF

    Learning and planning in videogames via task decomposition

    Get PDF
    Artificial intelligence (AI) methods have come a long way in tabletop games, with computer programs having now surpassed human experts in the challenging games of chess, Go and heads-up no-limit Texas hold'em. However, a significant simplifying factor in these games is that individual decisions have a relatively large impact on the state of the game. The real world, however, is granular. Human beings are continually presented with new information and are faced with making a multitude of tiny decisions every second. Viewed in these terms, feedback is often sparse, meaning that it only arrives after one has made a great number of decisions. Moreover, in many real-world problems there is a continuous range of actions to choose from, and attaining meaningful feedback from the environment often requires a strong degree of action coordination. Videogames, in which players must likewise contend with granular time scales and continuous action spaces, are in this sense a better proxy for real-world problems, and have thus become regarded by many as the new frontier in games AI. Seemingly, the way in which human players approach granular decision-making in videogames is by decomposing complex tasks into high-level subproblems, thereby allowing them to focus on the "big picture". For example, in Super Mario World, human players seem to look ahead in extended steps, such as climbing a vine or jumping over a pit, rather than planning one frame at a time. Currently though, this type of reasoning does not come easily to machines, leaving many open research problems related to task decomposition. This thesis focuses on three such problems in particular: (1) The challenge of learning subgoals autonomously, so as to lessen the issue of sparse feedback. (2) The challenge of combining discrete planning techniques with extended actions whose durations and effects on the environment are uncertain. (3) The questions of when and why it is beneficial to reason over high-level continuous control variables, such as the velocity of a player-controlled ship, rather than over the most low-level actions available. We address these problems via new algorithms and novel experimental design, demonstrating empirically that our algorithms are more efficient than strong baselines that do not leverage task decomposition, and yielding insight into the types of environment where task decomposition is likely to be beneficial

    Agent-Driven Representations, Algorithms, and Metrics for Automated Organizational Design.

    Full text link
    As cooperative multiagent systems (MASs) increase in interconnectivity, complexity, size, and longevity, coordinating the agents' reasoning and behaviors becomes increasingly difficult. One approach to address these issues is to use insights from human organizations to design structures within which the agents can more efficiently reason and interact. Generally speaking, an organization influences each agent such that, by following its respective influences, an agent can make globally-useful local decisions without having to explicitly reason about the complete joint coordination problem. For example, an organizational influence might constrain and/or inform which actions an agent performs. If these influences are well-constructed to be cohesive and correlated across the agents, then each agent is influenced into reasoning about and performing only the actions that are appropriate for its (organizationally-designated) portion of the joint coordination problem. In this dissertation, I develop an agent-driven approach to organizations, wherein the foundation for representing and reasoning about an organization stems from the needs of the agents in the MAS. I create an organizational specification language to express the possible ways in which an organization could influence the agents' decision making processes, and leverage details from those decision processes to establish quantitative, principled metrics for organizational performance based on the expected impact that an organization will have on the agents' reasoning and behaviors. Building upon my agent-driven organizational representations, I identify a strategy for automating the organizational design process~(ODP), wherein my ODP computes a quantitative description of organizational patterns and then searches through those possible patterns to identify an (approximately) optimal set of organizational influences for the MAS. Evaluating my ODP reveals that it can create organizations that both influence the MAS into effective patterns of joint policies and also streamline the agents' decision making in a coordinate manner. Finally, I use my agent-driven approach to identify characteristics of effective abstractions over organizational influences and a heuristic strategy for converging on a good abstraction.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113616/1/jsleight_1.pd

    Learning Models of Behavior From Demonstration and Through Interaction

    Get PDF
    This dissertation is concerned with the autonomous learning of behavioral models for sequential decision-making. It addresses both the theoretical aspects of behavioral modeling — like the learning of appropriate task representations — and the practical difficulties regarding algorithmic implementation. The first half of the dissertation deals with the problem of learning from demonstration, which consists in generalizing the behavior of an expert demonstrator based on observation data. Two alternative modeling paradigms are discussed. First, a nonparametric inference framework is developed to capture the behavior of the expert at the policy level. A key challenge in the design of the framework is the objective of making minimal assumptions about the observed behavior type while dealing with a potentially infinite number of system states. Due to the automatic adaptation of the model order to the complexity of the shown behavior, the proposed approach is able to pick up stochastic expert policies of arbitrary structure. Second, a nonparametric inverse reinforcement learning framework based on subgoal modeling is proposed, which allows to efficiently reconstruct the expert behavior at the intentional level. Other than most existing approaches, the proposed methodology naturally handles periodic tasks and situations where the intentions of the expert change over time. By adaptively decomposing the decision-making problem into a series of task-related subproblems, both inference frameworks are suitable for learning compact encodings of the expert behavior. For performance evaluation, the models are compared with existing frameworks on synthetic benchmark scenarios and real-world data recorded on a KUKA lightweight robotic arm. In the second half of the work, the focus shifts to multi-agent modeling, with the aim of analyzing the decision-making process in large-scale homogeneous agent networks. To fill the gap of decentralized system models with explicit agent homogeneity, a new class of agent systems is introduced. For this system class, the problem of inverse reinforcement learning is discussed and a meta learning algorithm is devised that makes explicit use of the system symmetries. As part of the algorithm, a heterogeneous reinforcement learning scheme is proposed for optimizing the collective behavior of the system based on the local state observations made at the agent level. Finally, to scale the simulation of the network to large agent numbers, a continuum version of the model is derived. After discussing the system components and associated optimality criteria, numerical examples of collective tasks are given that demonstrate the capabilities of the continuum approach and show its advantages over large-scale agent-based modeling

    Agent programming in the cognitive era

    Get PDF
    It is claimed that, in the nascent ‘Cognitive Era’, intelligent systems will be trained using machine learning techniques rather than programmed by software developers. A contrary point of view argues that machine learning has limitations, and, taken in isolation, cannot form the basis of autonomous systems capable of intelligent behaviour in complex environments. In this paper, we explore the contributions that agent-oriented programming can make to the development of future intelligent systems. We briefly review the state of the art in agent programming, focussing particularly on BDI-based agent programming languages, and discuss previous work on integrating AI techniques (including machine learning) in agent-oriented programming. We argue that the unique strengths of BDI agent languages provide an ideal framework for integrating the wide range of AI capabilities necessary for progress towards the next-generation of intelligent systems. We identify a range of possible approaches to integrating AI into a BDI agent architecture. Some of these approaches, e.g., ‘AI as a service’, exploit immediate synergies between rapidly maturing AI techniques and agent programming, while others, e.g., ‘AI embedded into agents’ raise more fundamental research questions, and we sketch a programme of research directed towards identifying the most appropriate ways of integrating AI capabilities into agent programs

    Bayesian nonparametric reward learning from demonstration

    Get PDF
    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 123-132).Learning from demonstration provides an attractive solution to the problem of teaching autonomous systems how to perform complex tasks. Demonstration opens autonomy development to non-experts and is an intuitive means of communication for humans, who naturally use demonstration to teach others. This thesis focuses on a specific form of learning from demonstration, namely inverse reinforcement learning, whereby the reward of the demonstrator is inferred. Formally, inverse reinforcement learning (IRL) is the task of learning the reward function of a Markov Decision Process (MDP) given knowledge of the transition function and a set of observed demonstrations. While reward learning is a promising method of inferring a rich and transferable representation of the demonstrator's intents, current algorithms suffer from intractability and inefficiency in large, real-world domains. This thesis presents a reward learning framework that infers multiple reward functions from a single, unsegmented demonstration, provides several key approximations which enable scalability to large real-world domains, and generalizes to fully continuous demonstration domains without the need for discretization of the state space, all of which are not handled by previous methods. In the thesis, modifications are proposed to an existing Bayesian IRL algorithm to improve its efficiency and tractability in situations where the state space is large and the demonstrations span only a small portion of it. A modified algorithm is presented and simulation results show substantially faster convergence while maintaining the solution quality of the original method. Even with the proposed efficiency improvements, a key limitation of Bayesian IRL (and most current IRL methods) is the assumption that the demonstrator is maximizing a single reward function. This presents problems when dealing with unsegmented demonstrations containing multiple distinct tasks, common in robot learning from demonstration (e.g. in large tasks that may require multiple subtasks to complete). A key contribution of this thesis is the development of a method that learns multiple reward functions from a single demonstration. The proposed method, termed Bayesian nonparametric inverse reinforcement learning (BNIRL), uses a Bayesian nonparametric mixture model to automatically partition the data and find a set of simple reward functions corresponding to each partition. The simple rewards are interpreted intuitively as subgoals, which can be used to predict actions or analyze which states are important to the demonstrator. Simulation results demonstrate the ability of BNIRL to handle cyclic tasks that break existing algorithms due to the existence of multiple subgoal rewards in the demonstration. The BNIRL algorithm is easily parallelized, and several approximations to the demonstrator likelihood function are offered to further improve computational tractability in large domains. Since BNIRL is only applicable to discrete domains, the Bayesian nonparametric reward learning framework is extended to general continuous demonstration domains using Gaussian process reward representations. The resulting algorithm, termed Gaussian process subgoal reward learning (GPSRL), is the only learning from demonstration method that is able to learn multiple reward functions from unsegmented demonstration in general continuous domains. GPSRL does not require discretization of the continuous state space and focuses computation efficiently around the demonstration itself. Learned subgoal rewards are cast as Markov decision process options to enable execution of the learned behaviors by the robotic system and provide a principled basis for future learning and skill refinement. Experiments conducted in the MIT RAVEN indoor test facility demonstrate the ability of both BNIRL and GPSRL to learn challenging maneuvers from demonstration on a quadrotor helicopter and a remote-controlled car.by Bernard J. Michini.Ph. D

    Hierarchical Reinforcement Learning in Behavior and the Brain

    Get PDF
    Dissertation presented to obtain the Ph.D degree in Biology, NeuroscienceReinforcement learning (RL) has provided key insights to the neurobiology of learning and decision making. The pivotal nding is that the phasic activity of dopaminergic cells in the ventral tegmental area during learning conforms to a reward prediction error (RPE), as speci ed in the temporal-di erence learning algorithm (TD). This has provided insights to conditioning, the distinction between habitual and goal-directed behavior, working memory, cognitive control and error monitoring. It has also advanced the understanding of cognitive de cits in Parkinson's disease, depression, ADHD and of personality traits such as impulsivity.(...

    Learning the Structure of Continuous Markov Decision Processes

    Get PDF
    There is growing interest in artificial, intelligent agents which can operate autonomously for an extended period of time in complex environments and fulfill a variety of different tasks. Such agents will face different problems during their lifetime which may not be foreseeable at the time of their deployment. Thus, the capacity for lifelong learning of new behaviors is an essential prerequisite for this kind of agents as it enables them to deal with unforeseen situations. However, learning every complex behavior anew from scratch would be cumbersome for the agent. It is more plausible to consider behavior to be modular and let the agent acquire a set of reusable building blocks for behavior, the so-called skills. These skills might, once acquired, facilitate fast learning and adaptation of behavior to new situations. This work focuses on computational approaches for skill acquisition, namely which kind of skills shall be acquired and how to acquire them. The former is commonly denoted as skill discovery and the latter as skill learning . The main contribution of this thesis is a novel incremental skill acquisition approach which is suited for lifelong learning. In this approach, the agent learns incrementally a graph-based representation of a domain and exploits certain properties of this graph such as its bottlenecks for skill discovery. This thesis proposes a novel approach for learning a graph-based representation of continuous domains based on formalizing the problem as a probabilistic generative model. Furthermore, a new incremental agglomerative clustering approach for identifying bottlenecks of such graphs is presented. Thereupon, the thesis proposes a novel intrinsic motivation system which enables an agent to intelligently allocate time between skill discovery and skill learning in developmental settings, where the agent is not constrained by external tasks. The results of this thesis show that the resulting skill acquisition approach is suited for continuous domains and can deal with domain stochasticity and different explorative behavior of the agent. The acquired skills are reusable and versatile and can be used in multi-task and lifelong learning settings in high-dimensional problems
    • …
    corecore