
    Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

    This paper presents the MAXQ approach to hierarchical reinforcement learning, based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning. Comment: 63 pages, 15 figures
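
    The additive decomposition at the heart of MAXQ can be made concrete with a short sketch. The Python fragment below is a minimal illustration, assuming a hand-built task hierarchy with placeholder reward and completion tables rather than the paper's actual experimental domains: for a composite subtask i, the decomposed value is V(i, s) = max_a [ V(a, s) + C(i, s, a) ], and V(a, s) reduces to the expected one-step reward when a is primitive.

        # Minimal sketch of evaluating the MAXQ value decomposition.
        # `children`, `V_primitive` and `C` are illustrative placeholder tables,
        # not the paper's learned quantities.

        def maxq_value(node, s, children, V_primitive, C):
            """Recursively evaluate V(node, s) under the MAXQ decomposition.

            For a primitive action a, V(a, s) is its expected one-step reward.
            For a composite subtask i, V(i, s) = max over child actions a of
            [ V(a, s) + C(i, s, a) ], where C is the completion function.
            """
            kids = children.get(node)
            if not kids:  # primitive action: expected immediate reward
                return V_primitive[(node, s)]
            return max(
                maxq_value(a, s, children, V_primitive, C) + C[(node, s, a)]
                for a in kids
            )

    MAXQ-Q learns the completion tables online; the same recursion also underlies the non-hierarchical execution mentioned above, since the decomposed value of any state-action pair can be reassembled on demand.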

    Learning the Structure of Continuous Markov Decision Processes

    There is growing interest in artificial, intelligent agents which can operate autonomously for an extended period of time in complex environments and fulfill a variety of different tasks. Such agents will face different problems during their lifetime which may not be foreseeable at the time of their deployment. Thus, the capacity for lifelong learning of new behaviors is an essential prerequisite for such agents, as it enables them to deal with unforeseen situations. However, learning every complex behavior anew from scratch would be cumbersome for the agent. It is more plausible to consider behavior to be modular and let the agent acquire a set of reusable building blocks for behavior, the so-called skills. These skills might, once acquired, facilitate fast learning and adaptation of behavior to new situations. This work focuses on computational approaches for skill acquisition, namely which kinds of skills shall be acquired and how to acquire them. The former is commonly denoted as skill discovery and the latter as skill learning. The main contribution of this thesis is a novel incremental skill acquisition approach which is suited for lifelong learning. In this approach, the agent incrementally learns a graph-based representation of a domain and exploits certain properties of this graph, such as its bottlenecks, for skill discovery. This thesis proposes a novel approach for learning a graph-based representation of continuous domains based on formalizing the problem as a probabilistic generative model. Furthermore, a new incremental agglomerative clustering approach for identifying bottlenecks of such graphs is presented. Thereupon, the thesis proposes a novel intrinsic motivation system which enables an agent to intelligently allocate time between skill discovery and skill learning in developmental settings, where the agent is not constrained by external tasks. The results of this thesis show that the resulting skill acquisition approach is suited for continuous domains and can deal with domain stochasticity and different explorative behavior of the agent. The acquired skills are reusable and versatile and can be used in multi-task and lifelong learning settings in high-dimensional problems.
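
    As a rough illustration of graph-based skill discovery of the kind described above, the sketch below builds a state graph from observed transitions and ranks candidate bottlenecks. Note that the thesis learns the graph incrementally from continuous experience via a probabilistic generative model and finds bottlenecks with an incremental agglomerative clustering scheme; the betweenness-centrality proxy and the `discover_skill_targets` helper used here are off-the-shelf stand-ins for illustration only.

        # Hedged sketch: bottleneck candidates on a learned state graph, using
        # betweenness centrality as a common stand-in for the thesis's
        # incremental agglomerative clustering.
        import networkx as nx

        def discover_skill_targets(transitions, top_k=3):
            """transitions: iterable of (state, next_state) pairs the agent observed.
            Returns the top_k nodes lying most often on shortest paths between
            other nodes, i.e. candidate bottlenecks to use as skill subgoals."""
            g = nx.Graph()
            g.add_edges_from(transitions)
            centrality = nx.betweenness_centrality(g)
            return sorted(centrality, key=centrality.get, reverse=True)[:top_k]

    Each returned node can then serve as the subgoal of a new skill whose internal policy the agent learns separately, which is the sense in which skills become reusable building blocks of behavior.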

    Options of Interest: Temporal Abstraction with Interest Functions

    Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of the difficulty of learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments. Comment: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20).
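
    The reweighting role of an interest function can be sketched in a few lines. The NumPy fragment below is an illustrative assumption, not the paper's exact architecture: it uses sigmoid interest functions and a softmax base policy over options, multiplies the interest I_o(s) into the base policy and renormalises, which is how an option's effective initiation region emerges; the paper's gradient-based update for the interest parameters is omitted here.

        # Sketch: interest functions reweighting option selection.
        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def option_probabilities(phi_s, W_policy, W_interest):
            """phi_s: state feature vector; each W has one column per option."""
            logits = phi_s @ W_policy
            pi_omega = np.exp(logits - logits.max())
            pi_omega /= pi_omega.sum()              # base policy over options
            interest = sigmoid(phi_s @ W_interest)  # I_o(s) in (0, 1) per option
            weighted = interest * pi_omega          # interest-weighted preferences
            return weighted / weighted.sum()        # normalised pi_I(o | s)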

    A social networking-enabled framework for autonomous robot skill development

    University of Technology Sydney. Faculty of Engineering and Information Technology.

    Intelligent service robots will need to adapt to unforeseen situations when performing tasks for humans. To do this, they will be expected to continuously develop new skills. Existing frameworks that address robot learning of new skills for a particular task often follow a task-specific design approach, which cannot support robots in adapting skills to new tasks, largely because task-specific skill specifications cannot easily be extended or changed. This dissertation provides an innovative task-independent framework that allows robots to develop new skills on their own. The idea is to create an online social network platform called Numbots that enables robots to learn new skills autonomously from their social circles. This platform integrates a state-of-the-art approach to learning from experience, called Constructing Skill Trees (CST), with a state-of-the-art framework for knowledge sharing, called RoboEarth. Based on this integration, a new logic model for online Robot-Robot Interaction (RRI) is developed.

    The principal focus of this dissertation is the analysis of, and solutions to, three underlying technical challenges required to achieve the RRI model: (i) skill representation; (ii) autonomous skill recognition and sharing; and (iii) skill transfer. We focus on motion skills required to interact with and manipulate objects, where a robot performs a series of motions to attain a goal given by humans. Skills formalise robot activities, which may involve an object (for example, kicking a ball, lifting a box, or passing a bottle of water to a person) or may not involve objects (for example, raising hands or walking forward).

    The first challenge concerns how to create a new skill representation that can represent robot skills independently of robot species, tasks and environments. We develop a generic robot skill representation which characterises three key dimensions of a robot skill in the focused domain: the changing relationship, the spatial relationship and the temporal relationship between the robot and a possible object. The new representation takes a spatial-temporal perspective similar to that found in RoboEarth, and uses the concepts of “agent space” and “object space” from the CST approach.

    The second challenge concerns how to enable robots to autonomously recognise and share their experiences with other robots in their social network. We propose an effect-based skill recognition mechanism that enables robots to recognise skills based on the effects that result from their actions. We introduce two types of autonomous skill recognition: (i) recognition of a chain of existing skill primitives; and (ii) recognition of a chain of unknown skills. All recognised skills are generalised and packed into a JSON file to share across Numbots.

    The third challenge is how to enable shared generic robot skills to be interpreted by a robot learner for its own problem solving. We introduce an effect-based skill transfer mechanism, an algorithm to decompose and customise a downloaded generic robot skill into a set of executable action commands for the robot learner's own problem solving.

    After introducing the three technical challenges of the RRI model and our solutions, a simulation is undertaken. It demonstrates that a skill recognised and shared by a PR2 robot can be reused and transferred by a NAO robot to solve a different problem. In addition, we provide a series of comparisons with RoboEarth, using a “ServeADrink” case study, to demonstrate the key advantages of the newly created generic robot skill representation over the more limited skill representation in RoboEarth. Even though implementation of Numbots and the RRI model on a real robot remains future work, the analysis and solutions proposed in this dissertation demonstrate the potential to enable robots to develop new skills on their own, in the absence of human or robot demonstrators, and to perform tasks for which they were not explicitly programmed.
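
    The abstract notes that recognised skills are generalised and packed into a JSON file for sharing across Numbots. The snippet below is a purely hypothetical sketch of what such a record might contain, organised around the three dimensions of the generic representation (the changing, spatial and temporal relationships between robot and object); all field names and values are invented for illustration and are not taken from the dissertation.

        # Hypothetical skill record; every field name here is an illustrative assumption.
        import json

        skill = {
            "name": "lift_box",
            "effect": "object_height_increased",      # cue for effect-based recognition
            "changing_relationship": ["no_contact", "grasped"],
            "spatial_relationship": {"agent_space": "grasp_pose",
                                     "object_space": "above_support"},
            "temporal_relationship": ["approach", "grasp", "raise"],
            "skill_primitives": ["reach", "close_gripper", "move_up"],
        }

        with open("lift_box_skill.json", "w") as f:
            json.dump(skill, f, indent=2)

    A robot learner downloading such a record would then apply the effect-based transfer step described above to map the generic primitives onto its own executable action commands.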