188 research outputs found

    Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks

    Full text link
    This paper discusses a system that accelerates reinforcement learning through transfer from related tasks. Without such transfer, an extensive re-learning effort is required even when two tasks are very similar at some abstract level. The system achieves much of its power by transferring parts of previously learned solutions rather than a single complete solution. It exploits strong features in the multi-dimensional function produced by reinforcement learning in solving a particular task. These features are stable and easy to recognize early in the learning process, and they generate a partitioning of the state space and thus of the function. The partition is represented as a graph, which is used to index and compose functions stored in a case base to form a close approximation to the solution of the new task. Experiments demonstrate that function composition often produces more than an order-of-magnitude increase in learning rate compared to a basic reinforcement learning algorithm.
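
    As a concrete illustration of the composition step, the following is a minimal sketch with hypothetical names and data structures (not the paper's implementation): a partition of the state space indexes cached sub-solutions in a case base, and the composed function seeds further learning rather than replacing it.

    from typing import Callable, Dict, Hashable

    State = Hashable

    def compose_value_function(
        partition: Callable[[State], str],                # state -> region label
        case_base: Dict[str, Callable[[State], float]],   # region label -> cached sub-solution
        default: Callable[[State], float],                # fallback for unmatched regions
    ) -> Callable[[State], float]:
        """Stitch cached sub-solutions into an initial value function,
        which ordinary reinforcement learning then refines."""
        def value(s: State) -> float:
            return case_base.get(partition(s), default)(s)
        return value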

    Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

    Full text link
    This paper presents the MAXQ approach to hierarchical reinforcement learning, based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning. (63 pages, 15 figures.)
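
    The core of the decomposition can be stated compactly: the value of invoking subtask a within task i splits as Q(i, s, a) = V(a, s) + C(i, s, a), where C is a learned completion term. Below is a schematic Python rendering with illustrative data structures, not the paper's implementation.

    from typing import Dict, List, Tuple

    # C[(task, state, subtask)] -> learned completion value
    C: Dict[Tuple[str, str, str], float] = {}
    # V_prim[(action, state)] -> expected reward of a primitive action
    V_prim: Dict[Tuple[str, str], float] = {}
    # children[task] -> subtasks (or primitive actions) available to task
    children: Dict[str, List[str]] = {}

    def V(task: str, state: str) -> float:
        if task not in children:                       # primitive action
            return V_prim.get((task, state), 0.0)
        return max(Q(task, state, a) for a in children[task])

    def Q(task: str, state: str, subtask: str) -> float:
        # Value of executing `subtask`, then completing `task` afterwards.
        return V(subtask, state) + C.get((task, state, subtask), 0.0)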

    Augmented Language Models: a Survey

    Full text link
    This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks, while the latter consists of calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing-token prediction objective, such augmented LMs can use various, possibly non-parametric, external modules to expand their context-processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing-token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advances in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs, such as interpretability, consistency, and scalability issues.
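
    As a deliberately simplified illustration of the tool-use pattern, the loop below lets a model emit a textual tool call, executes the external module, and appends the result to the context; `generate` is a placeholder for any LM interface, and the bracketed call syntax is invented for the example.

    import re

    TOOLS = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

    def run_with_tools(generate, prompt: str, max_rounds: int = 4) -> str:
        context = prompt
        for _ in range(max_rounds):
            out = generate(context)                       # LM continues the text
            call = re.search(r"\[(\w+)\((.*?)\)\]", out)  # e.g. "[calc(2*3+1)]"
            if call is None:
                return out                                # plain answer, no tool used
            name, arg = call.groups()
            result = TOOLS[name](arg)                     # external, non-parametric module
            context += out[: call.end()] + f" -> {result}\n"
        return context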

    Indirect Methods for Robot Skill Learning

    Get PDF
    Robot learning algorithms are appealing alternatives for acquiring rational robotic behaviors from data collected during the execution of tasks. However, most robot learning techniques are stated as isolated stages and focus on directly obtaining rational policies by optimizing only the performance measures of single tasks. Formulating robotic skill acquisition this way has several disadvantages. For example, if the same skill has to be learned by different robots, independent learning processes must be carried out to acquire an exclusive policy for each robot. Similarly, if a robot has to learn diverse skills, it must acquire the policy for each task in separate learning processes, in sequential order and commonly starting from scratch. Likewise, formulating the learning process in terms of only the performance measure leads robots to avoid situations that should not be repeated merely as a side effect, without any mechanism that captures the necessity of not repeating those wrong behaviors. In contrast, humans and other animals exploit their experience not only to improve the performance of the task they are currently executing, but also to indirectly construct multiple models that help them with that particular task and generalize to new problems. Accordingly, the models and algorithms proposed in this thesis seek to be more data-efficient and to extract more information from the interaction data collected either from an expert's demonstrations or from the robot's own experience. The first approach encodes robotic skills with shared latent variable models, obtaining latent representations that can be transferred from one robot to others, thereby avoiding learning the same task from scratch. The second approach learns complex rational policies by representing them as hierarchical models that can perform multiple concurrent tasks and whose components are learned in the same learning process, instead of in separate processes. Finally, the third approach uses the interaction data to learn two alternative and antagonistic policies that capture what to do and what not to do, and which influence the learning process in addition to the performance measure defined for the task.
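
    The first approach is easiest to picture with a toy example. The sketch below is purely illustrative (the thesis uses richer shared latent variable models): a single latent skill trajectory is rendered through robot-specific decoders, so the skill transfers between robots without relearning the latent part.

    import numpy as np

    rng = np.random.default_rng(0)
    T, latent_dim = 50, 3

    z = rng.standard_normal((T, latent_dim))           # shared latent skill trajectory

    # Hypothetical per-robot decoders (in practice, fit from demonstrations).
    W_robot_a = rng.standard_normal((latent_dim, 7))   # 7-DoF arm
    W_robot_b = rng.standard_normal((latent_dim, 6))   # 6-DoF arm

    traj_a = z @ W_robot_a   # the skill rendered for robot A
    traj_b = z @ W_robot_b   # the same skill transferred to robot B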

    Lifelong Machine Learning Of Functionally Compositional Structures

    Get PDF
    A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and reuse them in novel combinations for solving different yet structurally related problems. Learning such compositional structures has been a significant challenge for artificial systems, due to the underlying combinatorial search. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. This dissertation integrated these two lines of work to present a general-purpose framework for lifelong learning of functionally compositional structures. The framework separates the learning into two stages: learning how to best combine existing components to assimilate a novel problem, and learning how to adapt the set of existing components to accommodate the new problem. This separation explicitly handles the trade-off between the stability required to remember how to solve earlier tasks and the flexibility required to solve new tasks. This dissertation instantiated the framework into various supervised and reinforcement learning (RL) algorithms. Empirical evaluations on a range of supervised learning benchmarks compared the proposed algorithms against well-established techniques, and found that 1) compositional models enable improved lifelong learning when the tasks are highly diverse by balancing the incorporation of new knowledge and the retention of past knowledge, 2) the separation of the learning into stages permits lifelong learning of compositional knowledge, and 3) the components learned by the proposed methods represent self-contained and reusable functions. Similar evaluations on existing and new RL benchmarks demonstrated that 1) algorithms under the framework accelerate the discovery of high-performing policies in a variety of domains, including robotic manipulation, and 2) these algorithms retain, and often improve, knowledge that enables them to solve tasks learned in the past. The dissertation extended one lifelong compositional RL algorithm to the nonstationary setting, where the distribution over tasks varies over time, and found that modularity permits individually tracking changes to different elements in the environment. The final contribution of this dissertation was a new benchmark for evaluating approaches to compositional RL, which exposed that existing methods struggle to discover the compositional properties of the environment.
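
    The two-stage split is simple to state in code. The schematic below uses placeholder function names, not the dissertation's API.

    def assimilate(components, task_data, fit_combination):
        """Stage 1: freeze the component library; learn only how to
        combine existing components for the new task."""
        return fit_combination(components, task_data)    # e.g. soft module selection

    def accommodate(components, structure, task_data, update_components):
        """Stage 2: fold the new task into the shared components,
        typically with replay or regularization to limit forgetting."""
        return update_components(components, structure, task_data)

    # Stage 1 provides stability (earlier tasks keep working, since their
    # components are untouched); stage 2 provides the flexibility needed
    # when recombination alone cannot solve the new task.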

    Autonomous Locomotive Robot Path Planning on the Basis of Machine Learning

    Get PDF
    As the title suggests, this dissertation deals with autonomous locomotive robot path planning based on machine learning. The path planning task is to find a path from an initial to a target position, without collision with obstacles, such that the cost of the path is minimized. An autonomous robot is a machine able to perform tasks completely independently, even in environments with dynamic changes. Path planning in a dynamic, partially known environment is, however, a difficult problem. The ability of an autonomous robot to adapt its behavior to changes in the environment can be provided by machine learning methods. In the field of path planning, the most commonly used machine learning methods are case-based reasoning, neural networks, reinforcement learning, swarm intelligence, and genetic algorithms. The first part of the dissertation introduces the current state of research in path planning, surveying methods for basic omnidirectional robots as well as robots subject to differential constraints. The thesis then proposes several path planning methods for omnidirectional robots and robots with differential constraints, based mainly on case-based reasoning and genetic algorithms. All proposed methods were implemented in simulation applications, and the results of the experiments carried out in these applications, together with an analysis of each experiment, are part of this work. The experiments show that the proposed methods can compete with commonly used methods, performing better in most cases.
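
    To make the genetic algorithm side concrete, here is a bare-bones sketch of GA-based grid path planning; the fixed-length move encoding, fitness, and operators are simplifications for illustration, not the dissertation's actual methods.

    import random

    MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up

    def decode(genome, start):
        """Turn a list of move indices into a path of grid cells."""
        path, pos = [start], start
        for g in genome:
            pos = (pos[0] + MOVES[g][0], pos[1] + MOVES[g][1])
            path.append(pos)
        return path

    def fitness(genome, start, goal, obstacles):
        """Lower is better: endpoint distance to goal, heavy collision penalty."""
        path = decode(genome, start)
        end = path[-1]
        penalty = 1000 * sum(cell in obstacles for cell in path)
        return abs(end[0] - goal[0]) + abs(end[1] - goal[1]) + penalty

    def evolve(start, goal, obstacles, pop=60, length=20, gens=200):
        population = [[random.randrange(4) for _ in range(length)] for _ in range(pop)]
        for _ in range(gens):
            population.sort(key=lambda g: fitness(g, start, goal, obstacles))
            parents = population[: pop // 2]             # truncation selection
            children = []
            while len(parents) + len(children) < pop:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, length)        # one-point crossover
                child = a[:cut] + b[cut:]
                child[random.randrange(length)] = random.randrange(4)  # point mutation
                children.append(child)
            population = parents + children
        population.sort(key=lambda g: fitness(g, start, goal, obstacles))
        return decode(population[0], start)

    # Example: evolve((0, 0), (5, 5), obstacles={(2, 2), (3, 2)})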

    Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning

    Get PDF
    Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, to robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM in the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans and thereby to reason over long-horizon tasks, as encountered in the ALFRED benchmark. We compare our approach with classical planning and baseline methods to examine the applicability and generalizability of LLM-based planners. Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating promising potential for the future application of neuro-symbolic planning methods in robotics.
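
    A sketch of the grounding step: linearize the scene graph into text, prepend the request, and let a finetuned GPT-2 emit subgoal specifications for the planner. The prompt format and checkpoint below are illustrative assumptions, not the paper's exact scheme.

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    def linearize(scene_graph):
        """scene_graph: iterable of (subject, relation, object) triples."""
        return " ; ".join(f"{s} {r} {o}" for s, r, o in scene_graph)

    tok = GPT2Tokenizer.from_pretrained("gpt2")    # swap in the finetuned checkpoint
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    graph = [("mug", "on", "table"), ("table", "in", "kitchen")]
    prompt = f"Scene: {linearize(graph)}\nTask: put the mug in the sink\nSubgoals:"

    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40, pad_token_id=tok.eos_token_id)
    print(tok.decode(out[0][ids.shape[1]:]))       # subgoals for sequential execution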

    Computer Science and Technology Series : XV Argentine Congress of Computer Science. Selected papers

    Get PDF
    CACIC'09 was the fifteenth Congress in the CACIC series. It was organized by the School of Engineering of the National University of Jujuy. The Congress included 9 workshops with 130 accepted papers, 1 main conference, 4 invited tutorials, different meetings related to Computer Science education (professors, PhD students, curricula), and an international school with 5 courses. CACIC 2009 was organized following the traditional Congress format, with 9 workshops covering a diversity of dimensions of Computer Science research. Each topic was supervised by a committee of three chairs from different universities. The call for papers attracted a total of 267 submissions. An average of 2.7 review reports were collected for each paper, for a grand total of 720 review reports that involved about 300 different reviewers. A total of 130 full papers were accepted, and 20 of them were selected for this book. Red de Universidades con Carreras en Informática (RedUNCI).