Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks
This paper discusses a system that accelerates reinforcement learning by
using transfer from related tasks. Without such transfer, even if two tasks are
very similar at some abstract level, an extensive re-learning effort is
required. The system achieves much of its power by transferring parts of
previously learned solutions rather than a single complete solution. The system
exploits strong features in the multi-dimensional function produced by
reinforcement learning in solving a particular task. These features are stable
and easy to recognize early in the learning process. They generate a
partitioning of the state space and thus the function. The partition is
represented as a graph. This is used to index and compose functions stored in a
case base to form a close approximation to the solution of the new task.
Experiments demonstrate that function composition often produces more than an
order of magnitude increase in learning rate compared to a basic reinforcement
learning algorithm.
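The composition idea above can be sketched in a few lines. The following is an illustrative sketch, not the paper's actual system: a case base stores Q-functions of previously solved subtasks indexed by a region label (standing in for the graph-based partition), and a new task's solution is approximated by dispatching each state to the stored solution of its region.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, int]  # a toy 2-D discrete state, chosen for illustration

class CaseBase:
    """Maps a region label (a stand-in for the partition graph's nodes)
    to a stored subtask Q-function."""

    def __init__(self):
        self._cases: Dict[str, Callable[[State, str], float]] = {}

    def store(self, region: str, q_func: Callable[[State, str], float]):
        self._cases[region] = q_func

    def compose(self, partition: Callable[[State], str]):
        """Build an approximate Q for a new task by routing each state
        to the stored solution of its region of the partition."""
        def q(state: State, action: str) -> float:
            stored = self._cases.get(partition(state))
            # Regions with no stored case fall back to a neutral initial value.
            return stored(state, action) if stored else 0.0
        return q
```

A composed Q of this kind would then serve as the starting point for further reinforcement learning on the new task, rather than as the final policy.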
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
This paper presents the MAXQ approach to hierarchical reinforcement learning
based on decomposing the target Markov decision process (MDP) into a hierarchy
of smaller MDPs and decomposing the value function of the target MDP into an
additive combination of the value functions of the smaller MDPs. The paper
defines the MAXQ hierarchy, proves formal results on its representational
power, and establishes five conditions for the safe use of state abstractions.
The paper presents an online model-free learning algorithm, MAXQ-Q, and proves
that it converges with probability 1 to a kind of locally-optimal policy known
as a recursively optimal policy, even in the presence of the five kinds of
state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q
through a series of experiments in three domains and shows experimentally that
MAXQ-Q (with state abstractions) converges to a recursively optimal policy much
faster than flat Q learning. The fact that MAXQ learns a representation of the
value function has an important benefit: it makes it possible to compute and
execute an improved, non-hierarchical policy via a procedure similar to the
policy improvement step of policy iteration. The paper demonstrates the
effectiveness of this non-hierarchical execution experimentally. Finally, the
paper concludes with a comparison to related work and a discussion of the
design tradeoffs in hierarchical reinforcement learning.
Comment: 63 pages, 15 figures
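The additive decomposition at the heart of MAXQ can be written as Q(i, s, a) = V(a, s) + C(i, s, a), where V(a, s) is the value of completing subtask a from state s and C(i, s, a) is the expected reward for finishing the parent task i afterwards. A minimal sketch, with a hypothetical task hierarchy and hand-filled tables standing in for learned values:

```python
def make_maxq(primitive_reward, children, completion):
    """Sketch of the MAXQ value decomposition.

    primitive_reward[a][s]: expected immediate reward of primitive action a in s.
    children[i]: child subtasks of composite task i (empty list => primitive).
    completion[(i, s, a)]: completion term C(i, s, a), default 0.
    """

    def V(i, s):
        if not children.get(i):                       # primitive action
            return primitive_reward[i][s]
        return max(Q(i, s, a) for a in children[i])   # composite task

    def Q(i, s, a):
        # The additive decomposition: value of the child plus completion cost.
        return V(a, s) + completion.get((i, s, a), 0.0)

    return V, Q
```

MAXQ-Q learns the C tables online; here they are given, which is enough to show how the hierarchy's value function is recovered recursively from its parts.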
Augmented Language Models: a Survey
This survey reviews works in which language models (LMs) are augmented with
reasoning skills and the ability to use tools. The former is defined as
decomposing a potentially complex task into simpler subtasks, while the latter
consists of calling external modules such as a code interpreter. LMs can
leverage these augmentations separately or in combination via heuristics, or
learn to do so from demonstrations. While adhering to a standard missing tokens
prediction objective, such augmented LMs can use various, possibly
non-parametric external modules to expand their context processing ability,
thus departing from the pure language modeling paradigm. We therefore refer to
them as Augmented Language Models (ALMs). The missing token objective allows
ALMs to learn to reason, use tools, and even act, while still performing
standard natural language tasks and even outperforming most regular LMs on
several benchmarks. In this work, after reviewing current advances in ALMs, we
conclude that this new research direction has the potential to address common
limitations of traditional LMs such as interpretability, consistency, and
scalability issues.
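The tool-use pattern the survey describes can be sketched as follows. Everything here is invented for illustration (the trigger format `CALL[tool](args)`, the tool registry, and the dispatcher are not from any particular paper): the LM emits a marker in its draft output, and an external module fills in the result before generation continues.

```python
import re

# Hypothetical tool registry; a real ALM might expose a code interpreter,
# retriever, or calculator here. eval is restricted to arithmetic expressions.
TOOLS = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

CALL_PATTERN = re.compile(r"CALL\[(\w+)\]\(([^)]*)\)")

def resolve_tool_calls(lm_output: str) -> str:
    """Replace each tool-call marker in the LM's draft output with the
    external module's result, mimicking augmented decoding."""
    def run(match):
        name, arg = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        # Unknown tools are left verbatim rather than silently dropped.
        return tool(arg) if tool else match.group(0)
    return CALL_PATTERN.sub(run, lm_output)
```

In a full system the LM would be trained (from demonstrations or feedback) to emit such markers at the right moments; this sketch only shows the dispatch half of the loop.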
Indirect Methods for Robot Skill Learning
Robot learning algorithms are appealing alternatives for acquiring rational robotic behaviors from data collected during the execution of tasks. However, most robot learning techniques are stated as isolated stages and focus on directly obtaining rational policies by optimizing only the performance measures of single tasks. Formulating robotic skill acquisition in such a way has some disadvantages. For example, if the same skill has to be learned by different robots, independent learning processes must be carried out to acquire an exclusive policy for each robot. Similarly, if a robot has to learn diverse skills, it must acquire the policy for each task in separate learning processes, in sequential order and commonly starting from scratch. Likewise, formulating the learning process in terms of only the performance measure makes robots avoid undesirable situations merely incidentally, without any mechanism that explicitly captures which wrong behaviors must not be repeated. In contrast, humans and other animals exploit their experience not only to improve the performance of the task they are currently executing, but also to indirectly construct multiple models that help them with that particular task and generalize to new problems. Accordingly, the models and algorithms proposed in this thesis seek to be more data efficient and to extract more information from the interaction data collected either from expert demonstrations or from the robot's own experience. The first approach encodes robotic skills with shared latent variable models, obtaining latent representations that can be transferred from one robot to others, thereby avoiding learning the same task from scratch.
The second approach learns complex rational policies by representing them as hierarchical models that can perform multiple concurrent tasks and whose components are learned in the same learning process rather than in separate ones. Finally, the third approach uses the interaction data to learn two alternative and antagonistic policies that capture what to do and what not to do, and which influence the learning process in addition to the performance measure defined for the task.
Lifelong Machine Learning Of Functionally Compositional Structures
A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and reuse them in novel combinations for solving different yet structurally related problems. Learning such compositional structures has been a significant challenge for artificial systems, due to the underlying combinatorial search. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. This dissertation integrated these two lines of work to present a general-purpose framework for lifelong learning of functionally compositional structures. The framework separates the learning into two stages: learning how to best combine existing components to assimilate a novel problem, and learning how to adapt the set of existing components to accommodate the new problem. This separation explicitly handles the trade-off between the stability required to remember how to solve earlier tasks and the flexibility required to solve new tasks. This dissertation instantiated the framework into various supervised and reinforcement learning (RL) algorithms. Empirical evaluations on a range of supervised learning benchmarks compared the proposed algorithms against well-established techniques, and found that 1) compositional models enable improved lifelong learning when the tasks are highly diverse by balancing the incorporation of new knowledge and the retention of past knowledge, 2) the separation of the learning into stages permits lifelong learning of compositional knowledge, and 3) the components learned by the proposed methods represent self-contained and reusable functions. Similar evaluations on existing and new RL benchmarks demonstrated that 1) algorithms under the framework accelerate the discovery of high-performing policies in a variety of domains, including robotic manipulation, and 2) these algorithms retain, and often improve, knowledge that enables them to solve tasks learned in the past.
The dissertation extended one lifelong compositional RL algorithm to the nonstationary setting, where the distribution over tasks varies over time, and found that modularity permits individually tracking changes to different elements in the environment. The final contribution of this dissertation was a new benchmark for evaluating approaches to compositional RL, which exposed that existing methods struggle to discover the compositional properties of the environment.
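The two-stage separation described above can be illustrated with a deliberately simplified stand-in (the model, loss, and update rules here are toy choices, not the dissertation's algorithms): components are scalars combined linearly, stage 1 freezes the shared components and learns only the per-task combination weights, and stage 2 then gently adapts the components while the combination is held fixed.

```python
def predict(components, weights):
    """Toy compositional model: a linear combination of shared components."""
    return sum(w * c for w, c in zip(weights, components))

def assimilate(components, weights, target, lr=0.05, steps=200):
    """Stage 1: components frozen; learn the task-specific combination
    by gradient descent on the squared error."""
    for _ in range(steps):
        err = predict(components, weights) - target
        weights = [w - lr * 2 * err * c for w, c in zip(weights, components)]
    return weights

def accommodate(components, weights, target, lr=0.01, steps=200):
    """Stage 2: combination frozen; adapt the shared components with a
    smaller learning rate, protecting knowledge reused by earlier tasks."""
    for _ in range(steps):
        err = predict(components, weights) - target
        components = [c - lr * 2 * err * w for c, w in zip(components, weights)]
    return components
```

The asymmetry in learning rates mirrors the stability/flexibility trade-off: assimilation may move freely, while accommodation should disturb the shared components as little as possible.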
Autonomous Locomotive Robot Path Planning on the Basis of Machine Learning
As the title suggests, this dissertation deals with autonomous locomotive robot path planning based on machine learning. The robot path planning task is to find a path from an initial to a target position, without collision with obstacles, such that the cost of the path is minimized. An autonomous robot is a machine able to perform tasks completely independently, even in environments with dynamic changes.
Path planning in a dynamic, partially known environment is a difficult problem. An autonomous robot's ability to adapt its behavior to changes in the environment can be ensured by machine learning methods. In the field of path planning, the most widely used machine learning methods are case-based reasoning, neural networks, reinforcement learning, swarm intelligence, and genetic algorithms. The first part of this thesis introduces the current state of research in the field of path planning. The overview of methods covers basic omnidirectional robots as well as robots subject to differential constraints. The thesis proposes several path-planning methods for omnidirectional robots and robots with differential constraints, based mainly on case-based reasoning and genetic algorithms. All proposed methods were implemented in simulation applications, and the results of experiments carried out in these applications are part of this work. For each experiment, the results are analyzed. The experiments show that the proposed methods can compete with commonly used methods, performing better in most cases.
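The genetic-algorithm flavour of path planning mentioned above can be sketched as follows. The grid, the move encoding, and the operators are simplified choices for illustration, not the thesis's methods: a candidate path is a string of moves scored by its length, its remaining distance to the goal, and a heavy penalty for every obstacle cell it crosses; evolution keeps the better half of the population and mutates it.

```python
import random

MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}

def fitness(path, start, goal, obstacles):
    """Lower is better: path length + Manhattan distance to goal + penalty."""
    x, y = start
    penalty = 0
    for move in path:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        if (x, y) in obstacles:
            penalty += 100  # heavy cost for crossing an obstacle cell
    return len(path) + abs(x - goal[0]) + abs(y - goal[1]) + penalty

def evolve(population, start, goal, obstacles, generations=30, seed=0):
    """Elitist GA: keep the better half unchanged, fill up with mutants."""
    rng = random.Random(seed)
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, start, goal, obstacles))
        survivors = population[: len(population) // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            child[rng.randrange(len(child))] = rng.choice("UDLR")  # point mutation
            children.append("".join(child))
        population = survivors + children
    return min(population, key=lambda p: fitness(p, start, goal, obstacles))
```

Because survivors are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.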
Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning
Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM on the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans and thereby learn to reason over long-horizon tasks, as encountered in the ALFRED benchmark. We compare our approach with classical planning and baseline methods to examine the applicability and generalizability of LLM-based planners. Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential of future neuro-symbolic planning methods in robotics.
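The grounding step described above can be sketched as follows. The serialization format and the subgoal-prompt schema here are invented for illustration (the paper's actual prompt design may differ): the scene graph is flattened into text so a finetuned LM can condition on it when mapping a request to an ordered list of subgoal specifications.

```python
def serialize_scene_graph(objects, relations):
    """Flatten a scene graph of (subject, predicate, object) triples
    into a textual prompt prefix the LM can condition on."""
    lines = ["Objects: " + ", ".join(sorted(objects))]
    lines += [f"{s} {p} {o}" for s, p, o in relations]
    return "\n".join(lines)

def build_planner_prompt(objects, relations, request):
    """Assemble the full prompt; the LM would continue after 'Subgoals:'
    with a sequence of subgoal specifications for the planner."""
    scene = serialize_scene_graph(objects, relations)
    return f"{scene}\nRequest: {request}\nSubgoals:"
```

At inference time, the text the model generates after `Subgoals:` would be parsed into planner-executable subgoal specifications; only the prompt-construction half is shown here.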
Computer Science and Technology Series : XV Argentine Congress of Computer Science. Selected papers
CACIC'09 was the fifteenth Congress in the CACIC series. It was organized by the School of Engineering of the National University of Jujuy. The Congress included 9 Workshops with 130 accepted papers, 1 main Conference, 4 invited tutorials, different meetings related to Computer Science Education (Professors, PhD students, Curricula), and an International School with 5 courses. CACIC 2009 was organized following the traditional Congress format, with 9 Workshops covering a diversity of dimensions of Computer Science Research. Each topic was supervised by a committee of three chairs from different Universities.
The call for papers attracted a total of 267 submissions. An average of 2.7 review reports were collected for each paper, for a grand total of 720 review reports that involved about 300 different reviewers.
A total of 130 full papers were accepted and 20 of them were selected for this book.
Red de Universidades con Carreras en Informática (RedUNCI)