Multi-Task Policy Search for Robotics
© 2014 IEEE. Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics. Training individual policies for every single potential task is often impractical, especially for continuous task variations, requiring more principled approaches to share and transfer knowledge among similar tasks. We present a novel approach for learning a nonlinear feedback policy that generalizes across multiple tasks. The key idea is to define a parametrized policy as a function of both the state and the task, which allows learning a single policy that generalizes across multiple known and unknown tasks. Applications of our novel approach to reinforcement and imitation learning in real-robot experiments are shown.
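The key idea above, a single parametrized policy that takes both the state and a task descriptor as input, can be sketched as follows. The linear feature map, the tanh nonlinearity, and all names here are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def features(state, task):
    # Joint feature vector over state and task descriptor, plus a bias term
    # (an illustrative choice of features).
    return np.concatenate([state, task, [1.0]])

class TaskConditionedPolicy:
    """One policy pi(state, task) shared across known and unknown tasks."""

    def __init__(self, state_dim, task_dim, action_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(scale=0.1, size=(action_dim, state_dim + task_dim + 1))

    def act(self, state, task):
        # tanh gives a bounded, nonlinear feedback action.
        return np.tanh(self.W @ features(np.asarray(state), np.asarray(task)))

policy = TaskConditionedPolicy(state_dim=2, task_dim=1, action_dim=1)
a_known = policy.act([0.1, -0.2], [0.5])   # a task seen during training
a_novel = policy.act([0.1, -0.2], [0.75])  # same policy applied to an unseen task
```

Because the task enters the policy as an input rather than indexing a separate parameter set, generalizing to a new task is just evaluating the same function at a new task descriptor.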
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
Large-scale task planning is a major challenge. Recent work exploits large
language models (LLMs) directly as a policy and shows surprisingly interesting
results. This paper shows that LLMs provide a commonsense model of the world in
addition to a policy that acts on it. The world model and the policy can be
combined in a search algorithm, such as Monte Carlo Tree Search (MCTS), to
scale up task planning. In our new LLM-MCTS algorithm, the LLM-induced world
model provides a commonsense prior belief for MCTS to achieve effective
reasoning; the LLM-induced policy acts as a heuristic to guide the search,
vastly improving search efficiency. Experiments show that LLM-MCTS outperforms
both MCTS alone and policies induced by LLMs (GPT-2 and GPT-3.5) by a wide
margin, for complex, novel tasks. Further experiments and analyses on multiple
tasks -- multiplication, multi-hop travel planning, object rearrangement --
suggest minimum description length (MDL) as a general guiding principle: if the
description length of the world model is substantially smaller than that of the
policy, using LLM as a world model for model-based planning is likely better
than using LLM solely as a policy. Comment: In Proceedings of NeurIPS 202
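The way an LLM-induced policy can guide tree search can be sketched with a PUCT-style selection rule, where the prior over actions would come from the LLM. This is a minimal sketch of the guidance mechanism, not the paper's full LLM-MCTS algorithm; the prior here is a hand-written stub standing in for an LLM, and all names are illustrative.

```python
import math

def select_action(stats, priors, c_puct=1.0):
    """PUCT selection: value estimate plus a prior-weighted exploration bonus.

    stats:  {action: (visit_count, total_value)} from the search so far.
    priors: {action: probability} -- in LLM-guided search this would be the
            LLM-induced policy's distribution over actions (stubbed here).
    """
    total = sum(n for n, _ in stats.values()) + 1

    def score(action):
        n, w = stats[action]
        q = w / n if n else 0.0  # mean value of visited actions
        return q + c_puct * priors[action] * math.sqrt(total) / (1 + n)

    return max(stats, key=score)

# Stub prior: commonsense says "open fridge" is far more plausible here.
priors = {"open fridge": 0.8, "open oven": 0.2}
stats = {"open fridge": (0, 0.0), "open oven": (0, 0.0)}
first = select_action(stats, priors)  # with no visits, the prior dominates
```

As visit counts grow, the value term takes over, so the search can still overrule a misleading prior given enough evidence.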
Neuro-Evolution for Multi-Agent Policy Transfer in RoboCup Keep-Away
An objective of transfer learning is to improve and speed up learning on target tasks after training on different, but related, source tasks. This research is a comparative study of Neuro-Evolution (NE) methods for transferring evolved multi-agent policies (behaviors) between multi-agent tasks of varying complexity. The efficacy of five variants of two NE methods is compared for multi-agent policy transfer. The variants include the original versions (search directed by a fitness function), behavioral- and genotypic-diversity-based search replacing objective-based search (fitness functions), and hybrid approaches combining objective-based search with behavioral or genotypic diversity maintenance. The goal of testing these variants for directing policy search is to ascertain an appropriate method for boosting the task performance of transferred multi-agent behaviors. Results indicate that an indirect-encoding NE method using hybridized objective-based search and behavioral diversity maintenance yields significantly improved task performance for policy transfer between multi-agent tasks of increasing complexity. In comparison, NE methods not using behavioral diversity maintenance to direct policy search performed relatively poorly in terms of efficiency (evolution times) and quality of solutions in target tasks.
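The hybrid objective-plus-diversity search described above can be sketched as a fitness blend: a candidate's score mixes its task objective with its behavioral novelty relative to an archive of previously seen behaviors. The descriptor format, the k-nearest-neighbor novelty measure, and the weighting are illustrative assumptions, not the study's exact formulation.

```python
import numpy as np

def novelty(behavior, archive, k=3):
    """Mean distance to the k nearest behavior descriptors in the archive;
    higher means the candidate behaves unlike anything seen before."""
    if not archive:
        return 0.0
    dists = sorted(np.linalg.norm(np.asarray(behavior) - np.asarray(b))
                   for b in archive)
    return float(np.mean(dists[:k]))

def hybrid_fitness(objective, behavior, archive, w=0.5):
    """Weighted blend of objective-based search and behavioral diversity
    maintenance; w=0 is pure objective search, w=1 is pure novelty search."""
    return (1 - w) * objective + w * novelty(behavior, archive)

archive = [[0.0, 0.0], [1.0, 1.0]]
score = hybrid_fitness(objective=0.6, behavior=[0.5, 0.5], archive=archive)
```

Selecting on this blended score rewards candidates that both perform well and explore new behavior, which is the mechanism the hybrid variants use to avoid premature convergence.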
A Distributed Cooperative Dynamic Task Planning Algorithm for Multiple Satellites Based on Multi-agent Hybrid Learning
Traditionally, heuristic re-planning algorithms are used to tackle the problem of dynamic task planning for multiple satellites. However, traditional heuristic strategies depend on the concrete tasks, which often affects the optimality of the results. Noting that the historical information of cooperative task planning influences later planning results, we propose a hybrid learning algorithm for dynamic multi-satellite task planning, based on multi-agent reinforcement learning with policy iteration and transfer learning. The reinforcement learning strategy of each satellite is represented by a neural network. The policy neural-network individuals with the best topological structure and weights are found by applying co-evolutionary search iteratively. To avoid the failure of historical learning caused by randomly occurring observation requests, a novel approach is proposed to balance the quality and efficiency of the task planning: it converts the historical learning strategy into the current initial learning strategy by applying the transfer learning algorithm. Simulations and analysis show the feasibility and adaptability of the proposed approach, especially for situations with randomly occurring observation requests.
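The transfer step, converting the historical learning strategy into the initial strategy for the current planning episode, can be sketched as a warm start of the policy-network weights. The weight layout, the perturbation, and all names are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def transfer_init(historical, noise_scale=0.01, rng=None):
    """Warm-start the current policy from historical weights: copy the evolved
    policy-network parameters and perturb them slightly so co-evolutionary
    search can adapt to newly arriving observation requests."""
    rng = rng or np.random.default_rng(0)
    return {name: w + rng.normal(scale=noise_scale, size=w.shape)
            for name, w in historical.items()}

# Weights evolved during earlier cooperative planning episodes (toy shapes).
historical = {"hidden": np.ones((4, 3)), "out": np.zeros((2, 4))}
initial = transfer_init(historical)  # starting point for the current episode
```

Starting search near a previously successful strategy, rather than from random weights, is what lets the historical information speed up planning for the current request stream.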
Empirical Evaluation of Contextual Policy Search with a Comparison-based Surrogate Model and Active Covariance Matrix Adaptation
Contextual policy search (CPS) is a class of multi-task reinforcement
learning algorithms that is particularly useful for robotic applications. A
recent state-of-the-art method is Contextual Covariance Matrix Adaptation
Evolution Strategies (C-CMA-ES). It is based on the standard black-box
optimization algorithm CMA-ES. There are two useful extensions of CMA-ES that
we will transfer to C-CMA-ES and evaluate empirically: ACM-ES, which uses a
comparison-based surrogate model, and aCMA-ES, which uses an active update of
the covariance matrix. We will show that improvements with these methods can be
impressive in terms of sample efficiency, although this is no longer relevant
for the robotic domain. Comment: Supplementary material for poster paper accepted at GECCO 2019;
https://doi.org/10.1145/3319619.332193
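The sample-saving idea behind the comparison-based surrogate extension can be sketched as pre-screening: rank a generation of candidates with a cheap surrogate that only needs to preserve ordering, then spend expensive real evaluations on the most promising few. The toy objective and all names here are illustrative assumptions, not the ACM-ES/C-CMA-ES implementation.

```python
import numpy as np

def presort_with_surrogate(candidates, surrogate, true_eval, n_true):
    """Rank candidates by a comparison-based surrogate (only its ordering
    matters, not its values) and run the expensive true evaluation only on
    the top n_true of them."""
    order = sorted(range(len(candidates)),
                   key=lambda i: surrogate(candidates[i]))
    evaluated = {i: true_eval(candidates[i]) for i in order[:n_true]}
    return order, evaluated

# Toy expensive objective (sphere) and a cheap rank-preserving surrogate.
true_eval = lambda x: float(np.sum(np.asarray(x) ** 2))
surrogate = lambda x: float(np.sum(np.asarray(x) ** 2)) + 0.01

cands = [np.array([2.0, 0.0]), np.array([0.1, 0.1]), np.array([1.0, 1.0])]
order, evals = presort_with_surrogate(cands, surrogate, true_eval, n_true=2)
```

In a robotic setting the true evaluation would be a rollout on hardware, which is exactly where skipping evaluations of poorly ranked candidates pays off.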