Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks
Current reinforcement learning algorithms struggle in sparse and complex
environments, most notably in long-horizon manipulation tasks entailing a
plethora of different sequences. In this work, we propose the Intrinsically
Guided Exploration from Large Language Models (IGE-LLMs) framework. By
leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the
exploratory process in reinforcement learning to address intricate, sparse-reward,
long-horizon robotic manipulation tasks. We evaluate our framework and
related intrinsic learning methods in an environment that poses an exploration
challenge, and in a complex robotic manipulation task challenged by both
exploration and long horizons. Results show that IGE-LLMs (i) exhibits notably
higher performance than related intrinsic methods and than the direct use of
LLMs in decision-making, (ii) can be combined with and complements existing
learning methods, highlighting its modularity, (iii) is fairly insensitive to
different intrinsic scaling parameters, and (iv) maintains robustness against
increased levels of uncertainty and longer horizons.
Comment: 8 pages, 3 figures
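The core mechanism described above, an LLM-derived score added to the sparse environment reward as an intrinsic term, can be illustrated with a minimal Python sketch. The `llm_intrinsic_reward` stub and the `scale` parameter are hypothetical stand-ins for the actual LLM query and the framework's scaling coefficient; this is not the paper's implementation.

```python
# Minimal sketch of LLM-guided intrinsic reward shaping (hypothetical API;
# the real IGE-LLMs framework queries an actual LLM to score actions).

def llm_intrinsic_reward(state, action):
    """Stub standing in for an LLM that scores how promising an action is.

    Here we hard-code a preference for action 1; in practice the score
    would come from prompting a language model with a task description.
    """
    return 1.0 if action == 1 else 0.0

def shaped_reward(extrinsic, state, action, scale=0.1):
    """Combine the sparse environment reward with the intrinsic signal."""
    return extrinsic + scale * llm_intrinsic_reward(state, action)

# Sparse task: the extrinsic reward is 0 almost everywhere, but the
# intrinsic term still differentiates actions during exploration.
print(shaped_reward(0.0, state=None, action=1))  # 0.1
print(shaped_reward(0.0, state=None, action=0))  # 0.0
```

The framework's reported insensitivity to the intrinsic scaling parameter corresponds to the choice of `scale` here.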
Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Exploration in multi-agent reinforcement learning is a challenging problem,
especially in environments with sparse rewards. We propose a general method for
efficient exploration by sharing experience amongst agents. Our proposed
algorithm, called Shared Experience Actor-Critic (SEAC), applies experience
sharing in an actor-critic framework. We evaluate SEAC in a collection of
sparse-reward multi-agent environments and find that it consistently
outperforms two baselines and two state-of-the-art algorithms by learning in
fewer steps and converging to higher returns. In some harder environments,
experience sharing makes the difference between learning to solve the task and
not learning at all.
Comment: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
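The experience-sharing idea can be viewed as an off-policy correction: one agent learns from another agent's transitions, reweighted by the ratio of the two policies' probabilities for the sampled action. The sketch below uses toy softmax policies and shows only the importance weight, not the paper's full actor-critic loss:

```python
import math

# Sketch of a SEAC-style shared-experience correction for two agents with
# toy softmax action distributions (illustrative, not the full algorithm).

def softmax_probs(logits):
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def seac_weight(pi_i, pi_j, action):
    """Importance weight for agent i learning from agent j's experience."""
    return pi_i[action] / pi_j[action]

pi_1 = softmax_probs([0.0, 1.0])   # agent 1's action distribution
pi_2 = softmax_probs([1.0, 0.0])   # agent 2's action distribution

# Agent 1 reuses agent 2's sampled action 0, corrected off-policy:
w = seac_weight(pi_1, pi_2, action=0)
print(round(w, 3))  # 0.368
```

Weighting each borrowed transition this way keeps the gradient estimate consistent even though the data came from a different policy.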
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and
therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC which is shown to efficiently scale in games with a large number of
agents.
Comment: 20 pages, 12 figures
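The property Pareto-AC exploits can be demonstrated on a toy matrix game. The payoff matrix below is invented for illustration: a 2x2 no-conflict game with two Nash equilibria, (0, 0) and (1, 1), of which (1, 1) Pareto-dominates. Because the game is no-conflict, the Pareto-optimal equilibrium maximises every agent's return at once:

```python
# Toy check of the property Pareto-AC exploits: in a no-conflict game the
# Pareto-optimal equilibrium maximises all agents' returns simultaneously.

# Payoff tuples (agent 1, agent 2) per joint action; both equilibria are
# Nash, but only (1, 1) is Pareto-optimal.
payoffs = {
    (0, 0): (5, 5),
    (0, 1): (0, 0),
    (1, 0): (0, 0),
    (1, 1): (10, 10),
}

def pareto_optimal_joint_action(payoffs):
    """Joint action maximising the worst agent's return; in a no-conflict
    game this coincides with the outcome preferred by all agents."""
    return max(payoffs, key=lambda a: min(payoffs[a]))

print(pareto_optimal_joint_action(payoffs))  # (1, 1)
```

The difficulty the abstract describes is that uncertainty about the other agent's policy during training can still pull independent learners towards the safer (0, 0) equilibrium.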
Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning
Successful deployment of multi-agent reinforcement learning often requires
agents to adapt their behaviour. In this work, we discuss the problem of
teamwork adaptation in which a team of agents needs to adapt their policies to
solve novel tasks with limited fine-tuning. Motivated by the intuition that
agents need to be able to identify and distinguish tasks in order to adapt
their behaviour to the current task, we propose to learn multi-agent task
embeddings (MATE). These task embeddings are trained using an encoder-decoder
architecture optimised for reconstruction of the transition and reward
functions which uniquely identify tasks. We show that a team of agents is able
to adapt to novel tasks when provided with task embeddings. We propose three
MATE training paradigms: independent MATE, centralised MATE, and mixed MATE,
which vary in the information used for the task encoding. We show that the
embeddings learned by MATE identify tasks and provide useful information which
agents leverage during adaptation to novel tasks.
Comment: To be presented at the Seventh Workshop on Generalization in Planning at the NeurIPS 2023 conference
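The central idea, that tasks are identified by their transition and reward functions, so an embedding trained to reconstruct them separates tasks, can be sketched with a hand-rolled toy encoder. The real MATE uses a learned neural encoder-decoder; here a simple reward-statistics summary stands in for it:

```python
# Toy sketch of the MATE idea: summarise observed transitions into a task
# embedding that reflects the reward function, so tasks with different
# rewards get different embeddings. This hand-rolled encoder is a stand-in
# for the paper's learned encoder-decoder architecture.

def encode_task(transitions):
    """Toy encoder: mean reward per action over (s, a, r, s') tuples,
    which uniquely identifies the tasks in this example."""
    sums, counts = {}, {}
    for s, a, r, s_next in transitions:
        sums[a] = sums.get(a, 0.0) + r
        counts[a] = counts.get(a, 0) + 1
    return tuple(sorted((a, sums[a] / counts[a]) for a in sums))

# Two tasks with identical states but swapped reward functions:
task_a = [(0, 0, 1.0, 1), (0, 1, 0.0, 1)]
task_b = [(0, 0, 0.0, 1), (0, 1, 1.0, 1)]

emb_a, emb_b = encode_task(task_a), encode_task(task_b)
print(emb_a != emb_b)  # True: the embeddings distinguish the tasks
```

Conditioning the agents' policies on such an embedding is what lets the team adapt its behaviour to whichever task it is currently facing.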
Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models
Large language models (LLMs) demonstrate their promise in tackling
complicated practical challenges by combining action-based policies with chain
of thought (CoT) reasoning. Having high-quality prompts on hand, however, is
vital to the framework's effectiveness. Currently, these prompts are
handcrafted utilizing extensive human labor, resulting in CoT policies that
frequently fail to generalize. Human intervention is also required in order to
develop grounding functions that ensure low-level controllers appropriately
process CoT reasoning. In this paper, we take the first step towards a fully
integrated, end-to-end framework for task-solving in real settings that
involve complicated reasoning. To that end, we propose a new leader-follower
bilevel framework capable of learning to ask relevant questions (prompts) and
subsequently undertaking reasoning to guide the learning of actions to be
performed in an environment. A good prompt should make introspective revisions
based on historical findings, leading the CoT to consider the anticipated
goals. A prompt-generator policy has its own aim in our system, allowing it to
adapt to the action policy and automatically root the CoT process towards
outputs that lead to decisive, high-performing actions. Meanwhile, the action
policy is learning how to use the CoT outputs to take specific actions. Our
empirical data reveal that our system outperforms leading methods in agent
learning benchmarks such as Overcooked and FourRoom
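The leader-follower structure described above can be sketched as a single interaction step: a prompt policy (leader) selects a question, a chain-of-thought step answers it, and an action policy (follower) conditions on that answer. Every component below, the question list, the `cot_answer` stub, and the keyword-based action rule, is a hypothetical stand-in for the learned policies and the LLM:

```python
import random

# Toy sketch of the leader-follower loop: a prompt policy (leader) picks a
# question, a stubbed CoT step answers it, and an action policy (follower)
# maps the answer to an action. All components are hypothetical stand-ins.

QUESTIONS = ["What is the goal?", "What blocks the goal?"]

def cot_answer(question, state):
    """Stub for LLM chain-of-thought reasoning over the question."""
    return f"{question} -> state={state}"

def action_policy(answer):
    """Follower: map CoT output to a discrete action (here, by keyword)."""
    return 1 if "blocks" in answer else 0

def bilevel_step(state, rng):
    question = rng.choice(QUESTIONS)   # leader samples a prompt
    answer = cot_answer(question, state)
    return question, action_policy(answer)

rng = random.Random(0)
question, action = bilevel_step("room_a", rng)
print(question, action)
```

In the actual framework both levels are trained with reinforcement learning, so the leader learns which questions produce CoT outputs that the follower can convert into high-return actions.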