20,335 research outputs found
Generative Exploration and Exploitation
Sparse reward is one of the biggest challenges in reinforcement learning
(RL). In this paper, we propose a novel method called Generative Exploration
and Exploitation (GENE) to overcome sparse reward. GENE automatically generates
start states to encourage the agent to explore the environment and to exploit
received reward signals. GENE can adaptively tradeoff between exploration and
exploitation according to the varying distributions of states experienced by
the agent as the learning progresses. GENE relies on no prior knowledge about
the environment and can be combined with any RL algorithm, no matter on-policy
or off-policy, single-agent or multi-agent. Empirically, we demonstrate that
GENE significantly outperforms existing methods in three tasks with only binary
rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies
verify the emergence of progressive exploration and automatic reversing.Comment: AAAI'2
Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
This paper was motivated by the problem of how to make robots fuse and
transfer their experience so that they can effectively use prior knowledge and
quickly adapt to new environments. To address the problem, we present a
learning architecture for navigation in cloud robotic systems: Lifelong
Federated Reinforcement Learning (LFRL). In the work, We propose a knowledge
fusion algorithm for upgrading a shared model deployed on the cloud. Then,
effective transfer learning methods in LFRL are introduced. LFRL is consistent
with human cognitive science and fits well in cloud robotic systems.
Experiments show that LFRL greatly improves the efficiency of reinforcement
learning for robot navigation. The cloud robotic system deployment also shows
that LFRL is capable of fusing prior knowledge. In addition, we release a cloud
robotic navigation-learning website based on LFRL
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
We present PRM-RL, a hierarchical method for long-range navigation task
completion that combines sampling based path planning with reinforcement
learning (RL). The RL agents learn short-range, point-to-point navigation
policies that capture robot dynamics and task constraints without knowledge of
the large-scale topology. Next, the sampling-based planners provide roadmaps
which connect robot configurations that can be successfully navigated by the RL
agent. The same RL agents are used to control the robot under the direction of
the planning, enabling long-range navigation. We use the Probabilistic Roadmaps
(PRMs) for the sampling-based planner. The RL agents are constructed using
feature-based and deep neural net policies in continuous state and action
spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation
tasks with non-trivial robot dynamics: end-to-end differential drive indoor
navigation in office environments, and aerial cargo delivery in urban
environments with load displacement constraints. Our results show improvement
in task completion over both RL agents on their own and traditional
sampling-based planners. In the indoor navigation task, PRM-RL successfully
completes up to 215 m long trajectories under noisy sensor conditions, and the
aerial cargo delivery completes flights over 1000 m without violating the task
constraints in an environment 63 million times larger than used in training.Comment: 9 pages, 7 figure
- …