Addressing Action Oscillations through Learning Policy Inertia
Deep reinforcement learning (DRL) algorithms have been demonstrated to be
effective in a wide range of challenging decision making and control tasks.
However, these methods typically suffer from severe action oscillations, particularly in discrete action settings: agents select different actions within consecutive steps even though the states differ only slightly. This issue is often neglected since a policy is usually evaluated only by its cumulative rewards. Action oscillation strongly degrades the user experience and can even pose serious safety risks, especially in real-world domains where safety is a primary concern, such as autonomous driving. To this end, we introduce the Policy Inertia Controller (PIC), which serves as a generic plug-in framework for off-the-shelf DRL algorithms and enables an adaptive trade-off between the optimality and the smoothness of the learned policy in a formal way. We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policies, which ensures monotonically non-decreasing updates under some mild conditions. Further, we derive a practical DRL algorithm, namely Nested Soft Actor-Critic. Experiments on a collection of autonomous driving tasks and several Atari games suggest that our approach substantially reduces oscillations in comparison with a range of commonly adopted baselines, with almost no performance degradation.
Comment: Accepted paper at AAAI 202
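The abstract describes PIC as a plug-in layer that trades policy optimality against action smoothness. Below is a minimal sketch of one way such an inertia mechanism could look for discrete actions, assuming a simple hysteresis rule over the previous action with an illustrative switching margin `beta`; this is not the paper's exact formulation of PIC or Nested Policy Iteration.

```python
import numpy as np

class PolicyInertiaWrapper:
    """Illustrative action-smoothing wrapper (not the paper's exact PIC)."""

    def __init__(self, q_fn, beta=0.05):
        self.q_fn = q_fn          # q_fn(state) -> np.ndarray of per-action values
        self.beta = beta          # switching margin; hypothetical smoothness knob
        self.prev_action = None

    def act(self, state):
        q = self.q_fn(state)
        greedy = int(np.argmax(q))
        if self.prev_action is None:
            self.prev_action = greedy
        # Keep the previous action unless the greedy one is clearly better,
        # which suppresses oscillation between near-equal actions.
        elif q[greedy] - q[self.prev_action] > self.beta:
            self.prev_action = greedy
        return self.prev_action
```

A larger `beta` yields smoother but potentially less optimal behavior, mirroring the optimality-versus-smoothness trade-off the paper formalizes.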
MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning
Most meta reinforcement learning (meta-RL) methods learn to adapt to new
tasks by directly optimizing the parameters of policies over primitive action
space. Such algorithms work well on tasks that differ only slightly. However, when the task distribution becomes wider, directly learning such a meta-policy is quite inefficient. In this paper, we propose a new meta-RL algorithm called Meta Goal-generation for Hierarchical RL (MGHRL). Instead of directly generating policies over the primitive action space for new tasks, MGHRL learns to generate high-level meta strategies over subgoals given past experience and leaves how to achieve those subgoals to independent RL subtasks. Our empirical results on several challenging simulated robotics environments show that our method enables more efficient and generalized meta-learning from past experience.
Comment: Accepted to the ICLR 2020 workshop: Beyond tabula rasa in RL (BeTR-RL)
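As a rough illustration of the two-level split described above, the sketch below separates a high-level subgoal generator from a goal-conditioned low-level policy; the re-planning horizon and the interfaces are assumptions for exposition, not MGHRL's actual architecture.

```python
class HierarchicalGoalAgent:
    """Illustrative high-level/low-level split in the spirit of MGHRL (details assumed)."""

    def __init__(self, high_policy, low_policy, goal_horizon=10):
        self.high_policy = high_policy    # high_policy(state, task_context) -> subgoal
        self.low_policy = low_policy      # low_policy(state, subgoal) -> primitive action
        self.goal_horizon = goal_horizon  # re-plan the subgoal every k steps (assumed)
        self.subgoal = None

    def act(self, state, task_context, t):
        # Only the high level needs to adapt across tasks; achieving a given
        # subgoal is treated as an ordinary goal-conditioned RL subtask.
        if self.subgoal is None or t % self.goal_horizon == 0:
            self.subgoal = self.high_policy(state, task_context)
        return self.low_policy(state, self.subgoal)
```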
Efficient Deep Reinforcement Learning via Adaptive Policy Transfer
Transfer Learning (TL) has shown great potential to accelerate Reinforcement
Learning (RL) by leveraging prior knowledge from past learned policies of
relevant tasks. Existing transfer approaches either explicitly compute the similarity between tasks or select appropriate source policies to provide guided exploration for the target task. However, how to directly optimize the target policy by alternately drawing on knowledge from appropriate source policies, without explicitly measuring task similarity, remains an open question. In this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL by taking advantage of this idea. Our framework learns which source policy is best to reuse for the target policy, when to reuse it, and when to terminate it, by modeling multi-policy transfer as an option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.
Comment: Accepted by IJCAI'202
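Since the abstract frames transfer as option learning over source policies, the following sketch shows one plausible shape of that idea: a selector ranks source policies by an option-value estimate and a termination function decides when to switch. The value and termination components are placeholders, not PTF's actual networks or training losses.

```python
import numpy as np

class OptionPolicyReuse:
    """Illustrative option-style reuse of source policies (loosely after PTF)."""

    def __init__(self, source_policies, option_values, termination_fn):
        self.sources = source_policies        # list of policies: pi_i(state) -> action
        self.option_values = option_values    # option_values(state) -> value per source policy
        self.termination_fn = termination_fn  # termination_fn(state, idx) -> prob in [0, 1]
        self.current = None

    def act(self, state, rng=np.random):
        # Stochastically terminate the current option, then re-select the
        # source policy with the highest estimated value for this state.
        if self.current is None or rng.random() < self.termination_fn(state, self.current):
            self.current = int(np.argmax(self.option_values(state)))
        return self.sources[self.current](state)
```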
Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
Context, the embedding of previously collected trajectories, is a powerful
construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning
on an effective context, Meta-RL policies can easily generalize to new tasks
within a few adaptation steps. We argue that improving the quality of context
involves answering two questions: 1. How to train a compact and sufficient
encoder that can embed the task-specific information contained in prior
trajectories? 2. How to collect informative trajectories whose corresponding
context reflects the specification of the tasks? To this end, we
propose a novel Meta-RL framework called CCM (Contrastive learning augmented
Context-based Meta-RL). We first focus on the contrastive nature behind
different tasks and leverage it to train a compact and sufficient context
encoder. Further, we train a separate exploration policy and theoretically
derive a new information-gain-based objective which aims to collect informative
trajectories in a few steps. Empirically, we evaluate our approaches on common
benchmarks as well as several complex sparse-reward environments. The
experimental results show that CCM outperforms state-of-the-art algorithms by addressing the two problems above.
Comment: Accepted to AAAI 202
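The first question above, training a compact and sufficient context encoder, is naturally cast as a contrastive objective. The sketch below writes an InfoNCE-style loss over context embeddings, where the positive pair comes from trajectories of the same task and the negatives from other tasks; the shapes, the temperature, and the exact form are assumptions, and CCM's actual loss and information-gain exploration objective may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_context_loss(anchor, positive, negatives, temperature=0.1):
    """Illustrative InfoNCE-style loss for a task-context encoder (assumed form).

    anchor, positive: (d,) embeddings from trajectories of the same task.
    negatives:        (n, d) embeddings from trajectories of other tasks.
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature  # (1,)
    neg_logits = negatives @ anchor / temperature                        # (n,)
    logits = torch.cat([pos_logit, neg_logits], dim=0).unsqueeze(0)      # (1, n + 1)
    labels = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits, labels)
```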
Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Social psychology and real experiences show that cognitive consistency plays
an important role in keeping human society in order: if people have a more
consistent cognition about their environments, they are more likely to achieve
better cooperation. Meanwhile, only cognitive consistency within a neighborhood
matters because humans only interact directly with their neighbors. Inspired by
these observations, we take the first step to introduce \emph{neighborhood
cognitive consistency} (NCC) into multi-agent reinforcement learning (MARL).
Our NCC design is quite general and can be easily combined with existing MARL
methods. As examples, we propose neighborhood cognition consistent deep
Q-learning and Actor-Critic algorithms to facilitate large-scale multi-agent cooperation. Extensive experiments on several challenging tasks (i.e., packet routing, Wi-Fi configuration, and Google football player control) justify the superior performance of our methods compared with state-of-the-art MARL approaches.
Comment: Accepted by AAAI2020 with oral presentation (https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf). Since AAAI2020 has started, I have the right to distribute this paper on arXiv.
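To make the neighborhood cognitive consistency idea concrete, the sketch below adds a simple regularizer that pulls the latent "cognition" vectors of neighboring agents toward each other; the squared-distance form and the adjacency-mask interface are illustrative assumptions, not the exact NCC loss used in the paper.

```python
import torch

def neighborhood_consistency_loss(cognitions, adjacency):
    """Illustrative neighborhood-consistency regularizer (assumed form).

    cognitions: (n_agents, d) latent cognition vectors, one per agent.
    adjacency:  (n_agents, n_agents) 0/1 tensor marking neighbor pairs.
    Penalizes the squared distance between neighbors' cognition vectors,
    encouraging agents that interact directly to hold consistent views.
    """
    diff = cognitions.unsqueeze(0) - cognitions.unsqueeze(1)  # (n, n, d) pairwise differences
    sq_dist = (diff ** 2).sum(-1)                             # (n, n) squared distances
    n_pairs = adjacency.sum().clamp(min=1)
    return (adjacency * sq_dist).sum() / n_pairs
```

In practice such a term would be added, with a weighting coefficient, to each agent's usual Q-learning or actor-critic loss.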