Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety
of continuous control tasks. Normally, the critic's action-value function is
updated using temporal-difference learning, and the critic in turn provides a loss for
the actor that trains it to take actions with higher expected return. In this
paper, we introduce a novel and flexible meta-critic that observes the learning
process and meta-learns an additional loss for the actor that accelerates and
improves actor-critic learning. Compared to the vanilla critic, the meta-critic
network is explicitly trained to accelerate the learning process; and compared
to existing meta-learning algorithms, the meta-critic is rapidly learned online for
a single task, rather than slowly over a family of tasks. Crucially, our
meta-critic framework is designed for off-policy based learners, which
currently provide state-of-the-art reinforcement learning sample efficiency. We
demonstrate that online meta-critic learning leads to improvements in a variety
of continuous control environments when combined with the contemporary Off-PAC
methods DDPG, TD3, and the state-of-the-art SAC.
Comment: NeurIPS 202
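The loss structure described in this abstract can be made concrete with a short sketch: the actor receives the usual Off-PAC loss from the critic (maximize the Q-value) plus an extra loss emitted by the meta-critic network. Below is a minimal PyTorch sketch under assumed names; `MetaCritic` and `actor_update` are illustrative, not the paper's code, and the meta-critic's own online meta-update (training it so the auxiliary loss actually accelerates actor learning) is omitted for brevity.

```python
import torch
import torch.nn as nn

class MetaCritic(nn.Module):
    """Maps (state, action) batches to a scalar auxiliary loss for the actor.

    Illustrative sketch only; the paper's architecture and inputs may differ.
    """
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).mean()

def actor_update(actor, critic, meta_critic, states, actor_opt):
    actions = actor(states)
    # Vanilla Off-PAC actor loss: follow the critic's action-value gradient.
    loss_main = -critic(states, actions).mean()
    # Additional meta-learned loss from the meta-critic (assumed interface).
    loss_aux = meta_critic(states, actions)
    actor_opt.zero_grad()
    (loss_main + loss_aux).backward()
    actor_opt.step()
```

Per the abstract, the meta-critic itself is trained online on the same single task, with an objective that explicitly rewards accelerating the actor's learning rather than merely estimating returns.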
Discovering Object-Centric Generalized Value Functions From Pixels
Deep Reinforcement Learning has shown significant progress in extracting
useful representations from high-dimensional inputs, albeit using hand-crafted
auxiliary tasks and pseudo rewards. Automatically learning such representations
in an object-centric manner geared towards control and fast adaptation remains
an open research problem. In this paper, we introduce a method that tries to
discover meaningful features from objects, translating them to temporally
coherent "question" functions and leveraging the subsequent learned general
value functions for control. We compare our approach with state-of-the-art
techniques alongside other ablations and show competitive performance in both
stationary and non-stationary settings. Finally, we investigate the
discovered general value functions and, through qualitative analysis, show that
the learned representations are not only interpretable but also centered
around objects that are invariant to changes across tasks, facilitating fast
adaptation.
Comment: Accepted at ICML 202
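As background for the "question function"/GVF terminology above: a general value function is an ordinary value function whose reward is replaced by a cumulant (pseudo-reward) that poses a predictive question about future signals. Below is a minimal PyTorch sketch of one GVF trained by one-step TD learning; `cumulant` is an assumed stand-in for the paper's discovered object-centric question functions, and all names are illustrative.

```python
import torch
import torch.nn as nn

class GVF(nn.Module):
    """One general value function: predicts discounted future cumulant sums."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats):
        return self.net(feats).squeeze(-1)

def td_update(gvf, cumulant, feats, next_feats, opt, gamma=0.99):
    # One-step TD target: c(s') + gamma * V(s'), with gradients stopped
    # through the bootstrap target, as in standard TD learning.
    with torch.no_grad():
        target = cumulant(next_feats) + gamma * gvf(next_feats)
    loss = ((gvf(feats) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the paper's setting the learned GVF predictions then serve as state features for the control policy; this sketch covers only the TD answer to a single fixed question.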