Receiving knowledge, abiding by laws, and being aware of regulations are
common behaviors in human society. Because reinforcement learning (RL)
algorithms can benefit from mimicking human behavior, we propose that an RL
agent can act on external guidance in both its learning process and model
deployment, making the agent more socially acceptable. We introduce the
concept of Knowledge-Grounded RL (KGRL) and formally define it as a setting in
which an agent learns to follow external guidelines while developing its own
policy. Moving towards
the goal of KGRL, we propose a novel actor model with an embedding-based
attention mechanism that can attend to either a learnable internal policy or
external knowledge. The proposed method is orthogonal to training algorithms,
and the external knowledge can be flexibly recomposed, rearranged, and reused
in both training and inference stages. Through experiments on tasks with
discrete and continuous action spaces, we show that our KGRL agent is more
sample efficient and generalizable, and that it has flexibly rearrangeable
knowledge embeddings and interpretable behaviors.
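
As an illustration only, the embedding-based attention described above might be sketched as follows for a discrete action space. This is a minimal sketch under stated assumptions, not the actor model itself: the function name `knowledge_grounded_action_probs`, the linear state-to-query map `w_query`, the scaled dot-product scoring, and the choice to output a weighted mixture of action distributions are all hypothetical and are not taken from the text.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def knowledge_grounded_action_probs(state, w_query, internal_key, knowledge_keys,
                                    internal_policy, knowledge_policies):
    """Mix an internal policy with external knowledge via attention (assumed form).

    Each candidate (the internal policy and every piece of external knowledge)
    has a key embedding; the state is mapped to a query, and the attention
    weights decide how much each candidate contributes to the final action
    distribution.
    """
    query = w_query @ state                               # shape (d,)
    keys = np.vstack([internal_key, knowledge_keys])      # shape (1 + K, d)
    scores = keys @ query / np.sqrt(query.shape[0])       # scaled dot product
    weights = softmax(scores)                             # shape (1 + K,)
    candidates = np.vstack(
        [internal_policy(state)] + [k(state) for k in knowledge_policies]
    )                                                     # shape (1 + K, n_actions)
    return weights @ candidates, weights


# Hypothetical toy setup: 4-D state, 8-D embeddings, 3 actions, 2 knowledge rules.
rng = np.random.default_rng(0)
w_query = rng.normal(size=(8, 4))
internal_key = rng.normal(size=8)
knowledge_keys = rng.normal(size=(2, 8))
w_policy = rng.normal(size=(3, 4))


def internal_policy(state):
    # Stand-in for a learnable policy: a linear layer followed by softmax.
    return softmax(w_policy @ state)


def rule_prefer_left(state):
    # External knowledge expressed as a fixed rule over actions.
    return np.array([1.0, 0.0, 0.0]) if state[0] > 0 else np.array([0.0, 0.0, 1.0])


def rule_stay(state):
    return np.array([0.0, 1.0, 0.0])


state = rng.normal(size=4)
probs, weights = knowledge_grounded_action_probs(
    state, w_query, internal_key, knowledge_keys,
    internal_policy, [rule_prefer_left, rule_stay],
)
print("attention weights:", weights)
print("action probabilities:", probs)
```

In this sketch the attention weights give one rough reading of interpretability, since they indicate which candidate the agent relied on in a given state, and reordering the knowledge rules together with their key embeddings leaves the resulting action distribution unchanged.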