MICo: Learning improved representations via sampling-based state similarity for Markov decision processes
We present a new behavioural distance over the state space of a Markov
decision process, and demonstrate the use of this distance as an effective
means of shaping the learnt representations of deep reinforcement learning
agents. While existing notions of state similarity are typically difficult to
learn at scale due to high computational cost and lack of sample-based
algorithms, our newly proposed distance addresses both of these issues. In
addition to a detailed theoretical analysis, we provide empirical
evidence that learning this distance alongside the value function yields
structured and informative representations, including strong results on the
Arcade Learning Environment benchmark.
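This abstract does not define the distance itself. The sketch below therefore rests on an assumption. It uses the fixed-policy MICo recursion from the full paper, U(x, y) = |r_x - r_y| + gamma * E[U(x', y')], where x' ~ P(.|x) and y' ~ P(.|y) are drawn independently. The setting is a tiny tabular MDP. The sketch's main point is the "sample-based" claim: each update needs only one observed next state for x and one for y, and no coupling or optimal-transport problem is solved. All function names and the toy MDP are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def mico_fixed_point(r, P, gamma, iters=1000):
    """Approximate the (assumed) MICo distance for a fixed policy by operator iteration.

    Hypothetical tabular inputs:
      r: shape (n,), reward received in each state under the policy.
      P: shape (n, n), row-stochastic transition matrix under the policy.
    """
    reward_diff = np.abs(r[:, None] - r[None, :])
    U = np.zeros_like(reward_diff)
    for _ in range(iters):
        # With x' ~ P[x] and y' ~ P[y] drawn independently,
        # E[U(x', y')] = sum_{a,b} P[x, a] P[y, b] U[a, b] = (P @ U @ P.T)[x, y].
        U = reward_diff + gamma * P @ U @ P.T
    return U


def mico_sampled_update(U, x, y, r_x, r_y, x_next, y_next, gamma, lr):
    """One stochastic update of U[x, y] from a single sampled transition per state.

    The target bootstraps from the current estimate at the sampled next-state pair,
    so no expectation over transition distributions is computed explicitly.
    """
    target = abs(r_x - r_y) + gamma * U[x_next, y_next]
    U[x, y] += lr * (target - U[x, y])
    return U


def mico_sampled(r, P, gamma, steps=50_000, seed=0):
    """Estimate U purely from sampled transitions on randomly chosen state pairs."""
    rng = np.random.default_rng(seed)
    n = len(r)
    U = np.zeros((n, n))
    visits = np.zeros((n, n))
    for _ in range(steps):
        x, y = rng.integers(n, size=2)
        x_next = rng.choice(n, p=P[x])  # one sampled successor of x
        y_next = rng.choice(n, p=P[y])  # one sampled successor of y, independent of x_next
        visits[x, y] += 1
        # Per-pair 1/k step size, so each entry averages its sampled targets.
        mico_sampled_update(U, x, y, r[x], r[y], x_next, y_next, gamma, 1.0 / visits[x, y])
    return U
```

To sanity-check the sampled estimates, compare `mico_sampled(r, P, 0.9)` with `mico_fixed_point(r, P, 0.9)` on a small chain. Under this definition, U[x, x] can be positive when transitions out of x are stochastic, so the distance is not required to vanish on the diagonal.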