Discretizing Continuous Action Space for On-Policy Optimization
In this work, we show that discretizing action space for continuous control
is a simple yet powerful technique for on-policy optimization. The explosion in
the number of discrete actions can be efficiently addressed by a policy with a
factorized distribution across action dimensions. We show that the discrete
policy achieves significant performance gains with state-of-the-art on-policy
optimization algorithms (PPO, TRPO, ACKTR), especially on high-dimensional tasks
with complex dynamics. Additionally, we show that an ordinal parameterization
of the discrete distribution can introduce an inductive bias that encodes the
natural ordering between discrete actions. This ordinal architecture further
significantly improves the performance of PPO/TRPO.
Comment: Accepted at the AAAI Conference on Artificial Intelligence (2020) in New York, NY, USA. An open source implementation can be found at https://github.com/robintyh1/onpolicybaseline
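The factorized discrete policy described above can be sketched in a few lines. Below is a minimal PyTorch illustration, not the authors' released implementation (see the linked repository for that); the network width, the bin range, and the exact ordinal construction are assumptions made here for concreteness.

```python
import torch
import torch.nn as nn

class FactorizedDiscretePolicy(nn.Module):
    """Discretizes each of D continuous action dimensions into K bins and
    models them with independent categoricals: D * K parameters instead of
    the K ** D needed for a joint distribution over all bin combinations."""

    def __init__(self, obs_dim, action_dim, num_bins, low=-1.0, high=1.0):
        super().__init__()
        self.action_dim, self.num_bins = action_dim, num_bins
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim * num_bins),
        )
        # Bin centers map sampled indices back to continuous actions.
        self.register_buffer("bins", torch.linspace(low, high, num_bins))

    def forward(self, obs):
        logits = self.net(obs).view(-1, self.action_dim, self.num_bins)
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()                    # (batch, D) bin indices
        log_prob = dist.log_prob(idx).sum(-1)  # factorized: sum over dims
        return self.bins[idx], log_prob        # continuous action, log pi(a|s)

def ordinal_logits(raw):
    """One common ordinal construction (assumed here, not necessarily the
    paper's exact form): bin k treats the first k threshold units as 'on'
    and the rest as 'off', so neighboring bins receive correlated logits."""
    s = torch.sigmoid(raw)                     # (..., K) threshold probs
    log_on = torch.log(s + 1e-8).cumsum(-1)    # sum_{j<=k} log s_j
    log_off = torch.log(1 - s + 1e-8)
    tail = log_off.sum(-1, keepdim=True) - log_off.cumsum(-1)  # sum_{j>k}
    return log_on + tail
```

Wrapping the raw logits with `ordinal_logits` before building the `Categorical` would give the ordinal variant; the factorized log-probability plugs directly into PPO/TRPO-style surrogate objectives.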
Towards a Better Understanding of Representation Dynamics under TD-learning
TD-learning is a foundational reinforcement learning (RL) algorithm for value
prediction. Critical to the accuracy of value predictions is the quality of
state representations. In this work, we consider the question: how does
end-to-end TD-learning impact the representation over time? Complementary to
prior work, we provide a set of analyses that sheds further light on the
representation dynamics under TD-learning. We first show that when the
environments are reversible, end-to-end TD-learning strictly decreases the
value approximation error over time. Under further assumptions on the
environments, we can connect the representation dynamics to the spectral
decomposition of the transition matrix. This latter finding establishes
fitting multiple value functions from randomly generated rewards as a useful
auxiliary task for representation learning, as we empirically validate on both
tabular and Atari game suites.
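The auxiliary task proposed here can be made concrete with a small tabular sketch. The following numpy code is an illustration under assumed details (TD(0) updates, Gaussian random rewards, access to a row-stochastic transition matrix `P`), not the paper's implementation.

```python
import numpy as np

def random_reward_td_features(P, num_tasks=8, gamma=0.99, lr=0.1,
                              steps=5000, seed=0):
    """Fit num_tasks value functions for randomly generated rewards with
    TD(0) along a single trajectory; the stacked value estimates act as an
    (S, num_tasks) state-feature matrix for representation learning."""
    rng = np.random.default_rng(seed)
    S = P.shape[0]
    rewards = rng.normal(size=(num_tasks, S))  # one random reward per task
    V = np.zeros((num_tasks, S))               # one value function per task
    state = rng.integers(S)
    for _ in range(steps):
        next_state = rng.choice(S, p=P[state])
        # One TD(0) update for every auxiliary task simultaneously.
        td_error = rewards[:, state] + gamma * V[:, next_state] - V[:, state]
        V[:, state] += lr * td_error
        state = next_state
    return V.T                                 # states as rows, tasks as columns
```

In the deep setting the analogous construction would presumably attach several value heads to a shared encoder, each head regressing TD targets for its own fixed random reward, so that the encoder is shaped by the spectral structure the abstract describes.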