Reinforcement Learning with Parameterized Actions
We introduce a model-free algorithm for learning in Markov decision processes
with parameterized actions: discrete actions with continuous parameters. At each
step the agent must select both which action to use and which parameters to use
with that action. We introduce the Q-PAMDP algorithm for learning in these
domains, show that it converges to a local optimum, and compare it to direct
policy search in the goal-scoring and Platform domains.
Comment: Accepted for AAAI 201
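
To make the two-part choice described in the abstract concrete, the following is a minimal sketch of selecting a parameterized action: first a discrete action, then the continuous parameters for that action. It is not the Q-PAMDP algorithm itself, which also covers how the action values and the parameter policy are learned. The names `ParameterizedAction`, `select_action`, `q_values`, and `param_policy`, as well as the example actions `kick_to` and `shoot_goal`, are assumptions made for illustration and do not come from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class ParameterizedAction:
    """A discrete action paired with its continuous parameters (hypothetical type)."""
    name: str
    params: Tuple[float, ...]


def select_action(
    q_values: Dict[str, Callable[[State], float]],
    param_policy: Dict[str, Callable[[State], Tuple[float, ...]]],
    state: State,
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> ParameterizedAction:
    """Pick a discrete action epsilon-greedily, then fill in its parameters.

    Assumption: each discrete action has its own value estimate and its own
    parameter policy, both given as plain callables of the state.
    """
    rng = rng or random.Random(0)
    names = sorted(q_values)
    if rng.random() < epsilon:
        # Explore: choose a discrete action uniformly at random.
        name = rng.choice(names)
    else:
        # Exploit: choose the discrete action with the highest estimated value.
        name = max(names, key=lambda a: q_values[a](state))
    # The continuous parameters depend on which discrete action was chosen.
    return ParameterizedAction(name, tuple(param_policy[name](state)))


# Hypothetical example: the state is (ball_x, ball_y) on a unit pitch.
example_q = {
    "kick_to": lambda s: 1.0 - s[0],   # prefer passing when far from goal
    "shoot_goal": lambda s: s[0],      # prefer shooting when close to goal
}
example_params = {
    "kick_to": lambda s: (min(s[0] + 0.2, 1.0), s[1]),  # target (x, y)
    "shoot_goal": lambda s: (0.5,),                      # target y on the goal line
}

if __name__ == "__main__":
    print(select_action(example_q, example_params, (0.9, 0.4)))
```

Note that the two actions take different numbers of parameters, which is why the parameter policy is keyed by the discrete action rather than shared across actions.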
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations
Control applications often feature tasks with similar, but not identical,
dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP),
a framework that parametrizes a family of related dynamical systems with a
low-dimensional set of latent factors, and introduce a semiparametric
regression approach for learning its structure from data. In the control
setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a
new task instance, allowing an agent to flexibly adapt to task variations