The Complexity of POMDPs with Long-run Average Objectives
We study the problem of approximating optimal values in
partially-observable Markov decision processes (POMDPs) with long-run average
objectives. POMDPs are a standard model for dynamic systems with probabilistic
and nondeterministic behavior in uncertain environments. In long-run average
objectives, rewards are associated with every transition of the POMDP, and the
payoff is the long-run average of the rewards along the executions of the
POMDP. We establish strategy complexity and computational complexity results.
Our main result shows that finite-memory strategies suffice for approximation
of optimal values, and that the related decision problem is recursively
enumerable complete.
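As a rough illustration of the payoff described in this abstract, the sketch below computes running averages of the rewards collected along one execution. It is a minimal sketch, not the paper's construction: the function name `prefix_averages` and the reward cycle 1, 0, 0 are hypothetical, and a finite prefix only approximates the long-run average, which is defined as a limit over infinite executions.

```python
from itertools import cycle, islice


def prefix_averages(rewards):
    """Return the running averages (r_1 + ... + r_n) / n for n = 1, 2, ..."""
    total = 0.0
    averages = []
    for n, reward in enumerate(rewards, start=1):
        total += reward
        averages.append(total / n)
    return averages


# Hypothetical execution whose transition rewards repeat the cycle 1, 0, 0.
# Only a finite prefix is inspected, so the last running average merely
# approximates the long-run average payoff of the infinite execution.
rewards = list(islice(cycle([1, 0, 0]), 3000))
averages = prefix_averages(rewards)
approximate_payoff = averages[-1]
```

On this periodic execution the running averages settle near the mean reward of one cycle; on executions without such regularity they may not converge at all, which is why definitions over infinite runs typically take a limit inferior or superior.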
Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations
Control applications often feature tasks with similar, but not identical,
dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP),
a framework that parametrizes a family of related dynamical systems with a
low-dimensional set of latent factors, and introduce a semiparametric
regression approach for learning its structure from data. In the control
setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a
new task instance, allowing an agent to flexibly adapt to task variations.