Nonapproximability Results for Partially Observable Markov Decision Processes
We show that for several variations of partially observable Markov decision
processes, polynomial-time algorithms for finding control policies either are
unlikely to have, or simply do not have, guarantees of finding policies within
a constant factor or a constant summand of optimal. Here "unlikely" means
"unless some complexity classes collapse," where the collapses considered are
P=NP, P=PSPACE, or P=EXP. Until or unless these collapses are shown to hold,
any control-policy designer must choose between such performance guarantees
and efficient computation.
Regret Minimization in Partially Observable Linear Quadratic Control
We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of O(T^(2/3)) for ExpCommit, where T is the time horizon of the problem.
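To make the explore-then-commit structure concrete, here is a minimal Python sketch of the two phases (excite the system, regress outputs on past inputs to estimate the Markov parameters C A^(k-1) B, then commit to a controller built from the estimates). It is not the authors' ExpCommit: the toy system, noise levels, horizon split, and the crude certainty-equivalent feedback in the commit phase are all assumptions for illustration, and the optimistic controller design of the paper is not implemented.

    # Minimal explore-then-commit sketch; NOT the paper's ExpCommit.
    # System matrices, noise levels, horizon split, and the feedback rule
    # below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical partially observable linear system:
    #   x_{t+1} = A x_t + B u_t + w_t,   y_t = C x_t + v_t
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[1.0], [0.5]])
    C = np.array([[1.0, 0.0]])

    def step(x, u):
        w = 0.01 * rng.standard_normal(2)
        v = 0.01 * rng.standard_normal(1)
        return A @ x + B @ u + w, C @ x + v   # next state, current observation

    T, T_explore, H = 2000, 200, 5  # horizon, exploration length, number of Markov parameters
    x, us, ys = np.zeros(2), [], []

    # Phase 1 (explore): excite the system with Gaussian inputs and record outputs.
    for t in range(T_explore):
        u = rng.standard_normal(1)
        x, y = step(x, u)
        us.append(u)
        ys.append(y)

    # Least-squares estimate of the first H Markov parameters G_k ~ C A^(k-1) B,
    # from the regression y_t ~ sum_{k=1..H} G_k u_{t-k}.
    rows = [np.concatenate([us[t - k] for k in range(1, H + 1)]) for t in range(H, T_explore)]
    targets = [ys[t] for t in range(H, T_explore)]
    G_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)

    # Phase 2 (commit): a crude, conservatively scaled certainty-equivalent output
    # feedback built from the estimated leading Markov parameter (the paper instead
    # designs an optimistic controller from the estimates).
    g1 = G_hat[0, 0] if abs(G_hat[0, 0]) > 1e-6 else 1.0
    for t in range(T_explore, T):
        u = np.array([-0.3 * ys[-1][0] / g1])
        x, y = step(x, u)
        us.append(u)
        ys.append(y)

    print("estimated leading Markov parameters:", G_hat[:3, 0])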
Qualitative Analysis of Partially-observable Markov Decision Processes
We study observation-based strategies for partially-observable Markov
decision processes (POMDPs) with omega-regular objectives. An observation-based
strategy relies on partial information about the history of a play, namely, on
the past sequence of observations. We consider the qualitative analysis
problem: given a POMDP with an omega-regular objective, decide whether there is
an observation-based strategy to achieve the objective with probability 1
(almost-sure winning), or with positive probability (positive winning). Our
main results are twofold. First, we present a complete picture of the
computational complexity of the qualitative analysis of POMDPs with parity
objectives (a canonical form to express omega-regular objectives) and its
subclasses. Our contribution consists in establishing several upper and lower
bounds that were not known in the literature. Second, we present optimal bounds
(matching upper and lower bounds) on the memory required by pure and randomized
observation-based strategies for the qualitative analysis of POMDPs with
parity objectives and its subclasses.
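As a concrete illustration of the belief-support view that observation-based strategies operate on, the following Python sketch enumerates the belief supports reachable under observation-based behavior in a made-up three-state POMDP. It is a toy illustration under assumed names, not the paper's algorithms or bounds; transition probabilities are deliberately abstracted to supports, since exact probabilities are irrelevant for qualitative questions.

    # Toy illustration of the belief-support abstraction used in qualitative
    # POMDP analysis; the POMDP below and all names are assumptions, and this
    # is not the paper's algorithm.
    states = {"s0", "s1", "s2"}
    actions = {"a", "b"}
    obs = {"s0": "o0", "s1": "o1", "s2": "o1"}   # s1 and s2 are indistinguishable
    post = {                                      # supports of the transition distributions
        ("s0", "a"): {"s1", "s2"},
        ("s0", "b"): {"s0"},
        ("s1", "a"): {"s1"},
        ("s1", "b"): {"s2"},
        ("s2", "a"): {"s2"},
        ("s2", "b"): {"s2"},
    }

    def successors(support, action):
        """One-step successors of a belief support, split by the observation received."""
        succ = set()
        for s in support:
            succ |= post[(s, action)]
        by_obs = {}
        for s in succ:
            by_obs.setdefault(obs[s], set()).add(s)
        return [frozenset(v) for v in by_obs.values()]

    def reachable_supports(initial):
        """All belief supports reachable from the initial support under any behavior."""
        start = frozenset(initial)
        frontier, seen = [start], {start}
        while frontier:
            cur = frontier.pop()
            for a in actions:
                for nxt in successors(cur, a):
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
        return seen

    print(sorted(sorted(sup) for sup in reachable_supports({"s0"})))

Qualitative algorithms for omega-regular objectives typically work on a structure of this kind (possibly enriched with extra memory), which is why they can avoid numerical belief computations altogether.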
- …
