414,916 research outputs found

    Nonapproximability Results for Partially Observable Markov Decision Processes

    Full text link
    We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to or simply don't have guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P=NP, P=PSPACE, or P=EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation

    Regret Minimization in Partially Observable Linear Quadratic Control

    Get PDF
    We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of O(T^(2/3)) for ExpCommit, where T is the time horizon of the problem

    Qualitative Analysis of Partially-observable Markov Decision Processes

    Get PDF
    We study observation-based strategies for partially-observable Markov decision processes (POMDPs) with omega-regular objectives. An observation-based strategy relies on partial information about the history of a play, namely, on the past sequence of observations. We consider the qualitative analysis problem: given a POMDP with an omega-regular objective, whether there is an observation-based strategy to achieve the objective with probability~1 (almost-sure winning), or with positive probability (positive winning). Our main results are twofold. First, we present a complete picture of the computational complexity of the qualitative analysis of POMDP s with parity objectives (a canonical form to express omega-regular objectives) and its subclasses. Our contribution consists in establishing several upper and lower bounds that were not known in literature. Second, we present optimal bounds (matching upper and lower bounds) on the memory required by pure and randomized observation-based strategies for the qualitative analysis of POMDP s with parity objectives and its subclasses
    corecore