
    The Complexity of POMDPs with Long-run Average Objectives

    We study the problem of approximating optimal values in partially observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives, rewards are associated with every transition of the POMDP, and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and that the related decision problem is recursively enumerable-complete.
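    As a point of reference (the abstract does not spell this out; the following is a paraphrase of the standard definition, and the paper may use the lim sup variant or both), the long-run average payoff of an execution s_0 a_0 s_1 a_1 ... with transition rewards r(s, a) is typically the limit inferior of the average reward,

    \[
      \liminf_{n \to \infty} \; \frac{1}{n} \sum_{i=0}^{n-1} r(s_i, a_i),
    \]

    and the value of a strategy is the expectation of this quantity over the runs it induces.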

    Strong uniform value in gambling houses and partially observable Markov decision processes

    In several standard models of dynamic programming (gambling houses, Markov decision processes (MDPs), and partially observable MDPs (POMDPs)), we prove the existence of a robust notion of value for the infinitely repeated problem, namely, the strong uniform value. This solves two open problems. First, it shows that for any ε > 0, the decision maker has a pure strategy which is ε-optimal in any n-stage problem, provided that n is big enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision maker can guarantee the limit of the n-stage value minus ε in the infinite problem, where the payoff is the expectation of the inferior limit of the time-average payoff.
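    In symbols (our notation, not the paper's: g_t denotes the stage-t payoff and σ a strategy), the infinite-problem payoff described in the last sentence is

    \[
      \gamma(\sigma) \;=\; \mathbb{E}_\sigma\!\left[ \liminf_{n \to \infty} \; \frac{1}{n} \sum_{t=1}^{n} g_t \right],
    \]

    i.e. the expectation, under the strategy, of the inferior limit of the time-average payoff.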