Value-Function Approximations for Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) provide an elegant
mathematical framework for modeling complex decision and planning problems in
stochastic domains in which states of the system are observable only
indirectly, via a set of imperfect or noisy observations. The modeling
advantage of POMDPs, however, comes at a price -- exact methods for solving
them are computationally very expensive and thus applicable in practice only to
very simple problems. We focus on efficient approximation (heuristic) methods
that attempt to alleviate the computational problem and trade off accuracy for
speed. We have two objectives here. First, we survey various approximation
methods, analyze their properties and relations, and provide some new insights
into their differences. Second, we present a number of new approximation
methods and novel refinements of existing techniques. The theoretical results
are supported by experiments on a problem from the agent navigation domain.
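As a concrete illustration of the kind of heuristic this survey covers, below is a minimal sketch of the classic QMDP approximation: solve the underlying fully observable MDP, then score beliefs against its Q-function. The toy transition tensor T, reward matrix R, and discount factor are assumptions invented for this example, not taken from the paper.

```python
import numpy as np

# Toy 2-state, 2-action model (assumed for illustration only):
# T[a, s, s'] = P(s' | s, a),  R[s, a] = immediate reward.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
gamma = 0.95

# Step 1: value iteration on the underlying fully observable MDP.
n_states, n_actions = R.shape
Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    V = Q.max(axis=1)
    Q_new = R + gamma * np.einsum("ast,t->sa", T, V)
    if np.abs(Q_new - Q).max() < 1e-8:
        Q = Q_new
        break
    Q = Q_new

# Step 2: approximate the POMDP value of a belief b as max_a b . Q[:, a],
# i.e., pretend all state uncertainty vanishes after one step.
def qmdp_action(belief):
    return int(np.argmax(belief @ Q))

print(qmdp_action(np.array([0.7, 0.3])))
```

QMDP is cheap (one MDP solve) but, because it ignores future observation uncertainty, it never selects purely information-gathering actions; trading away that accuracy for speed is exactly the kind of trade-off the survey analyzes.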
Logic and model checking for hidden Markov models
The branching-time temporal logic PCTL* has been introduced to specify quantitative properties over probabilistic systems, such as discrete-time Markov chains. Until now, however, no logics have been defined to specify properties over hidden Markov models (HMMs). In HMMs the states are hidden, and the hidden processes produce a sequence of observations. In this paper we extend the logic PCTL* to POCTL*. With our logic one can state properties such as "there is at least a 90 percent probability that the model produces a given sequence of observations" over HMMs. Subsequently, we give model checking algorithms for POCTL* over HMMs.
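To make the quoted property concrete: the probability that an HMM produces a given observation sequence is computed by the standard forward algorithm, which a model checker can then compare against a threshold such as 0.9. A minimal sketch, with a toy model assumed for the example:

```python
import numpy as np

# Assumed toy HMM (not from the paper):
A  = np.array([[0.7, 0.3],       # A[i, j] = P(next state j | state i)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],       # B[i, o] = P(observation o | state i)
               [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

def sequence_probability(obs):
    """Return P(observation sequence) via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

p = sequence_probability([0, 0, 1])
print(p, p >= 0.9)               # checking a "probability >= 0.9" property
```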
Hilbert Space Embeddings of POMDPs
A nonparametric approach to policy learning for POMDPs is proposed. The
approach represents distributions over the states, observations, and actions as
embeddings in feature spaces, which are reproducing kernel Hilbert spaces.
Distributions over states given the observations are obtained by applying the
kernel Bayes' rule to these distribution embeddings. Policies and value
functions are defined on the feature space over states, which leads to a
feature space expression for the Bellman equation. Value iteration may then be
used to estimate the optimal value function and associated policy. Experimental
results confirm that the correct policy is learned using the feature space
representation.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012).
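As a rough illustration of the belief-update idea, the sketch below implements the conditional mean embedding on which the kernel Bayes' rule builds (the full rule adds a prior-reweighting step, omitted here). The sample data, Gaussian kernel width, and regularizer are assumptions made for this example.

```python
import numpy as np

# Assumed paired samples (hidden state, noisy observation):
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 1))
observations = states + 0.3 * rng.normal(size=(100, 1))

def gauss_gram(X, Y, width=0.5):
    """Gaussian kernel Gram matrix between sample sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

n, lam = len(states), 1e-3
G_Y = gauss_gram(observations, observations)

def belief_weights(y):
    """Weights w with  mu_{X|y} ~= sum_i w_i phi(states[i])."""
    k_y = gauss_gram(observations, np.atleast_2d(y))[:, 0]
    return np.linalg.solve(G_Y + n * lam * np.eye(n), k_y)

# Expectations under the embedded belief (weights may be negative;
# the crude normalization below is only for this sketch):
w = belief_weights(0.8)
print(float(w @ states[:, 0]) / w.sum())  # approx E[state | observation=0.8]
```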
Control Theory Meets POMDPs: A Hybrid Systems Approach
Partially observable Markov decision processes (POMDPs) provide a modeling framework for a variety of sequential decision making under uncertainty scenarios in artificial intelligence (AI). Since the states are not directly observable in a POMDP, decision making has to be performed based on the output of a Bayesian filter (continuous beliefs), which makes POMDPs intractable to solve and analyze. To overcome the complexity challenge of POMDPs, we apply techniques from control theory. Our contributions are fourfold: (i) We begin by casting the problem of analyzing a POMDP into analyzing the behavior of a discrete-time switched system. Then, (ii) in order to estimate the reachable belief space of a POMDP, i.e., the set of all possible evolutions given an initial belief distribution over the states and a set of actions and observations, we find over-approximations in terms of sub-level sets of Lyapunov-like functions. Furthermore, (iii) in order to verify safety and performance requirements of a given POMDP, we formulate a barrier certificate theorem.
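A minimal sketch of the starting point in contribution (i): for a fixed action-observation pair, the Bayesian belief update is a deterministic map on the probability simplex, so a belief trajectory evolves as a discrete-time switched system whose modes are (action, observation) pairs. The toy matrices below are assumptions for the example.

```python
import numpy as np

# Assumed toy POMDP: T[a][s, s'] transition probs, O[a][s', z] observation probs.
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.4, 0.6]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.5, 0.5]])}

def belief_update(b, a, z):
    """One 'mode' of the switched system: b' ∝ diag(O_z^a) T_a^T b."""
    unnormalized = O[a][:, z] * (T[a].T @ b)
    return unnormalized / unnormalized.sum()

# One possible evolution inside the reachable belief space:
b = np.array([0.5, 0.5])
for a, z in [(0, 1), (1, 0), (0, 0)]:
    b = belief_update(b, a, z)
    print(b)
```

The reachable belief space is the set of all beliefs obtainable by composing such mode maps from an initial belief; the paper over-approximates this set with sub-level sets of Lyapunov-like functions rather than enumerating trajectories.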
- …