269 research outputs found
Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities
This paper describes sufficient conditions for the existence of optimal policies for partially observable Markov decision processes (POMDPs) with Borel state, observation, and action sets, when the goal is to minimize the expected total costs over finite or infinite horizons. For infinite-horizon problems, one-step costs are either discounted or assumed to be nonnegative. Action sets may be noncompact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semicontinuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to completely observable Markov decision processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether the transition probabilities for the COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous, if transition probabilities of the underlying Markov decision process are weakly continuous and observation probabilities for the POMDP are continuous in total variation. Moreover, the continuity in total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples
Resource Allocation Among Agents with MDP-Induced Preferences
Allocating scarce resources among agents to maximize global utility is, in
general, computationally challenging. We focus on problems where resources
enable agents to execute actions in stochastic environments, modeled as Markov
decision processes (MDPs), such that the value of a resource bundle is defined
as the expected value of the optimal MDP policy realizable given these
resources. We present an algorithm that simultaneously solves the
resource-allocation and the policy-optimization problems. This allows us to
avoid explicitly representing utilities over exponentially many resource
bundles, leading to drastic (often exponential) reductions in computational
complexity. We then use this algorithm in the context of self-interested agents
to design a combinatorial auction for allocating resources. We empirically
demonstrate the effectiveness of our approach by showing that it can, in
minutes, optimally solve problems for which a straightforward combinatorial
resource-allocation technique would require the agents to enumerate up to 2^100
resource bundles and the auctioneer to solve an NP-complete problem with an
input of that size
- β¦