We consider sensor scheduling as the optimal observability problem for
partially observable Markov decision processes (POMDPs). This model fits
cases in which a Markov process is observed either by a single sensor that
must be dynamically adjusted or by a set of sensors that are selected one at
a time so as to maximize information acquisition from the process. As in
conventional POMDP problems, the control action is based on all past
measurements; here, however, the action does not control the state process,
which is autonomous, but instead influences how that process is measured.
This POMDP is a controlled version of a hidden Markov process, and we show
that its optimal observability problem can be formulated as an average cost
Markov decision process (MDP) scheduling problem. In this problem, a policy
is a rule for selecting sensors or adjusting the measuring device based on
the measurement history.
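In standard POMDP notation (ours; the abstract fixes none), such a policy can
equivalently act on the belief, i.e., the conditional state distribution
$b_t(x) = \Pr(X_t = x \mid Y_{1:t}, a_{1:t})$, which summarizes the
measurement history. A sketch of its evolution, with the state transitions
autonomous and only the measurement likelihood depending on the selected
sensor $a$:
\[
b_{t+1}(x') \;\propto\; \Pr(Y_{t+1} \mid x', a_{t+1}) \sum_{x} \Pr(x' \mid x)\, b_t(x).
\]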
on the measurement history. Given a policy, we can evaluate the estimation
entropy for the joint state-measurement processes which inversely measures the
observability of state process for that policy. Considering estimation entropy
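One standard way to formalize this quantity (our notation) is as the
conditional entropy rate of the state process given the measurements under
policy $\pi$:
\[
H_{\mathrm{est}}(\pi) \;=\; \lim_{n \to \infty} \frac{1}{n}\, H\!\left(X_1, \dots, X_n \mid Y_1, \dots, Y_n\right),
\]
so that a smaller estimation entropy corresponds to a more observable state
process.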
Taking estimation entropy as the cost of a policy, we show that the problem
of finding an optimal policy is equivalent to an average cost MDP scheduling
problem in which the cost function is the entropy function over the belief
space.
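With $H(b) = -\sum_x b(x) \log b(x)$ denoting the entropy of a belief $b$,
the resulting criterion takes the usual average cost form (a sketch in our
notation):
\[
J(\pi) \;=\; \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{N} H(b_t)\right],
\]
to be minimized over policies $\pi$ mapping beliefs to sensor selections.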
This allows the application of the policy iteration algorithm to find the
policy achieving minimum estimation entropy, and thus optimum observability.
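To illustrate, the following self-contained Python sketch (ours, not the
authors' code; all model parameters and the nearest-neighbor belief grid are
illustrative assumptions) discretizes the belief simplex of a hypothetical
two-state chain observed by two sensors and runs average cost policy
iteration with the belief entropy as the per-stage cost:

import numpy as np
from itertools import product

# Illustrative sketch: sensor scheduling as an average-cost MDP over a
# discretized belief space, solved by policy iteration. Hypothetical model.

# Autonomous 2-state Markov chain (transitions are NOT controlled).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Two sensors; action a selects which one measures the state.
# O[a][x, y] = P(Y = y | X = x, sensor a); sensor 0 resolves state 0 well,
# sensor 1 resolves state 1 well (made-up likelihoods).
O = [np.array([[0.95, 0.05], [0.40, 0.60]]),
     np.array([[0.60, 0.40], [0.05, 0.95]])]

# Discretize the belief simplex for 2 states: b = (p, 1 - p).
G = 101
beliefs = np.linspace(0.0, 1.0, G)

def entropy(p):
    """Entropy of belief (p, 1-p); the per-stage cost."""
    q = np.clip(np.array([p, 1.0 - p]), 1e-12, 1.0)
    return -np.sum(q * np.log(q))

def step(p, a):
    """One Bayes-filter step: predict through P, update with sensor a.
    Returns a list of (probability of y, nearest grid index of posterior)."""
    b = np.array([p, 1.0 - p])
    pred = b @ P                       # predicted belief before measuring
    out = []
    for y in (0, 1):
        py = pred @ O[a][:, y]         # probability of observing y
        if py < 1e-12:
            continue
        post = pred * O[a][:, y] / py  # posterior belief after observing y
        out.append((py, int(round(post[0] * (G - 1)))))
    return out

# Tabulate cost c[s, a] and transition kernel T[a][s, s'] on the grid.
A = len(O)
c = np.array([[entropy(p) for _ in range(A)] for p in beliefs])
T = np.zeros((A, G, G))
for (s, p), a in product(enumerate(beliefs), range(A)):
    for py, s2 in step(p, a):
        T[a, s, s2] += py

# Average-cost policy iteration with relative values (h[0] fixed to 0).
policy = np.zeros(G, dtype=int)
for _ in range(100):
    Ppi = T[policy, np.arange(G)]      # Ppi[s, s'] under current policy
    cpi = c[np.arange(G), policy]
    # Solve h + g = c_pi + Ppi h with h[0] = 0; column 0 carries the gain g.
    # (lstsq for robustness; an exact solve assumes a unichain policy.)
    Aeq = np.eye(G) - Ppi
    Aeq[:, 0] = 1.0
    sol = np.linalg.lstsq(Aeq, cpi, rcond=None)[0]
    g, h = sol[0], sol.copy()
    h[0] = 0.0
    Q = c + np.einsum('asj,j->sa', T, h)
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(f"estimated minimum estimation entropy (avg cost): {g:.4f} nats")

The grid projection and the two-sensor model are purely for illustration; a
finer grid reduces projection error at the cost of a larger state space.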