Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks
In this paper, we consider an intrusion detection application for Wireless
Sensor Networks (WSNs). We study the problem of scheduling the sleep times of
the individual sensors to maximize the network lifetime while keeping the
tracking error to a minimum. We formulate this problem as a
partially-observable Markov decision process (POMDP) with continuous
state-action spaces, in a manner similar to that of Fuemmeler and Veeravalli [2008].
However, unlike their formulation, we consider infinite horizon discounted and
average cost objectives as performance criteria. For each criterion, we propose
a convergent on-policy Q-learning algorithm that operates on two timescales,
while employing function approximation to handle the curse of dimensionality
associated with the underlying POMDP. Our proposed algorithm incorporates a
policy gradient update using a one-simulation simultaneous perturbation
stochastic approximation (SPSA) estimate on the faster timescale, while the
Q-value parameter (arising from a linear function approximation for the
Q-values) is updated in an on-policy temporal difference (TD) algorithm-like
fashion on the slower timescale. The feature selection scheme employed in each
of our algorithms manages the energy and tracking components in a manner that
assists the search for the optimal sleep-scheduling policy. For the sake of
comparison, in both discounted and average settings, we also develop a function
approximation analogue of the Q-learning algorithm. This algorithm, unlike the
two-timescale variant, does not possess theoretical convergence guarantees.
Finally, we also adapt our algorithms to include a stochastic iterative
estimation scheme for the intruder's mobility model. Our simulation results on
a 2-dimensional network setting suggest that our algorithms result in better
tracking accuracy at the cost of only a few additional sensors, in comparison
to a recent prior work.
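The two-timescale structure described above can be illustrated with a toy sketch: a Rademacher-perturbed policy parameter drives a single simulated transition (one-simulation SPSA) on the faster timescale, while a linear Q-value parameter receives a TD-like update on the slower timescale. Everything below is illustrative, not the paper's algorithm: the simulator, feature map, dimensions, and step-size constants are all assumptions, and the feature map does not model the paper's energy/tracking decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and constants; the paper's feature-selection scheme
# separates energy and tracking components, which this sketch omits.
d_theta, d_w = 4, 6
gamma = 0.9    # discount factor
delta = 0.1    # SPSA perturbation size

def simulate_transition(theta):
    """Hypothetical one-step simulator (an assumption, not the paper's
    POMDP): returns a bounded single-stage cost and feature vectors for
    the current and next state-action pairs under policy parameter theta."""
    phi = rng.standard_normal(d_w)
    phi_next = rng.standard_normal(d_w)
    cost = float(np.tanh(theta.sum()) + 0.1 * rng.standard_normal())
    return cost, phi, phi_next

theta = np.zeros(d_theta)   # policy parameter (faster timescale)
w = np.zeros(d_w)           # linear Q-value parameter (slower timescale)

for n in range(1, 501):
    a_n = 0.5 * n ** -0.6   # faster step sizes (decay more slowly)
    b_n = 0.5 / n           # slower step sizes

    # One-simulation SPSA: a single Rademacher perturbation of theta.
    Delta = rng.choice([-1.0, 1.0], size=d_theta)
    cost, phi, phi_next = simulate_transition(theta + delta * Delta)

    # On-policy TD-like update of the linear Q-value parameter.
    td_err = cost + gamma * (w @ phi_next) - (w @ phi)
    w = w + b_n * td_err * phi

    # SPSA gradient estimate from the one perturbed simulation.
    grad_est = (w @ phi) / (delta * Delta)
    theta = theta - a_n * grad_est
```

The separation of timescales is what allows convergence arguments to treat the Q-value iterate as quasi-static while the policy moves, which is the role the faster/slower step-size split plays here.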
Delay Optimal Event Detection on Ad Hoc Wireless Sensor Networks
We consider a small extent sensor network for event detection, in which nodes
take samples periodically and then contend over a {\em random access network}
to transmit their measurement packets to the fusion center. We consider two
procedures at the fusion center to process the measurements. The Bayesian
setting is assumed; i.e., the fusion center has a prior distribution on the
change time. In the first procedure, the decision algorithm at the fusion
center is \emph{network-oblivious} and makes a decision only when a complete
vector of measurements taken at a sampling instant is available. In the second
procedure, the decision algorithm at the fusion center is \emph{network-aware}
and processes measurements as they arrive, but in a time causal order. In this
case, the decision statistic depends on the network delays as well, whereas in
the network-oblivious case, the decision statistic does not depend on the
network delays. This yields a Bayesian change detection problem with a tradeoff
between the random network delay and the decision delay; a higher sampling rate
reduces the decision delay but increases the random access delay. Under
periodic sampling, in the network-oblivious case, the structure of the optimal
stopping rule is the same as that without the network, and the optimal change
detection delay decouples into the network delay and the optimal decision delay
without the network. In the network-aware case, the optimal stopping problem
is analysed as a partially observable Markov decision process, in which the
states of the queues and delays in the network need to be maintained. A
sufficient statistic for decision is found to be the network-state and the
posterior probability of change having occurred given the measurements received
and the state of the network. The optimal regimes are studied using simulation.Comment: To appear in ACM Transactions on Sensor Networks. A part of this work
was presented in IEEE SECON 2006, and Allerton 201
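In the network-oblivious procedure, the decision statistic is the classical Bayesian posterior probability that the change has occurred, updated with each complete measurement vector and compared against a threshold. A minimal sketch of that posterior recursion (the Shiryaev recursion) follows; the geometric change-time prior, Gaussian pre/post-change densities, scalar measurements, and all numerical parameters are illustrative assumptions, not values from the paper.

```python
import math
import random

random.seed(1)

# Illustrative model (assumed, not from the paper): geometric
# change-time prior with parameter rho; i.i.d. unit-variance Gaussian
# samples with mean mu0 before the change and mu1 after it.
rho = 0.05
mu0, mu1, sigma = 0.0, 1.0, 1.0
threshold = 0.95   # declare a change when the posterior exceeds this

def gauss_pdf(x, mu):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_update(p, x):
    """One step of the Shiryaev recursion for P(change by n | x_1..x_n)."""
    prior = p + (1.0 - p) * rho   # change may also occur at this step
    num = prior * gauss_pdf(x, mu1)
    return num / (num + (1.0 - prior) * gauss_pdf(x, mu0))

change_time = 30                  # true (unknown) change point
p, stop = 0.0, None
for n in range(1, 201):
    x = random.gauss(mu1 if n >= change_time else mu0, sigma)
    p = posterior_update(p, x)
    if p > threshold:             # threshold stopping rule
        stop = n
        break
```

In the network-aware case, the abstract's sufficient statistic augments this posterior with the network state (queues and delays), since measurements reach the fusion center only after random access delays rather than at their sampling instants.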