
    Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks

    In this paper, we consider an intrusion detection application for Wireless Sensor Networks (WSNs). We study the problem of scheduling the sleep times of the individual sensors to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially observable Markov decision process (POMDP) with continuous state-action spaces, in a manner similar to Fuemmeler and Veeravalli [2008]. However, unlike their formulation, we consider infinite-horizon discounted and average cost objectives as performance criteria. For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP. Our proposed algorithm incorporates a policy gradient update using a one-simulation simultaneous perturbation stochastic approximation (SPSA) estimate on the faster timescale, while the Q-value parameter (arising from a linear function approximation for the Q-values) is updated in an on-policy temporal difference (TD) algorithm-like fashion on the slower timescale. The feature selection scheme employed in each of our algorithms manages the energy and tracking components in a manner that assists the search for the optimal sleep-scheduling policy. For the sake of comparison, in both the discounted and average settings, we also develop a function approximation analogue of the Q-learning algorithm. This algorithm, unlike the two-timescale variant, does not possess theoretical convergence guarantees. Finally, we also adapt our algorithms to include a stochastic iterative estimation scheme for the intruder's mobility model. Our simulation results on a 2-dimensional network setting suggest that our algorithms result in better tracking accuracy at the cost of only a few additional sensors, in comparison to a recent prior work.
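The one-simulation SPSA estimate mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the quadratic `toy_cost`, the ±1 (Rademacher-style) perturbations, and all function names are assumptions made for illustration. The sketch estimates a gradient from a single perturbed cost evaluation and averages over every sign pattern to show that the estimate is unbiased for this toy cost.

```python
import itertools
import numpy as np


def spsa_one_sim_gradient(cost, theta, delta, perturbation):
    """One-measurement SPSA gradient estimate (hypothetical helper).

    Uses a single evaluation of `cost` at theta + delta * perturbation and
    divides it componentwise by delta * perturbation.
    """
    perturbation = np.asarray(perturbation, dtype=float)
    perturbed_cost = cost(theta + delta * perturbation)
    return perturbed_cost / (delta * perturbation)


def toy_cost(theta):
    # Assumed stand-in for a simulated cost; its true gradient is 2 * theta.
    return float(np.sum(theta ** 2))


theta = np.array([1.0, -2.0])
delta = 0.1

# Enumerate every +/-1 perturbation vector instead of sampling at random,
# so the average below is the exact expectation of the estimate.
all_signs = list(itertools.product([-1.0, 1.0], repeat=theta.size))
estimates = [spsa_one_sim_gradient(toy_cost, theta, delta, s) for s in all_signs]
mean_estimate = np.mean(estimates, axis=0)

print("individual estimates:", estimates)
print("mean estimate:", mean_estimate, "true gradient:", 2 * theta)
```

Each individual estimate is very noisy, which is why a scheme of this kind is typically paired with diminishing step sizes; in a two-timescale setup the policy parameter and the Q-value parameter would use step-size sequences that decay at different rates.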

    Delay Optimal Event Detection on Ad Hoc Wireless Sensor Networks

    We consider a small extent sensor network for event detection, in which nodes take samples periodically and then contend over a random access network to transmit their measurement packets to the fusion center. We consider two procedures at the fusion center to process the measurements. The Bayesian setting is assumed; i.e., the fusion center has a prior distribution on the change time. In the first procedure, the decision algorithm at the fusion center is network-oblivious and makes a decision only when a complete vector of measurements taken at a sampling instant is available. In the second procedure, the decision algorithm at the fusion center is network-aware and processes measurements as they arrive, but in a time causal order. In this case, the decision statistic depends on the network delays as well, whereas in the network-oblivious case, the decision statistic does not depend on the network delays. This yields a Bayesian change detection problem with a tradeoff between the random network delay and the decision delay; a higher sampling rate reduces the decision delay but increases the random access delay. Under periodic sampling, in the network-oblivious case, the structure of the optimal stopping rule is the same as that without the network, and the optimal change detection delay decouples into the network delay and the optimal decision delay without the network. In the network-aware case, the optimal stopping problem is analysed as a partially observable Markov decision process, in which the states of the queues and delays in the network need to be maintained. A sufficient statistic for decision is found to be the network-state and the posterior probability of change having occurred given the measurements received and the state of the network. The optimal regimes are studied using simulation. Comment: To appear in ACM Transactions on Sensor Networks. A part of this work was presented in IEEE SECON 2006, and Allerton 201
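The posterior-probability statistic described above can be illustrated for the classical setting without a network. The sketch below is an assumption-laden illustration, not the paper's procedure: it assumes a geometric prior on the change time with parameter `rho`, Gaussian observations whose mean shifts from 0 to 1 at the change, and a fixed alarm threshold; all names and values are hypothetical.

```python
import math


def gaussian_likelihood_ratio(x, mean0=0.0, mean1=1.0, sigma=1.0):
    # Ratio f1(x) / f0(x) of the post-change to pre-change Gaussian densities.
    return math.exp(((x - mean0) ** 2 - (x - mean1) ** 2) / (2.0 * sigma ** 2))


def update_posterior(p_prev, x, rho):
    """One step of the posterior recursion under an assumed geometric prior."""
    # Probability that the change has occurred before the new sample arrives.
    prior = p_prev + (1.0 - p_prev) * rho
    lr = gaussian_likelihood_ratio(x)
    # Bayes update with the new observation.
    return prior * lr / (prior * lr + (1.0 - prior))


def detect_change(samples, rho=0.05, threshold=0.95):
    """Return (alarm index, posterior); the index is None if no alarm is raised."""
    p = 0.0
    for k, x in enumerate(samples, start=1):
        p = update_posterior(p, x, rho)
        if p >= threshold:
            return k, p
    return None, p


# Five pre-change-like samples followed by five clearly shifted ones.
stop_time, posterior = detect_change([0.0] * 5 + [3.0] * 5)
print("alarm at sample", stop_time, "with posterior", posterior)
```

In the network-aware case described in the abstract, the statistic would also have to carry the state of the queues and delays, which this sketch deliberately leaves out.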