Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
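
As a minimal sketch of how an MDP of the kind surveyed here might be solved, the following runs value iteration on a hypothetical two-state battery model with "sleep" and "sense" actions. The states, rewards, transition probabilities, and discount factor are illustrative assumptions and are not taken from the survey.

```python
def q_value(s, a, V, P, R, gamma):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state); R[s][a] is a reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, V, P, R, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, P, R, gamma))
              for s in states}
    return V, policy

# Hypothetical sensor node: battery is "high" or "low".
states = ["high", "low"]
actions = ["sleep", "sense"]
P = {
    "high": {"sleep": [(1.0, "high")],
             "sense": [(0.5, "high"), (0.5, "low")]},
    "low":  {"sleep": [(0.5, "high"), (0.5, "low")],
             "sense": [(1.0, "low")]},
}
R = {
    "high": {"sleep": 0.0, "sense": 1.0},   # sensing yields useful data
    "low":  {"sleep": 0.0, "sense": -1.0},  # sensing on low battery is penalized
}

V, policy = value_iteration(states, actions, P, R)
```

With these assumed numbers the resulting policy senses when the battery is high and sleeps when it is low, which is the kind of adaptive rule the MDP framework is meant to produce.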
On Multi-Step Sensor Scheduling via Convex Optimization
Effective sensor scheduling requires the consideration of long-term effects
and thus optimization over long time horizons. Determining the optimal sensor
schedule, however, is equivalent to solving a binary integer program, which is
computationally demanding for long time horizons and many sensors. For linear
Gaussian systems, two efficient multi-step sensor scheduling approaches are
proposed in this paper. The first approach determines approximate but close to
optimal sensor schedules via convex optimization. The second approach combines
convex optimization with a branch-and-bound search for efficiently determining the optimal
sensor schedule.
Comment: 6 pages, appeared in the proceedings of the 2nd International
Workshop on Cognitive Information Processing (CIP), Elba, Italy, June 201
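
To illustrate why the optimal schedule is a binary decision problem that grows quickly with the horizon, the sketch below enumerates every schedule for a small linear Gaussian toy model and compares the result with a one-step greedy rule. It is neither the convex relaxation nor the branch-and-bound method of the paper. The model (two independent scalar processes, one sensor per process, one measurement per time step) and all numeric parameters are assumptions made for illustration.

```python
from itertools import product

# Hypothetical parameters for two independent scalar processes.
A = [1.2, 0.9]    # state transition coefficients
Q = [1.0, 0.5]    # process noise variances
R = [0.5, 0.2]    # measurement noise variances of sensor 0 and sensor 1
P0 = [1.0, 1.0]   # initial error variances

def step(p, sensor):
    """Kalman prediction for both processes, then update the measured one."""
    pred = [a * a * v + q for a, v, q in zip(A, p, Q)]
    pred[sensor] = pred[sensor] * R[sensor] / (pred[sensor] + R[sensor])
    return pred

def schedule_cost(schedule, p0=P0):
    """Sum of posterior error variances accumulated over the horizon."""
    p, total = list(p0), 0.0
    for s in schedule:
        p = step(p, s)
        total += sum(p)
    return total

def exhaustive_schedule(horizon):
    """Brute force over all len(A)**horizon schedules (small horizons only)."""
    return min(product(range(len(A)), repeat=horizon), key=schedule_cost)

def greedy_schedule(horizon, p0=P0):
    """Pick the sensor that minimizes the next-step variance sum."""
    p, sched = list(p0), []
    for _ in range(horizon):
        s = min(range(len(A)), key=lambda i: sum(step(p, i)))
        sched.append(s)
        p = step(p, s)
    return tuple(sched)

HORIZON = 6
best = exhaustive_schedule(HORIZON)
greedy = greedy_schedule(HORIZON)
print(best, schedule_cost(best))
print(greedy, schedule_cost(greedy))
```

The exhaustive search visits 2^6 = 64 schedules here, and the count doubles with every added time step, which is the scaling problem that motivates approximate and pruned searches.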
Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems
In many Cyber-Physical Systems, we encounter the problem of remote state
estimation of geographically distributed and remote physical processes. This
paper studies the scheduling of sensor transmissions to estimate the states of
multiple remote, dynamic processes. Information from the different sensors has
to be transmitted to a central gateway over a wireless network for monitoring
purposes, where typically fewer wireless channels are available than there are
processes to be monitored. For effective estimation at the gateway, the sensors
need to be scheduled appropriately, i.e., at each time instant one needs to
decide which sensors have network access and which ones do not. To address this
scheduling problem, we formulate an associated Markov decision process (MDP).
This MDP is then solved using a Deep Q-Network, a recent deep reinforcement
learning algorithm that is at once scalable and model-free. We compare our
scheduling algorithm to popular scheduling algorithms such as round-robin and
reduced-waiting-time, among others. Our algorithm is shown to significantly
outperform these algorithms for many example scenarios.
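
The scheduling decision described above (which sensors get a channel at each time instant) can be sketched with the round-robin baseline the abstract mentions. The state used here, the number of steps since each sensor last transmitted, and the assumption of loss-free channels are illustrative choices and not necessarily the paper's MDP formulation. No learning is performed in this sketch.

```python
def round_robin(t, num_sensors, num_channels):
    """Sensors granted a channel at time step t (needs num_channels <= num_sensors)."""
    start = (t * num_channels) % num_sensors
    return [(start + j) % num_sensors for j in range(num_channels)]

def next_ages(ages, scheduled):
    """Assumed transition: age resets to 0 on transmission, else grows by one."""
    return tuple(0 if i in scheduled else a + 1 for i, a in enumerate(ages))

def simulate(num_sensors, num_channels, steps):
    """Run round-robin and record transmissions per sensor and the worst age."""
    ages = (0,) * num_sensors
    counts = [0] * num_sensors
    worst_age = 0
    for t in range(steps):
        scheduled = round_robin(t, num_sensors, num_channels)
        for i in scheduled:
            counts[i] += 1
        ages = next_ages(ages, scheduled)
        worst_age = max(worst_age, max(ages))
    return counts, worst_age

# Five processes share two channels, as in the "fewer channels than processes" setting.
counts, worst_age = simulate(num_sensors=5, num_channels=2, steps=10)
print(counts, worst_age)
```

Round-robin treats all sensors identically regardless of how fast each process drifts. A learned policy would instead map the state to a channel assignment, which is where it can gain over such a fixed baseline.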
A Learning Theoretic Approach to Energy Harvesting Communication System Optimization
A point-to-point wireless communication system in which the transmitter is
equipped with an energy harvesting device and a rechargeable battery is
studied. Both the energy and the data arrivals at the transmitter are modeled
as Markov processes. Delay-limited communication is considered assuming that
the underlying channel is block fading with memory, and the instantaneous
channel state information is available at both the transmitter and the
receiver. The expected total transmitted data during the transmitter's
activation time is maximized under three different sets of assumptions
regarding the information available at the transmitter about the underlying
stochastic processes. A learning theoretic approach is introduced, which does
not assume any a priori information on the Markov processes governing the
communication system. In addition, online and offline optimization problems are
studied for the same setting. Full statistical knowledge and causal information
on the realizations of the underlying stochastic processes are assumed in the
online optimization problem, while the offline optimization problem assumes
non-causal knowledge of the realizations in advance. Comparing the optimal
solutions in all three frameworks, the performance loss due to the lack of the
transmitter's information regarding the behaviors of the underlying Markov
processes is quantified.
Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks
In this paper, we consider an intrusion detection application for Wireless
Sensor Networks (WSNs). We study the problem of scheduling the sleep times of
the individual sensors to maximize the network lifetime while keeping the
tracking error to a minimum. We formulate this problem as a
partially-observable Markov decision process (POMDP) with continuous
state-action spaces, in a manner similar to (Fuemmeler and Veeravalli [2008]).
However, unlike their formulation, we consider infinite horizon discounted and
average cost objectives as performance criteria. For each criterion, we propose
a convergent on-policy Q-learning algorithm that operates on two timescales,
while employing function approximation to handle the curse of dimensionality
associated with the underlying POMDP. Our proposed algorithm incorporates a
policy gradient update using a one-simulation simultaneous perturbation
stochastic approximation (SPSA) estimate on the faster timescale, while the
Q-value parameter (arising from a linear function approximation for the
Q-values) is updated in an on-policy temporal difference (TD) algorithm-like
fashion on the slower timescale. The feature selection scheme employed in each
of our algorithms manages the energy and tracking components in a manner that
assists the search for the optimal sleep-scheduling policy. For the sake of
comparison, in both discounted and average settings, we also develop a function
approximation analogue of the Q-learning algorithm. This algorithm, unlike the
two-timescale variant, does not possess theoretical convergence guarantees.
Finally, we also adapt our algorithms to include a stochastic iterative
estimation scheme for the intruder's mobility model. Our simulation results on
a 2-dimensional network setting suggest that our algorithms result in better
tracking accuracy at the cost of only a few additional sensors, in comparison
to a recent prior work.
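
The one-simulation SPSA estimate mentioned in the abstract can be sketched as follows: all coordinates of the parameter are perturbed at once by random signs, the objective is evaluated once, and that single value is divided by each coordinate's perturbation. The objective `quadratic`, the perturbation size, and the averaging helper are hypothetical and serve only to check the estimator. The paper's algorithm applies such an estimate inside a two-timescale policy update, which is not reproduced here.

```python
from itertools import product

def one_sim_spsa(f, theta, delta, signs):
    """Gradient estimate from a single evaluation of f at theta + delta * signs."""
    y = f([t + delta * s for t, s in zip(theta, signs)])
    # Each sign is +1 or -1, so dividing by it equals multiplying by it.
    return [y * s / delta for s in signs]

def averaged_estimate(f, theta, delta):
    """Average the estimate over every sign vector (its exact expectation)."""
    dim = len(theta)
    all_signs = list(product((-1, 1), repeat=dim))
    total = [0.0] * dim
    for signs in all_signs:
        g = one_sim_spsa(f, theta, delta, signs)
        total = [a + b for a, b in zip(total, g)]
    return [a / len(all_signs) for a in total]

def quadratic(x):
    """Hypothetical test objective with gradient 2 * x."""
    return sum(v * v for v in x)

theta = [1.0, -2.0, 0.5]
grad = averaged_estimate(quadratic, theta, delta=0.1)
print(grad)
```

In use, a single random sign vector would be drawn per iteration, so each individual estimate is noisy. Averaging over all sign vectors, as done here, only shows that the estimate matches the true gradient in expectation for this quadratic.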