Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
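To make the MDP framing concrete, here is a minimal value-iteration sketch for a toy sensor-node battery-management MDP. The states, transition model, and rewards below are illustrative assumptions, not taken from the survey: battery levels are the states, and the node chooses between sleeping (recharging) and sensing-and-transmitting (draining, but rewarded when the battery is non-empty).

```python
import numpy as np

# Toy MDP (hypothetical model, for illustration only).
# States: battery levels 0..4; actions: 0 = sleep, 1 = sense-and-transmit.
n_states, n_actions, gamma = 5, 2, 0.95

# P[a][s, s'] : transition probabilities; R[s, a] : rewards (illustrative numbers).
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0][s, min(s + 1, n_states - 1)] = 1.0  # sleep: battery recharges one level
    P[1][s, max(s - 1, 0)] = 1.0             # transmit: battery drains one level
R = np.zeros((n_states, n_actions))
R[1:, 1] = 1.0                               # reward only when transmitting with charge

# Value iteration: repeatedly apply the Bellman optimality operator.
V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=1)  # greedy policy: sleep when empty, transmit otherwise
```

Under this toy model the optimal policy is to sleep only at the empty battery level and transmit at every other level; richer WSN formulations in the survey replace these hand-written matrices with models of channel state, coverage, or security dynamics.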
Hierarchical Deep Reinforcement Learning for Age-of-Information Minimization in IRS-aided and Wireless-powered Wireless Networks
In this paper, we focus on a wireless-powered sensor network coordinated by a
multi-antenna access point (AP). Each node can generate sensing information and
report the latest information to the AP using the energy harvested from the
AP's signal beamforming. We aim to minimize the average age-of-information
(AoI) by adapting the nodes' transmission scheduling and the transmission
control strategies jointly. To reduce the transmission delay, an intelligent
reflecting surface (IRS) is used to enhance the channel conditions by
controlling the AP's beamforming vector and the IRS's phase shifting matrix.
Considering dynamic data arrivals at different sensing nodes, we propose a
hierarchical deep reinforcement learning (DRL) framework for AoI
minimization in two steps. The users' transmission scheduling is first
determined by the outer-loop DRL approach, e.g., the DQN or PPO algorithm, and
then the inner-loop optimization is used to adapt either the uplink information
transmission or downlink energy transfer to all nodes. A simple and efficient
approximation is also proposed to reduce the inner-loop run-time overhead.
Numerical results verify that the hierarchical learning framework outperforms
typical baselines in terms of the average AoI and proportional fairness among
different nodes.
Comment: 31 pages, 6 figures, 2 tables, 3 algorithms
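The two-step hierarchy described above can be sketched in miniature. The toy below is an assumption-laden stand-in, not the paper's algorithm: the outer loop is a tabular Q-learner that picks which node to schedule, and the inner-loop optimization (beamforming and IRS phase shifts in the paper) is replaced by a placeholder that simply resets the scheduled node's age while the others grow.

```python
import random

# Toy outer-loop scheduler / inner-loop optimizer split (all dynamics hypothetical).
N_NODES, EPISODES, ALPHA, GAMMA, EPS = 3, 2000, 0.1, 0.9, 0.1
random.seed(0)

def inner_loop(node, aoi):
    # Placeholder for the inner-loop optimization: scheduling a node
    # resets its age of information; all other ages grow by one slot.
    new_aoi = [a + 1 for a in aoi]
    new_aoi[node] = 1
    return new_aoi

Q = {}
def key(aoi):
    return tuple(min(a, 5) for a in aoi)  # truncate ages to keep the table small

aoi = [1] * N_NODES
for _ in range(EPISODES):
    s = key(aoi)
    q = Q.setdefault(s, [0.0] * N_NODES)
    # epsilon-greedy outer-loop decision: which node to schedule next
    a = random.randrange(N_NODES) if random.random() < EPS else q.index(max(q))
    aoi = inner_loop(a, aoi)
    r = -sum(aoi) / N_NODES               # reward = negative average AoI
    q2 = Q.setdefault(key(aoi), [0.0] * N_NODES)
    q[a] += ALPHA * (r + GAMma * max(q2) - q[a]) if False else ALPHA * (r + GAMMA * max(q2) - q[a])
```

With symmetric nodes as here, the learned schedule tends toward round-robin polling of the oldest node; the paper's contribution is handling the asymmetric, dynamic case where the inner loop must actually solve the beamforming and phase-shift subproblem.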
Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multi-Player Multi-Armed Bandit Framework
This paper deals with the problem of efficient resource allocation in dynamic
infrastructureless wireless networks. Assuming a reactive interference-limited
scenario, each transmitter is allowed to select one frequency channel (from a
common pool) together with a power level at each transmission trial; hence, for
all transmitters, not only the fading gain, but also the number of interfering
transmissions and their transmit powers are varying over time. Due to the
absence of a central controller and time-varying network characteristics, it is
highly inefficient for transmitters to acquire global channel and network
knowledge. Therefore, a reasonable assumption is that transmitters have no
knowledge of fading gains, interference, and network topology. Each
transmitting node selfishly aims at maximizing its average reward (or
minimizing its average cost), which is a function of the action of that
specific transmitter as well as those of all other transmitters. This scenario
is modeled as a multi-player multi-armed adversarial bandit game, in which
multiple players receive an a priori unknown reward with an arbitrarily
time-varying distribution by sequentially pulling an arm, selected from a known
and finite set of arms. Since players do not know the arm with the highest
average reward in advance, they attempt to minimize their so-called regret,
determined by the set of players' actions, while attempting to achieve
equilibrium in some sense. To this end, we design in this paper two joint power
level and channel selection strategies. We prove that the gap between the
average reward achieved by our approaches and that based on the best fixed
strategy converges to zero asymptotically. Moreover, the empirical joint
frequencies of the game converge to the set of correlated equilibria. We
further characterize this set for two special cases of our designed game
- …
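The paper's two joint strategies are not reproduced in the abstract, but the adversarial-bandit machinery it builds on can be illustrated. Below is a single-player EXP3 sketch over joint (channel, power) arms; the fixed reward gap is a testing convenience I have assumed (EXP3 itself makes no stationarity assumption, which is what suits the arbitrarily time-varying rewards described above).

```python
import math
import random

# Single-player EXP3 over joint (channel, power) arms (illustrative only;
# the multi-player strategies and equilibrium analysis are in the paper).
random.seed(1)
channels, powers = 3, 2
K = channels * powers                 # each arm = one (channel, power) pair
T = 5000
# standard EXP3 exploration rate for horizon T and K arms
gamma = min(1.0, math.sqrt(K * math.log(K) / ((math.e - 1) * T)))

w = [1.0] * K
for t in range(T):
    total_w = sum(w)
    # mix the weight-proportional distribution with uniform exploration
    p = [(1 - gamma) * wi / total_w + gamma / K for wi in w]
    arm = random.choices(range(K), weights=p)[0]
    ch, pw = divmod(arm, powers)      # decode channel index and power level
    # hypothetical reward in [0, 1]: here (channel 0, high power) is best
    reward = 0.9 if (ch, pw) == (0, 1) else 0.3
    # importance-weighted exponential update keeps the estimator unbiased
    w[arm] *= math.exp(gamma * (reward / p[arm]) / K)
```

The weights concentrate on the best fixed (channel, power) pair, which is exactly the "gap to the best fixed strategy vanishes" guarantee stated above; the multi-player setting additionally requires the empirical play to converge to correlated equilibria.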